Gemini API Flex inference

Explains how to use the Gemini API Flex inference tier, a service tier for cost-optimized, latency-tolerant workloads.

Derived skill

Files assembled from official documentation

When To Use

Use when a workload is latency-tolerant and you want to reduce cost by sending requests through the Flex service tier.

Reference Files

| File | Contains | Use For |
| --- | --- | --- |
| SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
| docs/gemini-api-flex-inference-workflow-guide.md | A guide explaining how to use the Gemini API Flex inference tier for cost-optimized, latency-tolerant workloads. | Questions about using the Flex inference tier. |
| examples/gemini-api-flex-inference-python-generatecontent.text | A Python example that uses the genai client to perform flex inference with the generate_content method. | Exact payloads, commands, or snippets from this example. |
| examples/gemini-api-flex-inference-nodejs-generate.text | A Node.js example that uses the GoogleGenAI SDK to send a generateContent request with the flex service tier. | Exact payloads, commands, or snippets from this example. |
| examples/gemini-api-flex-inference-golang-generatecontent.text | A Go example that uses the Gemini API with the flex service tier for content generation. | Exact payloads, commands, or snippets from this example. |
| examples/gemini-api-flex-inference-curl-request.text | A curl command that sends a POST request to the Gemini API using the flex service tier. | Exact payloads, commands, or snippets from this example. |
| examples/gemini-api-flex-inference-python-client.text | A Python example that uses the Gemini API with the flex service tier and custom HTTP timeout options. | Exact payloads, commands, or snippets from this example. |
| examples/gemini-api-flex-inference-nodejs-generatecontent.text | A Node.js example that sets the serviceTier flex configuration within a generateContent request using the GoogleGenAI SDK. | Exact payloads, commands, or snippets from this example. |
| examples/gemini-api-flex-inference-golang.text | A Go example that uses the genai package for flex inference. | Exact payloads, commands, or snippets from this example. |
| examples/gemini-api-flex-inference-curl-timeout.text | A curl command that sets server-side and client-side timeout hints for flex inference requests. | Exact payloads, commands, or snippets from this example. |
| examples/gemini-api-flex-inference-python-generatecontent-global-timeout.text | A Python example that uses the Google GenAI SDK to call generate_content with a custom global HTTP timeout. | Exact payloads, commands, or snippets from this example. |
| examples/gemini-api-flex-inference-nodejs-generatecontent-timeout.text | A Node.js example that uses the GoogleGenAI client to perform flex inference with a custom global HTTP timeout. | Exact payloads, commands, or snippets from this example. |
| examples/gemini-api-flex-inference-golang-2.text | A Go example that performs flex inference using the Google GenAI SDK. | Exact payloads, commands, or snippets from this example. |
| examples/gemini-api-flex-inference-python-retry-logic.text | A Python example that implements retry logic when using the Flex service tier. | Exact payloads, commands, or snippets from this example. |
| examples/gemini-api-flex-inference-nodejs-retry-logic.text | A Node.js example that implements a retry mechanism when calling the Flex tier. | Exact payloads, commands, or snippets from this example. |
| examples/gemini-api-flex-inference-golang-retry-logic.text | A Go example that uses flex inference with a custom retry mechanism. | Exact payloads, commands, or snippets from this example. |
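
Several of the listed examples wrap a Flex request in retry logic. For orientation only, the general pattern might be sketched as below. This is not taken from the source files: `call_with_retries`, `TransientError`, and every parameter name are hypothetical, and the actual Gemini call is deliberately left as a caller-supplied function so that no SDK behavior is assumed. For the documented code, open the retry-logic files under examples/.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for whatever retryable error the real client raises."""


def call_with_retries(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on TransientError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles per attempt (base, 2*base, 4*base, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff schedule can be checked without real waiting. In practice `request` would be a zero-argument function that performs the Flex call exactly as shown in the examples/ files and translates the client's retryable errors into `TransientError`.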

What This Skill Covers

  • Preview: The Gemini Flex API is in Preview.
  • Main sections: How to use Flex, Python, JavaScript, Go, REST.

Workflow

  1. Open the most relevant file under docs/ for the exact documented workflow and wording.
  2. Open schemas/ files, if any are present, for exact structured contracts; none are listed in the reference table above.
  3. Open examples/ files for concrete requests, commands, snippets, and manifests.
  4. Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://ai.google.dev/gemini-api/docs/flex-inference