Gemini API Flex inference
Explains how to use the Gemini API Flex inference tier for cost-optimized, latency-tolerant workloads.
Derived skill
Files assembled from official documentation
When To Use
Use when you need to run cost-optimized, latency-tolerant workloads on the Flex service tier, including setting request timeouts and adding retry logic.
Reference Files
| File | Contains | Use For |
|---|---|---|
| SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
| docs/gemini-api-flex-inference-workflow-guide.md | A guide explaining how to use the Gemini API Flex inference tier for cost-optimized, latency-tolerant workloads. | Questions about a guide explaining how to use the Gemini API Flex inference tier for cost-optimized, latency-tolerant workloads. |
| examples/gemini-api-flex-inference-python-generatecontent.text | A Python code example demonstrating how to use the genai client to perform flex inference with the generatecontent method. | Exact payloads, commands, or snippets shown in A Python code example demonstrating how to use the genai client to perform flex inference with the generatecontent me... |
| examples/gemini-api-flex-inference-nodejs-generate.text | A Node.js code example demonstrating how to use the GoogleGenAI SDK to perform a generateContent request with the flex service tier. | Exact payloads, commands, or snippets shown in A Node.js code example demonstrating how to use the GoogleGenAI SDK to perform a generateContent request with the fle... |
| examples/gemini-api-flex-inference-golang-generatecontent.text | A Go implementation demonstrating how to use the Gemini API with the flex service tier for content generation. | Exact payloads, commands, or snippets shown in A Go implementation demonstrating how to use the Gemini API with the flex service tier for content generation. |
| examples/gemini-api-flex-inference-curl-request.text | A curl command demonstrating a POST request to the Gemini API using the flex service tier. | Exact payloads, commands, or snippets shown in A curl command demonstrating a POST request to the Gemini API using the flex service tier. |
| examples/gemini-api-flex-inference-python-client.text | A Python code example demonstrating how to use the Gemini API with the flex service tier and custom HTTP timeout options. | Exact payloads, commands, or snippets shown in A Python code example demonstrating how to use the Gemini API with the flex service tier and custom HTTP timeout opti... |
| examples/gemini-api-flex-inference-nodejs-generatecontent.text | A Node.js code example demonstrating how to use the serviceTier flex configuration within a generateContent request using the GoogleGenAI SDK. | Exact payloads, commands, or snippets shown in A Node.js code example demonstrating how to use the serviceTier flex configuration within a generateContent request u... |
| examples/gemini-api-flex-inference-golang.text | A Go implementation demonstrating how to use the Gemini API for flex inference using the genai package. | Exact payloads, commands, or snippets shown in A Go implementation demonstrating how to use the Gemini API for flex inference using the genai package. |
| examples/gemini-api-flex-inference-curl-timeout.text | A curl command demonstrating how to set server-side and client-side timeout hints for Gemini API flex inference requests. | Exact payloads, commands, or snippets shown in A curl command demonstrating how to set server-side and client-side timeout hints for Gemini API flex inference reque... |
| examples/gemini-api-flex-inference-python-generatecontent-global-timeout.text | A Python code example demonstrating how to use the Google GenAI SDK to call generatecontent with a custom global HTTP timeout. | Exact payloads, commands, or snippets shown in A Python code example demonstrating how to use the Google GenAI SDK to call generatecontent with a custom global HTTP... |
| examples/gemini-api-flex-inference-nodejs-generatecontent-timeout.text | A Node.js code example demonstrating how to use the GoogleGenAI client to perform flex inference with a custom global HTTP timeout. | Exact payloads, commands, or snippets shown in A Node.js code example demonstrating how to use the GoogleGenAI client to perform flex inference with a custom global... |
| examples/gemini-api-flex-inference-golang-2.text | A Go implementation demonstrating how to perform flex inference using the Google GenAI SDK. | Exact payloads, commands, or snippets shown in A Go implementation demonstrating how to perform flex inference using the Google GenAI SDK. |
| examples/gemini-api-flex-inference-python-retry-logic.text | A Python code example demonstrating how to implement retry logic when using the Gemini API Flex service tier. | Exact payloads, commands, or snippets shown in A Python code example demonstrating how to implement retry logic when using the Gemini API Flex service tier. |
| examples/gemini-api-flex-inference-nodejs-retry-logic.text | A Node.js code example demonstrating how to implement a retry mechanism when calling the Gemini API Flex tier. | Exact payloads, commands, or snippets shown in A Node.js code example demonstrating how to implement a retry mechanism when calling the Gemini API Flex tier. |
| examples/gemini-api-flex-inference-golang-retry-logic.text | A Go implementation demonstrating how to use the Gemini API with flex inference and a custom retry mechanism. | Exact payloads, commands, or snippets shown in A Go implementation demonstrating how to use the Gemini API with flex inference and a custom retry mechanism. |
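The table above lists retry-logic examples for the Flex tier, but their contents are not reproduced on this page. As a rough orientation only, the general pattern can be sketched without any SDK calls. Everything named here is an assumption for illustration: `TransientError` is a hypothetical stand-in for whatever retryable error the real client raises, and `call_with_retry`, its parameters, and the backoff values are not taken from the Gemini documentation. For the exact documented code, open the `examples/` files.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure; not a real SDK exception."""


def call_with_retry(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request` until it succeeds or `max_attempts` calls have failed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Exponential backoff capped at max_delay, with full jitter so
            # concurrent clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

In practice `request` would wrap the actual API call (for example a lambda around the client call shown in the Python example file), and the `except` clause would catch the SDK's own retryable error type instead of the placeholder.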
What This Skill Covers
- Preview: The Gemini Flex API is in Preview.
- Main sections: How to use Flex, Python, JavaScript, Go, REST.
Workflow
- Open the most relevant file under docs/ for the exact documented workflow and wording.
- Open schemas/ files for exact structured contracts.
- Open examples/ files for concrete requests, commands, snippets, and manifests.
- Do not add behavior or configuration that is not present in the attached source files.
Canonical source: https://ai.google.dev/gemini-api/docs/flex-inference
