Gemini API Flex inference

Explains how to use the Gemini API Flex inference tier for cost-optimized, latency-tolerant workloads.

Derived skill

Files assembled from official documentation

When To Use

Use when a workload can tolerate added latency in exchange for cost optimization and you need to send Gemini API requests on the Flex service tier.

Reference Files

What This Skill Covers

Preview: The Gemini Flex API is in Preview.
Main sections: How to use Flex, Python, JavaScript, Go, REST.

Workflow

Open the most relevant file under docs/ for the exact documented workflow and wording.
Open schemas/ files for exact structured contracts.
Open examples/ files for concrete requests, commands, snippets, and manifests.
Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://ai.google.dev/gemini-api/docs/flex-inference

Skill metadata

Name: Gemini API Flex inference
Author: Bruno HANSS - Prompt Buddy
Generation mode: AI-assisted, human-authored
Source count: 1

Provenance

Source program: Google AI Docs
Last generated: May 10, 2026
Last source sync: Unknown
Source pages: 1

Safety model

Canonical source pages are preserved separately. Derived files record source evidence and require zero AI-generated facts.

Source links

https://ai.google.dev/gemini-api/docs/flex-inference

File	Contains	Use For
`SKILL.md`	Entry point: scope, routing table, and workflow.	Start here.
`docs/gemini-api-flex-inference-workflow-guide.md`	A guide explaining how to use the Gemini API Flex inference tier for cost-optimized, latency-tolerant workloads.	Questions about the Flex inference tier and its documented workflow.
`examples/gemini-api-flex-inference-python-generatecontent.text`	A Python code example demonstrating how to use the genai client to perform flex inference with the `generate_content` method.	Exact Python snippet for a flex `generate_content` call.
`examples/gemini-api-flex-inference-nodejs-generate.text`	A Node.js code example demonstrating how to use the GoogleGenAI SDK to perform a generateContent request with the flex service tier.	Exact Node.js snippet for a generateContent request on the flex service tier.
`examples/gemini-api-flex-inference-golang-generatecontent.text`	A Go implementation demonstrating how to use the Gemini API with the flex service tier for content generation.	Exact Go snippet for content generation on the flex service tier.
`examples/gemini-api-flex-inference-curl-request.text`	A curl command demonstrating a POST request to the Gemini API using the flex service tier.	Exact curl command for a POST request on the flex service tier.
`examples/gemini-api-flex-inference-python-client.text`	A Python code example demonstrating how to use the Gemini API with the flex service tier and custom HTTP timeout options.	Exact Python snippet for the flex service tier with custom HTTP timeout options.
`examples/gemini-api-flex-inference-nodejs-generatecontent.text`	A Node.js code example demonstrating how to use the serviceTier flex configuration within a generateContent request using the GoogleGenAI SDK.	Exact Node.js snippet for the serviceTier flex configuration.
`examples/gemini-api-flex-inference-golang.text`	A Go implementation demonstrating how to use the Gemini API for flex inference using the genai package.	Exact Go snippet for flex inference with the genai package.
`examples/gemini-api-flex-inference-curl-timeout.text`	A curl command demonstrating how to set server-side and client-side timeout hints for Gemini API flex inference requests.	Exact curl command for server-side and client-side timeout hints.
`examples/gemini-api-flex-inference-python-generatecontent-global-timeout.text`	A Python code example demonstrating how to use the Google GenAI SDK to call `generate_content` with a custom global HTTP timeout.	Exact Python snippet for a custom global HTTP timeout.
`examples/gemini-api-flex-inference-nodejs-generatecontent-timeout.text`	A Node.js code example demonstrating how to use the GoogleGenAI client to perform flex inference with a custom global HTTP timeout.	Exact Node.js snippet for a custom global HTTP timeout.
`examples/gemini-api-flex-inference-golang-2.text`	A Go implementation demonstrating how to perform flex inference using the Google GenAI SDK.	Exact Go snippet for flex inference with the Google GenAI SDK.
`examples/gemini-api-flex-inference-python-retry-logic.text`	A Python code example demonstrating how to implement retry logic when using the Gemini API Flex service tier.	Exact Python snippet for retry logic on the Flex service tier.
`examples/gemini-api-flex-inference-nodejs-retry-logic.text`	A Node.js code example demonstrating how to implement a retry mechanism when calling the Gemini API Flex tier.	Exact Node.js snippet for a retry mechanism on the Flex tier.
`examples/gemini-api-flex-inference-golang-retry-logic.text`	A Go implementation demonstrating how to use the Gemini API with flex inference and a custom retry mechanism.	Exact Go snippet for flex inference with a custom retry mechanism.
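Several of the files above cover retry logic for the Flex tier. As a rough orientation only, the general shape of such a wrapper might look like the sketch below. It does not reproduce the documented example: `TransientError`, `call_with_retry`, and every parameter name and default are hypothetical, and the request is passed in as a plain callable so that no SDK call is assumed. For the actual SDK calls and error types, open the retry-logic files under `examples/`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for whatever retryable error the real client raises."""


def call_with_retry(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on TransientError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles per attempt (base, 2*base, 4*base, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter so concurrent clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

In use, the callable would wrap whichever Flex request the documented example makes, for instance `call_with_retry(lambda: send_flex_request(prompt))`, where `send_flex_request` is a placeholder for your own function. Injecting `sleep` keeps the wrapper testable without real waiting.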