Gemini API optimization and inference
Provides strategies and technical methods for optimizing Gemini API performance, including batch processing, context caching, and managing inference priority.
Files assembled from official documentation
When To Use
Use when you need to reduce latency through context caching, lower costs via the Batch API, or manage request priority during inference.
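As a rough illustration of preparing work for the Batch API, the sketch below serializes prompts into a JSONL payload with one request per line. It assumes the file-input layout described in the Gemini batch documentation, where each line holds a `key` plus a `request` object containing `contents`. The helper name `build_batch_jsonl` is hypothetical, and uploading the file and creating the batch job are left out, so check the current docs before relying on this exact shape.

```python
import json


def build_batch_jsonl(prompts: dict[str, str]) -> str:
    """Serialize {key: prompt} pairs into JSONL text for a batch job.

    Hypothetical helper: assumes each line is {"key": ..., "request": {...}},
    where "request" mirrors a generateContent request body.
    """
    lines = []
    for key, prompt in prompts.items():
        # One self-contained request per line; the key lets you match
        # results back to inputs after the asynchronous job finishes.
        request = {"contents": [{"parts": [{"text": prompt}]}]}
        lines.append(json.dumps({"key": key, "request": request}))
    # JSONL files conventionally end with a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    payload = build_batch_jsonl({
        "request-1": "Summarize the benefits of context caching.",
        "request-2": "List three uses for batch inference.",
    })
    print(payload, end="")
```

Keying each line keeps result matching independent of output order, which matters because batch jobs run asynchronously.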
Reference Files
| File | Contains | Use For |
|---|---|---|
| `SKILL.md` | Entry point: scope, routing table, and workflow. | Start here. |
| `docs/gemini-api-optimization-and-inference-workflow-guide.md` | A guide explaining the Gemini API inference service tiers (standard, priority, and flex) for balancing speed, cost, and reliability. | Questions about choosing between the standard, priority, and flex inference tiers. |
What This Skill Covers
- The Gemini API offers a variety of optimization mechanisms to help you balance speed, cost, and reliability based on your specific workload needs. Whether yo...
- Main sections:
  - Inference service tiers (Synchronous)
  - Standard inference (Default)
  - Priority inference (Latency-optimized)
  - Flex inference (Cost-optimized)
  - Batch API (Bulk, asynchronous)
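To make the trade-off between these options concrete, here is a hypothetical routing helper that maps workload requirements to one of the options listed above. The `Tier` enum, the `Workload` fields, and the `choose_tier` function are illustrative assumptions, not part of the Gemini API or any SDK; the rules only restate each option's label (priority for latency, flex for cost, Batch API for bulk asynchronous work, standard otherwise).

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    STANDARD = "standard"  # default synchronous inference
    PRIORITY = "priority"  # latency-optimized
    FLEX = "flex"          # cost-optimized
    BATCH = "batch"        # bulk, asynchronous (Batch API)


@dataclass
class Workload:
    # Hypothetical descriptors of a workload; these are not API fields.
    needs_synchronous_response: bool
    latency_critical: bool = False
    cost_sensitive: bool = False


def choose_tier(w: Workload) -> Tier:
    """Pick an option using the documented labels as a rough heuristic."""
    if not w.needs_synchronous_response:
        # Results can arrive later: submit in bulk via the Batch API.
        return Tier.BATCH
    if w.latency_critical:
        # Synchronous and latency-sensitive: latency-optimized tier.
        return Tier.PRIORITY
    if w.cost_sensitive:
        # Synchronous but cost-sensitive: cost-optimized tier.
        return Tier.FLEX
    # No special requirement: fall back to the default tier.
    return Tier.STANDARD


if __name__ == "__main__":
    print(choose_tier(Workload(needs_synchronous_response=True, latency_critical=True)))
```

Checking the asynchronous case first reflects that the Batch API is the only option listed that is not synchronous; among the synchronous tiers, latency is given precedence over cost here purely as a design choice of this sketch.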
Workflow
- Open the most relevant file under `docs/` for the exact documented workflow and wording.
- Open `schemas/` files for exact structured contracts.
- Open `examples/` files for concrete requests, commands, snippets, and manifests.
- Do not add behavior or configuration that is not present in the attached source files.
Canonical source: https://ai.google.dev/gemini-api/docs/optimization
