google · Google AI Docs

Gemini API optimization and inference

Provides strategies and technical methods for optimizing Gemini API performance, including batch processing, context caching, and managing inference priority.

Derived skill

Files assembled from official documentation

When To Use

Use when you need to reduce latency through context caching, lower costs via the Batch API, or manage request priority during inference.

Reference Files

File	Contains	Use For
`SKILL.md`	Entry point: scope, routing table, and workflow.	Start here.
`docs/gemini-api-optimization-and-inference-workflow-guide.md`	A guide explaining the different Gemini API inference service tiers including standard, priority, and flex options for balancing speed, cost, and reliability.	Questions about the standard, priority, and flex inference service tiers.

What This Skill Covers

The Gemini API offers a variety of optimization mechanisms to help you balance speed, cost, and reliability based on your specific workload needs. Whether yo...
Main sections: Inference service tiers (Synchronous), Standard inference (Default), Priority inference (Latency-optimized), Flex inference (Cost-optimized), Batch API (Bulk, asynchronous).
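As a rough illustration of how the tiers listed above might map to workload needs, the following Python sketch routes a workload description to one of the four options. It makes no Gemini API calls. The names `Tier`, `Workload`, and `choose_tier`, and the string values in the enum, are hypothetical labels invented for this example and are not documented API parameters. The precedence order (batch first, then priority, then flex) is also an assumption. Consult the file under docs/ for the actual request configuration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Labels mirror the section names above; the string values are
    # hypothetical and are NOT real Gemini API parameter values.
    STANDARD = "standard"  # default synchronous tier
    PRIORITY = "priority"  # latency-optimized synchronous tier
    FLEX = "flex"          # cost-optimized synchronous tier
    BATCH = "batch"        # bulk, asynchronous Batch API


@dataclass
class Workload:
    # Hypothetical description of what a workload needs.
    needs_immediate_response: bool = True
    latency_critical: bool = False
    cost_sensitive: bool = False


def choose_tier(workload: Workload) -> Tier:
    """Pick a tier label using an assumed precedence order."""
    # Work that can wait for results fits the asynchronous Batch API.
    if not workload.needs_immediate_response:
        return Tier.BATCH
    # Among synchronous tiers, latency needs are checked before cost.
    if workload.latency_critical:
        return Tier.PRIORITY
    if workload.cost_sensitive:
        return Tier.FLEX
    # Standard is the default when nothing else applies.
    return Tier.STANDARD
```

For example, `choose_tier(Workload(cost_sensitive=True))` selects the flex label, while a workload that does not need an immediate response is routed to batch regardless of the other flags.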

Workflow

Open the most relevant file under docs/ for the exact documented workflow and wording.
Open schemas/ files for exact structured contracts.
Open examples/ files for concrete requests, commands, snippets, and manifests.
Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://ai.google.dev/gemini-api/docs/optimization