openai · OpenAI Platform Docs

Latency optimization | OpenAI API

Provides strategies and technical methods to reduce response times when using the OpenAI API, including techniques like prompt engineering, model selection, and streaming.

Import to Prompt Buddy

Derived skill

Files assembled from official documentation

Viewing SKILL.md

Latency optimization | OpenAI API

Provides strategies and technical methods to reduce response times when using the OpenAI API, including techniques like prompt engineering, model selection, and streaming.

When To Use

Use when you need to reduce the time it takes for an LLM to return a response or improve the perceived speed of an application.

Reference Files

File	Contains	Use For
`SKILL.md`	Entry point: scope, routing table, and workflow.	Start here.
`docs/latency-optimization-openai-api-workflow-guide.md`	A guide detailing strategies for reducing latency through prompt analysis and structured output optimization.	Questions about a guide detailing strategies for reducing latency through prompt analysis and structured output optimization.
`examples/latency-optimization-openai-api-openai-api-latency-optimization-prompt-r.text`	A text example demonstrating how to rewrite user queries with conversation history to optimize latency through context management.	Exact payloads, commands, or snippets shown in A text example demonstrating how to rewrite user queries with conversation history to optimize latency through contex...
`examples/latency-optimization-openai-api-prompt.text`	A text-based prompt example demonstrating how to instruct a model to classify user queries for real-time lookup requirements to optimize latency.	Exact payloads, commands, or snippets shown in A text-based prompt example demonstrating how to instruct a model to classify user queries for real-time lookup requi...
`examples/latency-optimization-openai-api-system-prompt.text`	A text-based example of a system prompt designed to optimize latency by instructing the model to output concise JSON reasoning.	Exact payloads, commands, or snippets shown in A text-based example of a system prompt designed to optimize latency by instructing the model to output concise JSON...
`examples/latency-optimization-openai-api-guide-metrics.text`	A text-based example demonstrating latency optimization metrics including lead, excerpt, and contextualized query parameters.	Exact payloads, commands, or snippets shown in A text-based example demonstrating latency optimization metrics including lead, excerpt, and contextualized query par...
`examples/latency-optimization-openai-api-prompt-engineering.text`	A text-based example demonstrating how to optimize latency through prompt engineering and system message structuring.	Exact payloads, commands, or snippets shown in A text-based example demonstrating how to optimize latency through prompt engineering and system message structuring.
`examples/latency-optimization-openai-api-openai-api-latency-optimization-context-.text`	A text example demonstrating how reducing message history and metadata can optimize API latency.	Exact payloads, commands, or snippets shown in A text example demonstrating how reducing message history and metadata can optimize API latency.
`examples/latency-optimization-openai-api-system-prompt-2.text`	A text example demonstrating a system prompt designed to optimize latency by instructing the model to respond in a structured JSON format.	Exact payloads, commands, or snippets shown in A text example demonstrating a system prompt designed to optimize latency by instructing the model to respond in a st...
`examples/latency-optimization-openai-api-system-prompt-3.text`	A text example demonstrating a system prompt and structured reasoning fields used to optimize latency in a customer service bot context.	Exact payloads, commands, or snippets shown in A text example demonstrating a system prompt and structured reasoning fields used to optimize latency in a customer s...
`examples/latency-optimization-openai-api-context-window-management.text`	A text-based example demonstrating how to manage conversation history and metadata to optimize API latency.	Exact payloads, commands, or snippets shown in A text-based example demonstrating how to manage conversation history and metadata to optimize API latency.
`examples/latency-optimization-openai-api-request-payload.text`	A text representation of a structured request payload containing user query sentiment and hardware issue metadata for latency optimization testing.	Exact payloads, commands, or snippets shown in A text representation of a structured request payload containing user query sentiment and hardware issue metadata for...

What This Skill Covers

User: "My computer screen is cracked! I want it fixed now!!!"
Main sections: Analysis and optimizations, Part 1: Looking at retrieval prompts, Part 2: Analyzing the assistant prompt, Part 3: Optimizing the structured output, Example wrap-up.

Workflow

Open the most relevant file under docs/ for the exact documented workflow and wording.
Open schemas/ files for exact structured contracts.
Open examples/ files for concrete requests, commands, snippets, and manifests.
Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://developers.openai.com/api/docs/guides/latency-optimization

Skill metadata

Name: Latency optimization | OpenAI API
Author: Bruno HANSS - Prompt Buddy
Generation mode: Ai Assisted Human Authored
Source count: 1

Provenance

Source program: OpenAI Platform Docs
Last generated: May 11, 2026
Last source sync: Unknown
Source pages: 1

Safety model

Canonical source pages are preserved separately. Derived files record source evidence and require zero AI-generated facts.

File tree

Source links

https://developers.openai.com/api/docs/guides/latency-optimization Back to skills