openai · OpenAI Platform Docs
Latency optimization
A guide outlining seven architectural and operational principles to reduce latency in LLM applications, including token management, request parallelization, and model selection strategies.
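Request parallelization can be illustrated with a minimal TypeScript sketch. It assumes two model calls whose inputs do not depend on each other's output. `fakeModelCall` is a hypothetical stand-in that resolves after a fixed delay instead of calling a real API, and the 100 ms delays are arbitrary.

```typescript
// Hypothetical stand-in for a model API call: resolves with `result`
// after `delayMs` milliseconds. Replace with a real client call.
function fakeModelCall<T>(result: T, delayMs: number): Promise<T> {
  return new Promise<T>((resolve) => setTimeout(() => resolve(result), delayMs));
}

// Sequential: the second call starts only after the first finishes,
// so total latency is roughly the sum of both delays.
async function runSequential(): Promise<[string, boolean]> {
  const query = await fakeModelCall("standalone query", 100);
  const needsLookup = await fakeModelCall(true, 100);
  return [query, needsLookup];
}

// Parallel: both calls start immediately and Promise.all waits for both,
// so total latency is roughly that of the slower call.
async function runParallel(): Promise<[string, boolean]> {
  return Promise.all([
    fakeModelCall("standalone query", 100),
    fakeModelCall(true, 100),
  ]);
}

// Measures the wall-clock time of an async function in milliseconds.
async function timeIt<T>(fn: () => Promise<T>): Promise<{ value: T; ms: number }> {
  const start = Date.now();
  const value = await fn();
  return { value, ms: Date.now() - start };
}
```

Parallelizing only helps when neither step needs the other's result. If the second call consumes the first call's output, the calls have to stay sequential.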
Derived skill
Files assembled from official documentation
When To Use
Use when you need to reduce the response time of an LLM-powered application through architectural changes, prompt engineering, or request management.
Reference Files
| File | Contains | Use For |
|---|---|---|
| SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
| docs/latency-optimization-workflow-guide.md | The guide's seven principles and techniques for reducing latency in LLM use cases, including token management and input optimization. | Questions about the principles and techniques themselves. |
| examples/latency-optimization-openai-latency-optimization-chat-prompt-rewriting.examplechat | A chat prompt example that rewrites user queries using conversation context to reduce latency. | Exact payloads, commands, or snippets for query rewriting. |
| examples/latency-optimization-openai-latency-optimization-chat.examplechat | A chat example whose system prompt classifies whether a user query requires a real-time lookup. | Exact payloads, commands, or snippets for real-time lookup classification. |
| examples/latency-optimization-openai-latency-optimization-chat-2.examplechat | A chat example with system instructions and structured JSON responses for latency-sensitive customer service interactions. | Exact payloads, commands, or snippets for structured customer service responses. |
| examples/latency-optimization-openai-latency-optimization.jsx | A JSX example demonstrating implementation strategies for reducing latency on the OpenAI platform. | Exact payloads, commands, or snippets for the implementation strategies. |
| examples/latency-optimization-openai-latency-optimization-chat-3.examplechat | An example conversation with system prompt instructions for query contextualization and retrieval determination. | Exact payloads, commands, or snippets for combined contextualization and retrieval checks. |
| examples/latency-optimization-openai-latency-optimization-jsx-metadata.jsx | A JSX snippet using metadata fields such as whether a message continues the conversation and user sentiment. | Exact payloads, commands, or snippets for conversation metadata fields. |
| examples/latency-optimization-openai-latency-optimization-chat-4.examplechat | A chat example with system prompt instructions for structured JSON responses. | Exact payloads, commands, or snippets for structured JSON response instructions. |
| examples/latency-optimization-openai-latency-optimization-chat-5.examplechat | A chat completion example with system instructions and reasoning fields for latency-optimized customer service responses. | Exact payloads, commands, or snippets for reasoning fields in customer service responses. |
| examples/latency-optimization-openai-latency-optimization-jsx-metadata-2.jsx | A JSX snippet that includes conversation metadata in API requests. | Exact payloads, commands, or snippets for conversation metadata in requests. |
| examples/latency-optimization-openai-latency-optimization-2.jsx | A JSX example that structures metadata such as sentiment and query type. | Exact payloads, commands, or snippets for sentiment and query-type metadata. |
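Several of the examples above fold two steps, rewriting the query with conversation context and deciding whether retrieval is needed, into a single request with a structured JSON response. The sketch below shows that pattern under stated assumptions: the prompt wording, the `contextualizedQuery`/`requiresRetrieval` field names, and the `ChatMessage` type are hypothetical and are not taken from the example files. The model call itself is omitted.

```typescript
// Hypothetical message shape, modeled loosely on chat-style APIs.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical shape of the combined JSON reply.
interface CombinedResult {
  contextualizedQuery: string; // the user's message rewritten as a standalone query
  requiresRetrieval: boolean;  // whether a document lookup is needed to answer it
}

// One system prompt covering both steps, so a single request replaces two.
const COMBINED_SYSTEM_PROMPT = [
  "Rewrite the user's latest message as a standalone query, using the conversation so far.",
  "Then decide whether answering it requires looking up documents.",
  'Reply with JSON only: {"contextualizedQuery": string, "requiresRetrieval": boolean}',
].join("\n");

// Builds the message list for the single combined request.
function buildCombinedRequest(history: ChatMessage[], latest: string): ChatMessage[] {
  return [
    { role: "system", content: COMBINED_SYSTEM_PROMPT },
    ...history,
    { role: "user", content: latest },
  ];
}

// Parses and validates the model's JSON reply; throws if fields are missing or mistyped.
function parseCombinedResponse(raw: string): CombinedResult {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const obj = data as Record<string, unknown>;
  const query = obj.contextualizedQuery;
  const retrieval = obj.requiresRetrieval;
  if (typeof query !== "string" || typeof retrieval !== "boolean") {
    throw new Error("Reply is missing contextualizedQuery or requiresRetrieval");
  }
  return { contextualizedQuery: query, requiresRetrieval: retrieval };
}
```

Because one response carries both answers, the application waits for one round trip instead of two. Validating the parsed JSON keeps a malformed reply from silently flowing into the retrieval step.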
What This Skill Covers
- This guide covers the core set of principles you can apply to improve latency across a wide variety of LLM-related use cases. These techniques come from work...
- Main sections:
  - Seven principles
  - Process tokens faster
  - Generate fewer tokens
  - Use fewer input tokens
  - Make fewer requests
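The token-related principles (process tokens faster, generate fewer tokens, use fewer input tokens) can be reasoned about with a simple additive latency model: a fixed overhead, a per-token cost for processing the prompt, and a per-token cost for generating output. This model is a simplifying assumption rather than a documented formula, and the parameter values below are hypothetical; measure your own model and deployment. Generation is typically far slower per token than prompt processing, which is why generating fewer tokens usually saves more time than using fewer input tokens, and why a faster model lowers the per-token costs directly.

```typescript
// Hypothetical timing parameters for one request; replace with measured values.
interface LatencyParams {
  overheadMs: number;       // network and queueing time, independent of token counts
  msPerInputToken: number;  // prompt processing cost per input token
  msPerOutputToken: number; // generation cost per output token
}

// Simplified additive model: total = overhead + input cost + output cost.
function estimateLatencyMs(inputTokens: number, outputTokens: number, p: LatencyParams): number {
  return p.overheadMs + inputTokens * p.msPerInputToken + outputTokens * p.msPerOutputToken;
}

// Milliseconds saved by scaling input or output token counts by `factor` (e.g. 0.5 = halve).
function savingsMs(
  inputTokens: number,
  outputTokens: number,
  p: LatencyParams,
  factor: number,
): { fromFewerInput: number; fromFewerOutput: number } {
  const base = estimateLatencyMs(inputTokens, outputTokens, p);
  return {
    fromFewerInput: base - estimateLatencyMs(inputTokens * factor, outputTokens, p),
    fromFewerOutput: base - estimateLatencyMs(inputTokens, outputTokens * factor, p),
  };
}

// Example with made-up numbers: a 2,000-token prompt and a 400-token reply.
const exampleParams: LatencyParams = { overheadMs: 200, msPerInputToken: 0.05, msPerOutputToken: 20 };
const halved = savingsMs(2000, 400, exampleParams, 0.5);
console.log(halved);
```

Under these assumed numbers, halving the output saves far more time than halving the prompt; the real ratio depends entirely on the measured parameters.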
Workflow
- Open the most relevant file under docs/ for the exact documented workflow and wording.
- Open schemas/ files, if any, for exact structured contracts.
- Open examples/ files for concrete requests, commands, snippets, and manifests.
- Do not add behavior or configuration that is not present in the attached source files.
Canonical source: https://developers.openai.com/api/docs/guides/latency-optimization.md
