Prompt Buddy

openai · OpenAI Platform Docs

Latency optimization

A guide outlining seven architectural and operational principles to reduce latency in LLM applications, including token management, request parallelization, and model selection strategies.

Derived skill

Files assembled from official documentation

Viewing SKILL.md

When To Use

Use when you need to reduce the response time of an LLM-powered application through architectural changes, prompt engineering, or request management.

Reference Files

| File | Contains | Use For |
| --- | --- | --- |
| SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
| docs/latency-optimization-workflow-guide.md | A guide outlining seven principles and techniques to reduce latency in LLM-related use cases, including token management and input optimization. | Questions about a guide outlining seven principles and techniques to reduce latency in LLM-related use cases, including token managem... |
| examples/latency-optimization-openai-latency-optimization-chat-prompt-rewriting.examplechat | A chat-based prompt example demonstrating how to re-write user queries with context to optimize latency. | Exact payloads, commands, or snippets shown in A chat-based prompt example demonstrating how to re-write user queries with context to optimize latency. |
| examples/latency-optimization-openai-latency-optimization-chat.examplechat | A chat-based example demonstrating a system prompt designed to classify user queries for real-time lookup requirements to optimize latency. | Exact payloads, commands, or snippets shown in A chat-based example demonstrating a system prompt designed to classify user queries for real-time lookup requirement... |
| examples/latency-optimization-openai-latency-optimization-chat-2.examplechat | A chat-based example demonstrating system instructions and structured JSON responses for latency-sensitive customer service interactions. | Exact payloads, commands, or snippets shown in A chat-based example demonstrating system instructions and structured JSON responses for latency-sensitive customer s... |
| examples/latency-optimization-openai-latency-optimization.jsx | A JSX code example demonstrating implementation strategies for reducing latency within the OpenAI platform. | Exact payloads, commands, or snippets shown in A JSX code example demonstrating implementation strategies for reducing latency within the OpenAI platform. |
| examples/latency-optimization-openai-latency-optimization-chat-3.examplechat | An example chat conversation demonstrating system prompt instructions for query contextualization and retrieval determination to optimize latency. | Exact payloads, commands, or snippets shown in An example chat conversation demonstrating system prompt instructions for query contextualization and retrieval deter... |
| examples/latency-optimization-openai-latency-optimization-jsx-metadata.jsx | A JSX code snippet demonstrating the use of metadata fields like message conversation continuation and user sentiment to optimize latency. | Exact payloads, commands, or snippets shown in A JSX code snippet demonstrating the use of metadata fields like message conversation continuation and user sentiment... |
| examples/latency-optimization-openai-latency-optimization-chat-4.examplechat | A chat-based example demonstrating system prompt instructions for structured JSON responses to optimize latency. | Exact payloads, commands, or snippets shown in A chat-based example demonstrating system prompt instructions for structured JSON responses to optimize latency. |
| examples/latency-optimization-openai-latency-optimization-chat-5.examplechat | A chat completion example demonstrating system instructions and reasoning fields for latency-optimized customer service responses. | Exact payloads, commands, or snippets shown in A chat completion example demonstrating system instructions and reasoning fields for latency-optimized customer servi... |
| examples/latency-optimization-openai-latency-optimization-jsx-metadata-2.jsx | A JSX code snippet demonstrating the inclusion of conversation metadata to optimize latency in API requests. | Exact payloads, commands, or snippets shown in A JSX code snippet demonstrating the inclusion of conversation metadata to optimize latency in API requests. |
| examples/latency-optimization-openai-latency-optimization-2.jsx | A JSX code example demonstrating how to structure metadata such as sentiment and query type to optimize response latency. | Exact payloads, commands, or snippets shown in A JSX code example demonstrating how to structure metadata such as sentiment and query type to optimize response late... |

What This Skill Covers

  • This guide covers the core set of principles you can apply to improve latency across a wide variety of LLM-related use cases. These techniques come from work...
  • Main sections: Seven principles, Process tokens faster, Generate fewer tokens, Use fewer input tokens, Make fewer requests.
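
The guide's description names request parallelization as one of its latency strategies. As a rough illustration only, the sketch below compares running three independent steps one after another with running them concurrently. It does not come from the attached source files: `fake_llm_call`, the step names, and the 0.05-second delays are hypothetical stand-ins for real network-bound requests, and no actual API is called.

```python
import asyncio
import time


async def fake_llm_call(name: str, delay: float) -> str:
    """Hypothetical stand-in for a network-bound LLM request (no real API call)."""
    await asyncio.sleep(delay)  # simulates waiting on the network
    return f"{name}: done"


async def run_sequential(tasks):
    """Await each request in turn; total time is roughly the sum of the delays."""
    results = []
    for name, delay in tasks:
        results.append(await fake_llm_call(name, delay))
    return results


async def run_parallel(tasks):
    """Start all requests together; total time is roughly the longest delay."""
    coros = (fake_llm_call(name, delay) for name, delay in tasks)
    return list(await asyncio.gather(*coros))  # gather preserves input order


def timed(coro_factory):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_factory())
    return result, time.perf_counter() - start


# Assumed, illustrative steps that do not depend on each other's output.
TASKS = [("classify", 0.05), ("retrieve", 0.05), ("summarize", 0.05)]

seq_results, seq_time = timed(lambda: run_sequential(TASKS))
par_results, par_time = timed(lambda: run_parallel(TASKS))

print(f"sequential: {seq_time:.3f}s, parallel: {par_time:.3f}s")
```

Both variants return the same results in the same order; only the wall-clock time differs. This only helps when the steps are genuinely independent. Steps that consume an earlier step's output still have to run in sequence.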

Workflow

  1. Open the most relevant file under docs/ for the exact documented workflow and wording.
  2. Open schemas/ files for exact structured contracts.
  3. Open examples/ files for concrete requests, commands, snippets, and manifests.
  4. Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://developers.openai.com/api/docs/guides/latency-optimization.md