Prompt Buddy logoPrompt Buddy

openai · OpenAI Platform Docs

Audio and speech

Explains the different audio modalities, speech tasks, and architectural patterns including request-based APIs versus realtime streaming sessions.

Import to Prompt Buddy

Derived skill

Files assembled from official documentation

Viewing SKILL.md

Audio and speech

Explains the different audio modalities, speech tasks, and architectural patterns including request-based APIs versus realtime streaming sessions.

When To Use

Use when deciding whether to implement request-based APIs for file processing or realtime sessions for low-latency conversational voice agents.

Reference Files

FileContainsUse For
SKILL.mdEntry point: scope, routing table, and workflow.Start here.
docs/audio-and-speech-workflow-guide.mdA guide explaining audio modalities, speech tasks, streaming, and API implementation paths for OpenAI audio models.Questions about a guide explaining audio modalities, speech tasks, streaming, and API implementation paths for OpenAI audio models.
examples/audio-and-speech-openai-realtime-agent-session.javascriptA JavaScript implementation demonstrating how to initialize a RealtimeAgent and connect a RealtimeSession using the OpenAI agents library.Exact payloads, commands, or snippets shown in A JavaScript implementation demonstrating how to initialize a RealtimeAgent and connect a RealtimeSession using the O...
examples/audio-and-speech-openai-audio-chat-completions-nodejs.javascriptA Node.js script demonstrating how to use the OpenAI API to generate audio responses using the chat completions endpoint with specific modalities and voice settings.Exact payloads, commands, or snippets shown in A Node.js script demonstrating how to use the OpenAI API to generate audio responses using the chat completions endpo...
examples/audio-and-speech-openai-audio-speech-gpt-audio.pythonA Python script demonstrating how to use the gpt-audio model to generate text and audio responses using the OpenAI client.Exact payloads, commands, or snippets shown in A Python script demonstrating how to use the gpt-audio model to generate text and audio responses using the OpenAI cl...
examples/audio-and-speech-openai-chat-completions-audio-modalities-curl.bashA curl command demonstrating how to request text and audio modalities using the gpt-audio model.Exact payloads, commands, or snippets shown in A curl command demonstrating how to request text and audio modalities using the gpt-audio model.
examples/audio-and-speech-openai-audio-transcription.javascriptA JavaScript code example demonstrating how to fetch an audio file and send it to the OpenAI API for transcription.Exact payloads, commands, or snippets shown in A JavaScript code example demonstrating how to fetch an audio file and send it to the OpenAI API for transcription.
examples/audio-and-speech-openai-audio-speech-python-base64-encoding.pythonA Python script demonstrating how to fetch an audio file and encode it into a base64 string for use with the OpenAI API.Exact payloads, commands, or snippets shown in A Python script demonstrating how to fetch an audio file and encode it into a base64 string for use with the OpenAI API.
examples/audio-and-speech-openai-audio-chat-completions-curl-request.bashA curl command demonstrating how to send a request to the chat completions endpoint with audio modalities enabled.Exact payloads, commands, or snippets shown in A curl command demonstrating how to send a request to the chat completions endpoint with audio modalities enabled.

What This Skill Covers

  • Audio models can understand spoken input, generate spoken output, or do both in the same interaction. This guide explains the vocabulary used across OpenAI's...
  • Main sections: Audio modalities, Common speech tasks, Streaming and latency, Request-based APIs and realtime sessions, Add audio to your existing application.

Workflow

  1. Open the most relevant file under docs/ for the exact documented workflow and wording.
  2. Open schemas/ files for exact structured contracts.
  3. Open examples/ files for concrete requests, commands, snippets, and manifests.
  4. Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://developers.openai.com/api/docs/guides/audio.md