openai · OpenAI Platform Docs

Audio and speech

Explains the different audio modalities, speech tasks, and architectural patterns including request-based APIs versus realtime streaming sessions.

Import to Prompt Buddy

Derived skill

Files assembled from official documentation

Viewing SKILL.md

Audio and speech

Explains the different audio modalities, speech tasks, and architectural patterns including request-based APIs versus realtime streaming sessions.

When To Use

Use when deciding whether to implement request-based APIs for file processing or realtime sessions for low-latency conversational voice agents.

Reference Files

File	Contains	Use For
`SKILL.md`	Entry point: scope, routing table, and workflow.	Start here.
`docs/audio-and-speech-workflow-guide.md`	A guide explaining audio modalities, speech tasks, streaming, and API implementation paths for OpenAI audio models.	Questions about a guide explaining audio modalities, speech tasks, streaming, and API implementation paths for OpenAI audio models.
`examples/audio-and-speech-openai-realtime-agent-session.javascript`	A JavaScript implementation demonstrating how to initialize a RealtimeAgent and connect a RealtimeSession using the OpenAI agents library.	Exact payloads, commands, or snippets shown in A JavaScript implementation demonstrating how to initialize a RealtimeAgent and connect a RealtimeSession using the O...
`examples/audio-and-speech-openai-audio-chat-completions-nodejs.javascript`	A Node.js script demonstrating how to use the OpenAI API to generate audio responses using the chat completions endpoint with specific modalities and voice settings.	Exact payloads, commands, or snippets shown in A Node.js script demonstrating how to use the OpenAI API to generate audio responses using the chat completions endpo...
`examples/audio-and-speech-openai-audio-speech-gpt-audio.python`	A Python script demonstrating how to use the gpt-audio model to generate text and audio responses using the OpenAI client.	Exact payloads, commands, or snippets shown in A Python script demonstrating how to use the gpt-audio model to generate text and audio responses using the OpenAI cl...
`examples/audio-and-speech-openai-chat-completions-audio-modalities-curl.bash`	A curl command demonstrating how to request text and audio modalities using the gpt-audio model.	Exact payloads, commands, or snippets shown in A curl command demonstrating how to request text and audio modalities using the gpt-audio model.
`examples/audio-and-speech-openai-audio-transcription.javascript`	A JavaScript code example demonstrating how to fetch an audio file and send it to the OpenAI API for transcription.	Exact payloads, commands, or snippets shown in A JavaScript code example demonstrating how to fetch an audio file and send it to the OpenAI API for transcription.
`examples/audio-and-speech-openai-audio-speech-python-base64-encoding.python`	A Python script demonstrating how to fetch an audio file and encode it into a base64 string for use with the OpenAI API.	Exact payloads, commands, or snippets shown in A Python script demonstrating how to fetch an audio file and encode it into a base64 string for use with the OpenAI API.
`examples/audio-and-speech-openai-audio-chat-completions-curl-request.bash`	A curl command demonstrating how to send a request to the chat completions endpoint with audio modalities enabled.	Exact payloads, commands, or snippets shown in A curl command demonstrating how to send a request to the chat completions endpoint with audio modalities enabled.

What This Skill Covers

Audio models can understand spoken input, generate spoken output, or do both in the same interaction. This guide explains the vocabulary used across OpenAI's...
Main sections: Audio modalities, Common speech tasks, Streaming and latency, Request-based APIs and realtime sessions, Add audio to your existing application.

Workflow

Open the most relevant file under docs/ for the exact documented workflow and wording.
Open schemas/ files for exact structured contracts.
Open examples/ files for concrete requests, commands, snippets, and manifests.
Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://developers.openai.com/api/docs/guides/audio.md

Skill metadata

Name: Audio and speech
Author: Bruno HANSS - Prompt Buddy
Generation mode: Ai Assisted Human Authored
Source count: 1

Provenance

Source program: OpenAI Platform Docs
Last generated: May 10, 2026
Last source sync: Unknown
Source pages: 1

Safety model

Canonical source pages are preserved separately. Derived files record source evidence and require zero AI-generated facts.

File tree

Source links

https://developers.openai.com/api/docs/guides/audio.md Back to skills