openai · OpenAI Platform Docs
Audio and speech
Explains the different audio modalities, speech tasks, and architectural patterns including request-based APIs versus realtime streaming sessions.
Derived skill
Files assembled from official documentation
Viewing SKILL.md
Audio and speech
Explains the different audio modalities, speech tasks, and architectural patterns including request-based APIs versus realtime streaming sessions.
When To Use
Use when deciding whether to implement request-based APIs for file processing or realtime sessions for low-latency conversational voice agents.
Reference Files
| File | Contains | Use For |
|---|---|---|
SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
docs/audio-and-speech-workflow-guide.md | A guide explaining audio modalities, speech tasks, streaming, and API implementation paths for OpenAI audio models. | Questions about a guide explaining audio modalities, speech tasks, streaming, and API implementation paths for OpenAI audio models. |
examples/audio-and-speech-openai-realtime-agent-session.javascript | A JavaScript implementation demonstrating how to initialize a RealtimeAgent and connect a RealtimeSession using the OpenAI agents library. | Exact payloads, commands, or snippets shown in A JavaScript implementation demonstrating how to initialize a RealtimeAgent and connect a RealtimeSession using the O... |
examples/audio-and-speech-openai-audio-chat-completions-nodejs.javascript | A Node.js script demonstrating how to use the OpenAI API to generate audio responses using the chat completions endpoint with specific modalities and voice settings. | Exact payloads, commands, or snippets shown in A Node.js script demonstrating how to use the OpenAI API to generate audio responses using the chat completions endpo... |
examples/audio-and-speech-openai-audio-speech-gpt-audio.python | A Python script demonstrating how to use the gpt-audio model to generate text and audio responses using the OpenAI client. | Exact payloads, commands, or snippets shown in A Python script demonstrating how to use the gpt-audio model to generate text and audio responses using the OpenAI cl... |
examples/audio-and-speech-openai-chat-completions-audio-modalities-curl.bash | A curl command demonstrating how to request text and audio modalities using the gpt-audio model. | Exact payloads, commands, or snippets shown in A curl command demonstrating how to request text and audio modalities using the gpt-audio model. |
examples/audio-and-speech-openai-audio-transcription.javascript | A JavaScript code example demonstrating how to fetch an audio file and send it to the OpenAI API for transcription. | Exact payloads, commands, or snippets shown in A JavaScript code example demonstrating how to fetch an audio file and send it to the OpenAI API for transcription. |
examples/audio-and-speech-openai-audio-speech-python-base64-encoding.python | A Python script demonstrating how to fetch an audio file and encode it into a base64 string for use with the OpenAI API. | Exact payloads, commands, or snippets shown in A Python script demonstrating how to fetch an audio file and encode it into a base64 string for use with the OpenAI API. |
examples/audio-and-speech-openai-audio-chat-completions-curl-request.bash | A curl command demonstrating how to send a request to the chat completions endpoint with audio modalities enabled. | Exact payloads, commands, or snippets shown in A curl command demonstrating how to send a request to the chat completions endpoint with audio modalities enabled. |
What This Skill Covers
- Audio models can understand spoken input, generate spoken output, or do both in the same interaction. This guide explains the vocabulary used across OpenAI's...
- Main sections:
Audio modalities,Common speech tasks,Streaming and latency,Request-based APIs and realtime sessions,Add audio to your existing application.
Workflow
- Open the most relevant file under
docs/for the exact documented workflow and wording. - Open
schemas/files for exact structured contracts. - Open
examples/files for concrete requests, commands, snippets, and manifests. - Do not add behavior or configuration that is not present in the attached source files.
Canonical source: https://developers.openai.com/api/docs/guides/audio.md
