openai · OpenAI Platform Docs
Audio and speech | OpenAI API
Explains how to implement audio-related capabilities including speech-to-text transcription, text-to-speech synthesis, and real-time audio interactions using OpenAI models.
Derived skill
Files assembled from official documentation
Viewing SKILL.md
Audio and speech | OpenAI API
Explains how to implement audio-related capabilities including speech-to-text transcription, text-to-speech synthesis, and real-time audio interactions using OpenAI models.
When To Use
Use when implementing features such as automated transcription, voice synthesis, or real-time conversational audio interfaces.
Reference Files
| File | Contains | Use For |
|---|---|---|
SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
docs/audio-and-speech-openai-api-workflow-guide.md | A guide explaining audio modalities, speech tasks, streaming, and the differences between request-based APIs and realtime sessions for OpenAI audio models. | Questions about a guide explaining audio modalities, speech tasks, streaming, and the differences between request-based APIs and real... |
examples/audio-and-speech-openai-api-openai-realtime-agent-session-nodejs.text | A JavaScript code example demonstrating how to initialize a RealtimeAgent and connect to a RealtimeSession using the OpenAI API. | Exact payloads, commands, or snippets shown in A JavaScript code example demonstrating how to initialize a RealtimeAgent and connect to a RealtimeSession using the... |
examples/audio-and-speech-openai-api-openai-api-chat-completions-audio-modalities.text | A Node.js code example demonstrating how to generate an audio response using the chat completions endpoint with text and audio modalities. | Exact payloads, commands, or snippets shown in A Node.js code example demonstrating how to generate an audio response using the chat completions endpoint with text... |
examples/audio-and-speech-openai-api-openai-api-gpt-audio-multimodal-python.text | A Python code example demonstrating how to use the gpt-audio model with text and audio modalities to generate a response. | Exact payloads, commands, or snippets shown in A Python code example demonstrating how to use the gpt-audio model with text and audio modalities to generate a respo... |
examples/audio-and-speech-openai-api-openai-api-chat-completions-audio-modalities-2.text | A curl command demonstrating how to request text and audio modalities using the gpt-audio model via the OpenAI Chat Completions API. | Exact payloads, commands, or snippets shown in A curl command demonstrating how to request text and audio modalities using the gpt-audio model via the OpenAI Chat C... |
examples/audio-and-speech-openai-api-openai-api-audio-transcription-nodejs.text | A Node.js code example demonstrating how to fetch an audio file and prepare it for transcription using the OpenAI API. | Exact payloads, commands, or snippets shown in A Node.js code example demonstrating how to fetch an audio file and prepare it for transcription using the OpenAI API. |
examples/audio-and-speech-openai-api-openai-api-audio-transcription-python.text | A Python script demonstrating how to fetch an audio file and prepare it for transcription using the OpenAI client. | Exact payloads, commands, or snippets shown in A Python script demonstrating how to fetch an audio file and prepare it for transcription using the OpenAI client. |
examples/audio-and-speech-openai-api-openai-api-chat-completions-audio-modalities-3.text | A curl command demonstrating how to request text and audio modalities using the gpt-audio model via the OpenAI Chat Completions API. | Exact payloads, commands, or snippets shown in A curl command demonstrating how to request text and audio modalities using the gpt-audio model via the OpenAI Chat C... |
What This Skill Covers
- Audio models can understand spoken input, generate spoken output, or do both in the same interaction. This guide explains the vocabulary used across OpenAI’s...
- Main sections:
Audio modalities,Common speech tasks,Streaming and latency,Request-based APIs and realtime sessions,Add audio to your existing application.
Workflow
- Open the most relevant file under
docs/for the exact documented workflow and wording. - Open
schemas/files for exact structured contracts. - Open
examples/files for concrete requests, commands, snippets, and manifests. - Do not add behavior or configuration that is not present in the attached source files.
Canonical source: https://developers.openai.com/api/docs/guides/audio
