openai · OpenAI Platform Docs

Audio and speech | OpenAI API

Explains how to implement audio-related capabilities including speech-to-text transcription, text-to-speech synthesis, and real-time audio interactions using OpenAI models.

Import to Prompt Buddy

Derived skill

Files assembled from official documentation

Viewing SKILL.md

Audio and speech | OpenAI API

Explains how to implement audio-related capabilities including speech-to-text transcription, text-to-speech synthesis, and real-time audio interactions using OpenAI models.

When To Use

Use when implementing features such as automated transcription, voice synthesis, or real-time conversational audio interfaces.

Reference Files

File	Contains	Use For
`SKILL.md`	Entry point: scope, routing table, and workflow.	Start here.
`docs/audio-and-speech-openai-api-workflow-guide.md`	A guide explaining audio modalities, speech tasks, streaming, and the differences between request-based APIs and realtime sessions for OpenAI audio models.	Questions about a guide explaining audio modalities, speech tasks, streaming, and the differences between request-based APIs and real...
`examples/audio-and-speech-openai-api-openai-realtime-agent-session-nodejs.text`	A JavaScript code example demonstrating how to initialize a RealtimeAgent and connect to a RealtimeSession using the OpenAI API.	Exact payloads, commands, or snippets shown in A JavaScript code example demonstrating how to initialize a RealtimeAgent and connect to a RealtimeSession using the...
`examples/audio-and-speech-openai-api-openai-api-chat-completions-audio-modalities.text`	A Node.js code example demonstrating how to generate an audio response using the chat completions endpoint with text and audio modalities.	Exact payloads, commands, or snippets shown in A Node.js code example demonstrating how to generate an audio response using the chat completions endpoint with text...
`examples/audio-and-speech-openai-api-openai-api-gpt-audio-multimodal-python.text`	A Python code example demonstrating how to use the gpt-audio model with text and audio modalities to generate a response.	Exact payloads, commands, or snippets shown in A Python code example demonstrating how to use the gpt-audio model with text and audio modalities to generate a respo...
`examples/audio-and-speech-openai-api-openai-api-chat-completions-audio-modalities-2.text`	A curl command demonstrating how to request text and audio modalities using the gpt-audio model via the OpenAI Chat Completions API.	Exact payloads, commands, or snippets shown in A curl command demonstrating how to request text and audio modalities using the gpt-audio model via the OpenAI Chat C...
`examples/audio-and-speech-openai-api-openai-api-audio-transcription-nodejs.text`	A Node.js code example demonstrating how to fetch an audio file and prepare it for transcription using the OpenAI API.	Exact payloads, commands, or snippets shown in A Node.js code example demonstrating how to fetch an audio file and prepare it for transcription using the OpenAI API.
`examples/audio-and-speech-openai-api-openai-api-audio-transcription-python.text`	A Python script demonstrating how to fetch an audio file and prepare it for transcription using the OpenAI client.	Exact payloads, commands, or snippets shown in A Python script demonstrating how to fetch an audio file and prepare it for transcription using the OpenAI client.
`examples/audio-and-speech-openai-api-openai-api-chat-completions-audio-modalities-3.text`	A curl command demonstrating how to request text and audio modalities using the gpt-audio model via the OpenAI Chat Completions API.	Exact payloads, commands, or snippets shown in A curl command demonstrating how to request text and audio modalities using the gpt-audio model via the OpenAI Chat C...

What This Skill Covers

Audio models can understand spoken input, generate spoken output, or do both in the same interaction. This guide explains the vocabulary used across OpenAI’s...
Main sections: Audio modalities, Common speech tasks, Streaming and latency, Request-based APIs and realtime sessions, Add audio to your existing application.

Workflow

Open the most relevant file under docs/ for the exact documented workflow and wording.
Open schemas/ files for exact structured contracts.
Open examples/ files for concrete requests, commands, snippets, and manifests.
Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://developers.openai.com/api/docs/guides/audio

Skill metadata

Name: Audio and speech | OpenAI API
Author: Bruno HANSS - Prompt Buddy
Generation mode: Ai Assisted Human Authored
Source count: 1

Provenance

Source program: OpenAI Platform Docs
Last generated: May 11, 2026
Last source sync: Unknown
Source pages: 1

Safety model

Canonical source pages are preserved separately. Derived files record source evidence and require zero AI-generated facts.

File tree

Source links

https://developers.openai.com/api/docs/guides/audio Back to skills