openai · OpenAI Platform Docs

Direct preference optimization

Teaches how to implement Direct Preference Optimization (DPO) fine-tuning by formatting datasets with preferred and non-preferred response pairs, configuring the DPO method via the API, and combining it with supervise...

Import to Prompt Buddy

Derived skill

Files assembled from official documentation

Viewing SKILL.md

Direct preference optimization

When To Use

Use when you need to align a model to subjective human preferences, such as specific tones, styles, or summarization qualities, using prompt-response pairs.

Reference Files

File	Contains	Use For
`SKILL.md`	Entry point: scope, routing table, and workflow.	Start here.
`docs/direct-preference-optimization-workflow-guide.md`	A guide explaining how to perform Direct Preference Optimization (DPO) fine-tuning using prompt-response pairs and safety checks.	Questions about a guide explaining how to perform Direct Preference Optimization (DPO) fine-tuning using prompt-response pairs and sa...
`examples/direct-preference-optimization-openai-direct-preference-optimization-dat.json`	A JSON object demonstrating the required message and preferred output structure for direct preference optimization training data.	Exact payloads, commands, or snippets shown in A JSON object demonstrating the required message and preferred output structure for direct preference optimization tr...
`examples/direct-preference-optimization-openai-dpo-fine-tuning-job-creation.javascript`	A JavaScript code example demonstrating how to create a fine-tuning job using the Direct Preference Optimization (DPO) method via the OpenAI SDK.	Exact payloads, commands, or snippets shown in A JavaScript code example demonstrating how to create a fine-tuning job using the Direct Preference Optimization (DPO...
`examples/direct-preference-optimization-openai-direct-preference-optimization-pyt.python`	A Python script demonstrating how to create a fine-tuning job using the Direct Preference Optimization (DPO) method via the OpenAI client.	Exact payloads, commands, or snippets shown in A Python script demonstrating how to create a fine-tuning job using the Direct Preference Optimization (DPO) method v...

What This Skill Covers

Direct Preference Optimization (DPO) fine-tuning allows you to fine-tune models based on prompts and pairs of responses. This approach enables the model to l...
Main sections: Data format, Create a DPO fine-tune job, Use SFT and DPO together, Safety checks, Next steps.

Workflow

Open the most relevant file under docs/ for the exact documented workflow and wording.
Open schemas/ files for exact structured contracts.
Open examples/ files for concrete requests, commands, snippets, and manifests.
Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://developers.openai.com/api/docs/guides/direct-preference-optimization.md

Skill metadata

Name: Direct preference optimization
Author: Bruno HANSS - Prompt Buddy
Generation mode: Ai Assisted Human Authored
Source count: 1

Provenance

Source program: OpenAI Platform Docs
Last generated: May 10, 2026
Last source sync: Unknown
Source pages: 1

Safety model

Canonical source pages are preserved separately. Derived files record source evidence and require zero AI-generated facts.

File tree

Source links

https://developers.openai.com/api/docs/guides/direct-preference-optimization.md Back to skills