openai · OpenAI Platform Docs
Direct preference optimization | OpenAI API
Explains the implementation and application of Direct Preference Optimization (DPO) to fine-tune models based on human preference data.
Derived skill
Files assembled from official documentation
Viewing SKILL.md
Direct preference optimization | OpenAI API
Explains the implementation and application of Direct Preference Optimization (DPO) to fine-tune models based on human preference data.
When To Use
Use when you need to align a model's outputs with specific human preferences using a preference dataset instead of traditional RLHF methods.
Reference Files
| File | Contains | Use For |
|---|---|---|
SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
docs/direct-preference-optimization-openai-api-workflow-guide.md | A guide explaining how to use Direct Preference Optimization (DPO) to fine-tune models using prompt-response pairs. | Questions about a guide explaining how to use Direct Preference Optimization (DPO) to fine-tune models using prompt-response pairs. |
examples/direct-preference-optimization-openai-api-request-payload.text | A text representation of a JSON request payload for the Direct Preference Optimization API including user messages and preferred assistant outputs. | Exact payloads, commands, or snippets shown in A text representation of a JSON request payload for the Direct Preference Optimization API including user messages an... |
examples/direct-preference-optimization-openai-api-openai-api-dpo-fine-tuning-job.text | A JavaScript code snippet demonstrating how to create a fine-tuning job using the Direct Preference Optimization method via the OpenAI API. | Exact payloads, commands, or snippets shown in A JavaScript code snippet demonstrating how to create a fine-tuning job using the Direct Preference Optimization meth... |
examples/direct-preference-optimization-openai-api-openai-api-dpo-finetuning-job-.text | A Python code example demonstrating how to create a fine-tuning job using the Direct Preference Optimization (DPO) method via the OpenAI client. | Exact payloads, commands, or snippets shown in A Python code example demonstrating how to create a fine-tuning job using the Direct Preference Optimization (DPO) me... |
What This Skill Covers
- Direct Preference Optimization (DPO) fine-tuning allows you to fine-tune models based on prompts and pairs of responses. This approach enables the model to l...
- Main sections:
Data format,Create a DPO fine-tune job,Use SFT and DPO together,Safety checks,Next steps.
Workflow
- Open the most relevant file under
docs/for the exact documented workflow and wording. - Open
schemas/files for exact structured contracts. - Open
examples/files for concrete requests, commands, snippets, and manifests. - Do not add behavior or configuration that is not present in the attached source files.
Canonical source: https://developers.openai.com/api/docs/guides/direct-preference-optimization
