Prompt Buddy logoPrompt Buddy

openai · OpenAI Platform Docs

Direct preference optimization | OpenAI API

Explains the implementation and application of Direct Preference Optimization (DPO) to fine-tune models based on human preference data.

Import to Prompt Buddy

Derived skill

Files assembled from official documentation

Viewing SKILL.md

Direct preference optimization | OpenAI API

Explains the implementation and application of Direct Preference Optimization (DPO) to fine-tune models based on human preference data.

When To Use

Use when you need to align a model's outputs with specific human preferences using a preference dataset instead of traditional RLHF methods.

Reference Files

FileContainsUse For
SKILL.mdEntry point: scope, routing table, and workflow.Start here.
docs/direct-preference-optimization-openai-api-workflow-guide.mdA guide explaining how to use Direct Preference Optimization (DPO) to fine-tune models using prompt-response pairs.Questions about a guide explaining how to use Direct Preference Optimization (DPO) to fine-tune models using prompt-response pairs.
examples/direct-preference-optimization-openai-api-request-payload.textA text representation of a JSON request payload for the Direct Preference Optimization API including user messages and preferred assistant outputs.Exact payloads, commands, or snippets shown in A text representation of a JSON request payload for the Direct Preference Optimization API including user messages an...
examples/direct-preference-optimization-openai-api-openai-api-dpo-fine-tuning-job.textA JavaScript code snippet demonstrating how to create a fine-tuning job using the Direct Preference Optimization method via the OpenAI API.Exact payloads, commands, or snippets shown in A JavaScript code snippet demonstrating how to create a fine-tuning job using the Direct Preference Optimization meth...
examples/direct-preference-optimization-openai-api-openai-api-dpo-finetuning-job-.textA Python code example demonstrating how to create a fine-tuning job using the Direct Preference Optimization (DPO) method via the OpenAI client.Exact payloads, commands, or snippets shown in A Python code example demonstrating how to create a fine-tuning job using the Direct Preference Optimization (DPO) me...

What This Skill Covers

  • Direct Preference Optimization (DPO) fine-tuning allows you to fine-tune models based on prompts and pairs of responses. This approach enables the model to l...
  • Main sections: Data format, Create a DPO fine-tune job, Use SFT and DPO together, Safety checks, Next steps.

Workflow

  1. Open the most relevant file under docs/ for the exact documented workflow and wording.
  2. Open schemas/ files for exact structured contracts.
  3. Open examples/ files for concrete requests, commands, snippets, and manifests.
  4. Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://developers.openai.com/api/docs/guides/direct-preference-optimization