openai · OpenAI Platform Docs
Reinforcement fine-tuning
A guide on implementing reinforcement fine-tuning (RFT) for reasoning models by defining a programmable grader, preparing datasets, and managing the training lifecycle.
Derived skill
Files assembled from official documentation
When To Use
Use when you need to adapt a reasoning model to complex, domain-specific tasks using a programmable reward signal instead of fixed correct answers.
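As a rough illustration of what a programmable reward signal can mean here, the sketch below scores an output with partial credit rather than a single exact-match check. It runs locally and is not the grader format the platform expects. The field names `compliant` and `explanation`, both function names, and the 0.7/0.3 weights are assumptions made for illustration only.

```python
def token_overlap(candidate: str, reference: str) -> float:
    """Fraction of reference tokens that also appear in the candidate."""
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 0.0
    cand_tokens = set(candidate.lower().split())
    return len(ref_tokens & cand_tokens) / len(ref_tokens)


def grade(model_output: dict, reference: dict) -> float:
    """Hypothetical reward in [0.0, 1.0]: label match plus partial credit."""
    label_matches = model_output.get("compliant") == reference.get("compliant")
    label_score = 1.0 if label_matches else 0.0
    explanation_score = token_overlap(
        model_output.get("explanation", ""),
        reference.get("explanation", ""),
    )
    # Illustrative weights (an assumption, not documented values).
    return 0.7 * label_score + 0.3 * explanation_score
```

A graded reward like this lets a partially correct output score higher than a wrong one, which is the practical difference from comparing against one fixed answer.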
Reference Files
| File | Contains | Use For |
|---|---|---|
| SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
| docs/reinforcement-fine-tuning-workflow-guide.md | A guide explaining how to adapt OpenAI reasoning models using custom feedback signals and grader definitions. | Questions about adapting OpenAI reasoning models with custom feedback signals and grader definitions. |
| examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning.text | A text-based example demonstrating the reinforcement fine-tuning process for OpenAI models. | Exact payloads, commands, or snippets shown in this example. |
| examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-dataset.json | A JSON-formatted dataset example containing compliant responses and explanations for reinforcement fine-tuning. | Exact payloads, commands, or snippets shown in this example. |
| examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-messages-form.json | A JSON object demonstrating the message structure and field requirements for reinforcement fine-tuning datasets. | Exact payloads, commands, or snippets shown in this example. |
| examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-dataset-forma.text | A text file containing a JSONL-formatted dataset of user messages, compliance labels, and explanations for reinforcement fine-tuning. | Exact payloads, commands, or snippets shown in this example. |
| examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-dataset-forma-2.text | A text file containing a JSONL-formatted dataset for reinforcement fine-tuning featuring user messages, compliance status, and explanations. | Exact payloads, commands, or snippets shown in this example. |
| examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-jsonschema.json | A JSON schema definition for a security assistant used in reinforcement fine-tuning examples. | Exact payloads, commands, or snippets shown in this example. |
| examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-python-pydant.python | A Python script demonstrating how to use `to_strict_json_schema` to convert a Pydantic model into a JSON schema for reinforcement fine-tuning. | Exact payloads, commands, or snippets shown in this example. |
| examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-job-creation-.bash | A curl command demonstrating how to create a reinforcement fine-tuning job using the OpenAI API. | Exact payloads, commands, or snippets shown in this example. |
| examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-job-event-log.json | A JSON object representing a real-time event log from a reinforcement fine-tuning job, including training metrics and reward scores. | Exact payloads, commands, or snippets shown in this example. |
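The dataset examples listed in this table are described as JSONL rows holding user messages, a compliance label, and an explanation. A minimal pre-upload check might look like the following sketch. The key names `messages`, `compliant`, and `explanation`, and the `role`/`content` message shape, are assumptions inferred from those descriptions; confirm them against the example files before relying on this.

```python
import json

# Assumed per-row keys; verify against the dataset example files.
REQUIRED_KEYS = {"messages", "compliant", "explanation"}


def validate_jsonl(text: str) -> list[str]:
    """Return human-readable problems found in a JSONL dataset string."""
    errors = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(row, dict):
            errors.append(f"line {lineno}: row is not a JSON object")
            continue
        missing = REQUIRED_KEYS - row.keys()
        if missing:
            errors.append(f"line {lineno}: missing keys {sorted(missing)}")
            continue
        messages = row["messages"]
        if not isinstance(messages, list) or not messages:
            errors.append(f"line {lineno}: 'messages' must be a non-empty list")
        elif any(
            not isinstance(m, dict) or "role" not in m or "content" not in m
            for m in messages
        ):
            errors.append(f"line {lineno}: each message needs 'role' and 'content'")
    return errors
```

Running a check like this before uploading surfaces malformed rows early, with line numbers, instead of at job-creation time.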
What This Skill Covers
- Reinforcement fine-tuning (RFT) adapts an OpenAI reasoning model with a feedback signal you define. Like supervised fine-tuning, it tailors the model to your...
- Main sections: Example: LLM-powered security review; Define a grader; Prepare your dataset; Upload your files; Create a fine-tune job.
Workflow
- Open the most relevant file under `docs/` for the exact documented workflow and wording.
- Open `schemas/` files for exact structured contracts.
- Open `examples/` files for concrete requests, commands, snippets, and manifests.
- Do not add behavior or configuration that is not present in the attached source files.
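The routing in this workflow can be sketched as a small lookup that picks files by top-level directory. The `ROUTES` keys and the `files_for` helper are hypothetical names introduced for illustration; only the three directory names come from the workflow itself.

```python
from pathlib import PurePosixPath

# Hypothetical labels for the kind of question, mapped to the directories
# named in the workflow.
ROUTES = {
    "workflow": "docs",
    "contract": "schemas",
    "example": "examples",
}


def files_for(need: str, paths: list[str]) -> list[str]:
    """Return the paths whose top-level directory matches the given need."""
    directory = ROUTES[need]  # raises KeyError for an unknown need
    return [p for p in paths if PurePosixPath(p).parts[:1] == (directory,)]
```

For example, `files_for("workflow", paths)` keeps only the entries under `docs/`, so a caller reads the guide before reaching for example payloads.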
Canonical source: https://developers.openai.com/api/docs/guides/reinforcement-fine-tuning.md
