Prompt Buddy logoPrompt Buddy

openai · OpenAI Platform Docs

Reinforcement fine-tuning

A guide on implementing reinforcement fine-tuning (RFT) for reasoning models by defining a programmable grader, preparing datasets, and managing the training lifecycle.

Import to Prompt Buddy

Derived skill

Files assembled from official documentation

Viewing SKILL.md

Reinforcement fine-tuning

A guide on implementing reinforcement fine-tuning (RFT) for reasoning models by defining a programmable grader, preparing datasets, and managing the training lifecycle.

When To Use

Use when you need to adapt a reasoning model to complex, domain-specific tasks using a programmable reward signal instead of fixed correct answers.

Reference Files

FileContainsUse For
SKILL.mdEntry point: scope, routing table, and workflow.Start here.
docs/reinforcement-fine-tuning-workflow-guide.mdA guide explaining how to adapt OpenAI reasoning models using custom feedback signals and grader definitions.Questions about a guide explaining how to adapt OpenAI reasoning models using custom feedback signals and grader definitions.
examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning.textA text-based example demonstrating the reinforcement fine-tuning process for OpenAI models.Exact payloads, commands, or snippets shown in A text-based example demonstrating the reinforcement fine-tuning process for OpenAI models.
examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-dataset.jsonA JSON formatted dataset example containing compliant responses and explanations for reinforcement fine-tuning.Exact payloads, commands, or snippets shown in A JSON formatted dataset example containing compliant responses and explanations for reinforcement fine-tuning.
examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-messages-form.jsonA JSON object demonstrating the message structure and field requirements for reinforcement fine-tuning datasets.Exact payloads, commands, or snippets shown in A JSON object demonstrating the message structure and field requirements for reinforcement fine-tuning datasets.
examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-dataset-forma.textA text file containing a JSONL-formatted dataset of user messages, compliance labels, and explanations for reinforcement fine-tuning.Exact payloads, commands, or snippets shown in A text file containing a JSONL-formatted dataset of user messages, compliance labels, and explanations for reinforcem...
examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-dataset-forma-2.textA text file containing a JSONL-formatted dataset for reinforcement fine-tuning featuring user messages, compliance status, and explanations.Exact payloads, commands, or snippets shown in A text file containing a JSONL-formatted dataset for reinforcement fine-tuning featuring user messages, compliance st...
examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-jsonschema.jsonA JSON schema definition for a security assistant used in reinforcement fine-tuning examples.Exact payloads, commands, or snippets shown in A JSON schema definition for a security assistant used in reinforcement fine-tuning examples.
examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-python-pydant.pythonA Python script demonstrating how to use to_strict_json_schema to convert a Pydantic model into a JSON schema for reinforcement fine-tuning.Exact payloads, commands, or snippets shown in A Python script demonstrating how to use tostrictjsonschema to convert a Pydantic model into a JSON schema for reinfo...
examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-job-creation-.bashA curl command demonstrating how to create a reinforcement fine-tuning job using the OpenAI API.Exact payloads, commands, or snippets shown in A curl command demonstrating how to create a reinforcement fine-tuning job using the OpenAI API.
examples/reinforcement-fine-tuning-openai-reinforcement-fine-tuning-job-event-log.jsonA JSON object representing a real-time event log from a reinforcement fine-tuning job, including training metrics and reward scores.Exact payloads, commands, or snippets shown in A JSON object representing a real-time event log from a reinforcement fine-tuning job, including training metrics and...

What This Skill Covers

  • Reinforcement fine-tuning (RFT) adapts an OpenAI reasoning model with a feedback signal you define. Like supervised fine-tuning, it tailors the model to your...
  • Main sections: Example: LLM-powered security review, Define a grader, Prepare your dataset, Upload your files, Create a fine-tune job.

Workflow

  1. Open the most relevant file under docs/ for the exact documented workflow and wording.
  2. Open schemas/ files for exact structured contracts.
  3. Open examples/ files for concrete requests, commands, snippets, and manifests.
  4. Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://developers.openai.com/api/docs/guides/reinforcement-fine-tuning.md