openai · OpenAI Platform Docs
Reinforcement fine-tuning | OpenAI API
Explains the process and methodology for performing reinforcement fine-tuning to optimize model performance based on specific feedback or reward signals.
Derived skill
Files assembled from official documentation
Viewing SKILL.md
Reinforcement fine-tuning | OpenAI API
Explains the process and methodology for performing reinforcement fine-tuning to optimize model performance based on specific feedback or reward signals.
When To Use
Use when you need to implement a reinforcement learning workflow to fine-tune a model using preference data or reward models.
Reference Files
| File | Contains | Use For |
|---|---|---|
SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
docs/reinforcement-fine-tuning-openai-api-workflow-guide.md | A guide explaining how to adapt OpenAI reasoning models using reinforcement fine-tuning with custom feedback signals and grading criteria. | Questions about a guide explaining how to adapt OpenAI reasoning models using reinforcement fine-tuning with custom feedback signals... |
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-gu.text | A text-based guide explaining the concepts and implementation of reinforcement fine-tuning using the OpenAI API. | Exact payloads, commands, or snippets shown in A text-based guide explaining the concepts and implementation of reinforcement fine-tuning using the OpenAI API. |
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-tr.text | A text file demonstrating the structured training data format for reinforcement fine-tuning including compliant and explanation fields. | Exact payloads, commands, or snippets shown in A text file demonstrating the structured training data format for reinforcement fine-tuning including compliant and e... |
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-gr.text | A text configuration defining a multi-type grader with a gpt-4o score model for reinforcement fine-tuning evaluation. | Exact payloads, commands, or snippets shown in A text configuration defining a multi-type grader with a gpt-4o score model for reinforcement fine-tuning evaluation. |
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-ev.text | A text-based list of evaluation criteria used to assess the accuracy of model-generated answers during reinforcement fine-tuning. | Exact payloads, commands, or snippets shown in A text-based list of evaluation criteria used to assess the accuracy of model-generated answers during reinforcement... |
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-me.text | A text representation of the messages format used for reinforcement fine-tuning training data. | Exact payloads, commands, or snippets shown in A text representation of the messages format used for reinforcement fine-tuning training data. |
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-js.text | A JSONL formatted dataset containing user messages, compliance labels, and explanations for reinforcement fine-tuning. | Exact payloads, commands, or snippets shown in A JSONL formatted dataset containing user messages, compliance labels, and explanations for reinforcement fine-tuning. |
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-da.text | A text file containing JSONL formatted training data examples for reinforcement fine-tuning, including user messages, compliance status, and explanations. | Exact payloads, commands, or snippets shown in A text file containing JSONL formatted training data examples for reinforcement fine-tuning, including user messages,... |
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-js-2.text | A JSON schema definition for a security assistant used in reinforcement fine-tuning examples. | Exact payloads, commands, or snippets shown in A JSON schema definition for a security assistant used in reinforcement fine-tuning examples. |
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-py.text | A Python code snippet demonstrating how to use tostrictjsonschema from openai.lib.pydantic to generate a compatible JSON schema for reinforcement fine-tuning. | Exact payloads, commands, or snippets shown in A Python code snippet demonstrating how to use tostrictjsonschema from openai.lib.pydantic to generate a compatible J... |
examples/reinforcement-fine-tuning-openai-api-openai-api-reinforcement-finetuning.text | A curl command demonstrating how to create a reinforcement fine-tuning job via the OpenAI API. | Exact payloads, commands, or snippets shown in A curl command demonstrating how to create a reinforcement fine-tuning job via the OpenAI API. |
examples/reinforcement-fine-tuning-openai-api-openai-api-reinforcement-fine-tunin.text | A curl command demonstrating how to send a request to the OpenAI API using a reinforcement fine-tuned model. | Exact payloads, commands, or snippets shown in A curl command demonstrating how to send a request to the OpenAI API using a reinforcement fine-tuned model. |
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-jo.text | A text representation of an OpenAI reinforcement fine-tuning job checkpoint object containing metrics and model checkpoint details. | Exact payloads, commands, or snippets shown in A text representation of an OpenAI reinforcement fine-tuning job checkpoint object containing metrics and model check... |
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-jo-2.text | A text log representing a JSON object of a reinforcement fine-tuning job event from the OpenAI API. | Exact payloads, commands, or snippets shown in A text log representing a JSON object of a reinforcement fine-tuning job event from the OpenAI API. |
What This Skill Covers
- Reinforcement fine-tuning (RFT) adapts an OpenAI reasoning model with a feedback signal you define. Like supervised fine-tuning, it tailors the model to your...
- Main sections:
Example: LLM-powered security review,Define a grader,Grading Criteria:,Copernicus Product Security Policy,Introduction.
Workflow
- Open the most relevant file under
docs/for the exact documented workflow and wording. - Open
schemas/files for exact structured contracts. - Open
examples/files for concrete requests, commands, snippets, and manifests. - Do not add behavior or configuration that is not present in the attached source files.
Canonical source: https://developers.openai.com/api/docs/guides/reinforcement-fine-tuning
