Prompt Buddy logoPrompt Buddy

openai · OpenAI Platform Docs

Reinforcement fine-tuning | OpenAI API

Explains the process and methodology for performing reinforcement fine-tuning to optimize model performance based on specific feedback or reward signals.

Import to Prompt Buddy

Derived skill

Files assembled from official documentation

Viewing SKILL.md

Reinforcement fine-tuning | OpenAI API

Explains the process and methodology for performing reinforcement fine-tuning to optimize model performance based on specific feedback or reward signals.

When To Use

Use when you need to implement a reinforcement learning workflow to fine-tune a model using preference data or reward models.

Reference Files

FileContainsUse For
SKILL.mdEntry point: scope, routing table, and workflow.Start here.
docs/reinforcement-fine-tuning-openai-api-workflow-guide.mdA guide explaining how to adapt OpenAI reasoning models using reinforcement fine-tuning with custom feedback signals and grading criteria.Questions about a guide explaining how to adapt OpenAI reasoning models using reinforcement fine-tuning with custom feedback signals...
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-gu.textA text-based guide explaining the concepts and implementation of reinforcement fine-tuning using the OpenAI API.Exact payloads, commands, or snippets shown in A text-based guide explaining the concepts and implementation of reinforcement fine-tuning using the OpenAI API.
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-tr.textA text file demonstrating the structured training data format for reinforcement fine-tuning including compliant and explanation fields.Exact payloads, commands, or snippets shown in A text file demonstrating the structured training data format for reinforcement fine-tuning including compliant and e...
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-gr.textA text configuration defining a multi-type grader with a gpt-4o score model for reinforcement fine-tuning evaluation.Exact payloads, commands, or snippets shown in A text configuration defining a multi-type grader with a gpt-4o score model for reinforcement fine-tuning evaluation.
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-ev.textA text-based list of evaluation criteria used to assess the accuracy of model-generated answers during reinforcement fine-tuning.Exact payloads, commands, or snippets shown in A text-based list of evaluation criteria used to assess the accuracy of model-generated answers during reinforcement...
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-me.textA text representation of the messages format used for reinforcement fine-tuning training data.Exact payloads, commands, or snippets shown in A text representation of the messages format used for reinforcement fine-tuning training data.
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-js.textA JSONL formatted dataset containing user messages, compliance labels, and explanations for reinforcement fine-tuning.Exact payloads, commands, or snippets shown in A JSONL formatted dataset containing user messages, compliance labels, and explanations for reinforcement fine-tuning.
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-da.textA text file containing JSONL formatted training data examples for reinforcement fine-tuning, including user messages, compliance status, and explanations.Exact payloads, commands, or snippets shown in A text file containing JSONL formatted training data examples for reinforcement fine-tuning, including user messages,...
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-js-2.textA JSON schema definition for a security assistant used in reinforcement fine-tuning examples.Exact payloads, commands, or snippets shown in A JSON schema definition for a security assistant used in reinforcement fine-tuning examples.
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-py.textA Python code snippet demonstrating how to use tostrictjsonschema from openai.lib.pydantic to generate a compatible JSON schema for reinforcement fine-tuning.Exact payloads, commands, or snippets shown in A Python code snippet demonstrating how to use tostrictjsonschema from openai.lib.pydantic to generate a compatible J...
examples/reinforcement-fine-tuning-openai-api-openai-api-reinforcement-finetuning.textA curl command demonstrating how to create a reinforcement fine-tuning job via the OpenAI API.Exact payloads, commands, or snippets shown in A curl command demonstrating how to create a reinforcement fine-tuning job via the OpenAI API.
examples/reinforcement-fine-tuning-openai-api-openai-api-reinforcement-fine-tunin.textA curl command demonstrating how to send a request to the OpenAI API using a reinforcement fine-tuned model.Exact payloads, commands, or snippets shown in A curl command demonstrating how to send a request to the OpenAI API using a reinforcement fine-tuned model.
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-jo.textA text representation of an OpenAI reinforcement fine-tuning job checkpoint object containing metrics and model checkpoint details.Exact payloads, commands, or snippets shown in A text representation of an OpenAI reinforcement fine-tuning job checkpoint object containing metrics and model check...
examples/reinforcement-fine-tuning-openai-api-openai-reinforcement-fine-tuning-jo-2.textA text log representing a JSON object of a reinforcement fine-tuning job event from the OpenAI API.Exact payloads, commands, or snippets shown in A text log representing a JSON object of a reinforcement fine-tuning job event from the OpenAI API.

What This Skill Covers

  • Reinforcement fine-tuning (RFT) adapts an OpenAI reasoning model with a feedback signal you define. Like supervised fine-tuning, it tailors the model to your...
  • Main sections: Example: LLM-powered security review, Define a grader, Grading Criteria:, Copernicus Product Security Policy, Introduction.

Workflow

  1. Open the most relevant file under docs/ for the exact documented workflow and wording.
  2. Open schemas/ files for exact structured contracts.
  3. Open examples/ files for concrete requests, commands, snippets, and manifests.
  4. Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://developers.openai.com/api/docs/guides/reinforcement-fine-tuning