openai · OpenAI Platform Docs

Working with evals

A guide to programmatically configuring, running, and analyzing model evaluations using the Evals API to test LLM outputs against specific style and content criteria.

Derived skill

Files assembled from official documentation

When To Use

Use when you need to programmatically validate that LLM outputs meet specific quality standards or when comparing model performance during upgrades using test datasets and grading criteria.

Reference Files

What This Skill Covers

Evaluations (often called evals) test model outputs to ensure they meet style and content criteria that you specify. Writing evals to understand how your LLM...
Main sections: Create an eval for a task, Test a prompt with your eval, Uploading test data, Creating an eval run, Analyze the results.
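As a rough illustration of the idea described above, the sketch below builds a tiny test dataset and grades model outputs against human labels with an exact string match. It runs entirely locally and does not call the Evals API. The field names `ticket_text` and `correct_label`, the `item` wrapper key, the sample tickets, and the helper names `to_jsonl` and `grade_exact_match` are all assumptions made for illustration; consult the files under docs/ and examples/ for the exact documented payloads.

```python
import json

# Hypothetical dataset items: ticket text plus the human-assigned label.
# Field names and sample values are assumptions for illustration only.
ITEMS = [
    {"ticket_text": "My monitor won't turn on", "correct_label": "Hardware"},
    {"ticket_text": "I can't log in to the VPN", "correct_label": "Software"},
    {"ticket_text": "What's the best way to learn Python?", "correct_label": "Other"},
]


def to_jsonl(items):
    """Serialize items as JSONL: one JSON object per line.

    Each item is wrapped under an "item" key (an assumed layout).
    """
    return "\n".join(json.dumps({"item": item}) for item in items)


def grade_exact_match(items, outputs):
    """Local stand-in for an equality string check.

    Compares each model output with the item's label and returns
    simple pass/fail counts. This is not the API's grader.
    """
    passed = sum(
        1 for item, output in zip(items, outputs)
        if output == item["correct_label"]
    )
    return {"total": len(items), "passed": passed, "failed": len(items) - passed}


if __name__ == "__main__":
    print(to_jsonl(ITEMS))
    # Pretend model outputs; the third one disagrees with its label.
    print(grade_exact_match(ITEMS, ["Hardware", "Software", "Hardware"]))
```

The counts returned here only mirror the kind of pass/fail tally an eval run reports; the real grading happens server-side according to the testing criteria defined on the eval.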

Workflow

Open the most relevant file under docs/ for the exact documented workflow and wording.
Open schemas/ files for exact structured contracts.
Open examples/ files for concrete requests, commands, snippets, and manifests.
Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://developers.openai.com/api/docs/guides/evals.md

Skill metadata

Name: Working with evals
Author: Bruno HANSS - Prompt Buddy
Generation mode: AI-assisted, human-authored
Source count: 1

Provenance

Source program: OpenAI Platform Docs
Last generated: May 11, 2026
Last source sync: Unknown
Source pages: 1

Safety model

Canonical source pages are preserved separately. Derived files record source evidence and require zero AI-generated facts.

File tree

Source links

https://developers.openai.com/api/docs/guides/evals.md

File	Contains	Use For
`SKILL.md`	Entry point: scope, routing table, and workflow.	Start here.
`docs/working-with-evals-workflow-guide.md`	A guide explaining how to create evaluations, test prompts, upload test data, and manage eval runs to ensure model output quality.	Questions about a guide explaining how to create evaluations, test prompts, upload test data, and manage eval runs to ensure model ou...
`examples/working-with-evals-openai-evals-curl-request.bash`	A bash curl command demonstrating how to send a request to the OpenAI API for evaluation purposes.	Exact payloads, commands, or snippets shown in A bash curl command demonstrating how to send a request to the OpenAI API for evaluation purposes.
`examples/working-with-evals-openai-evals-javascript-categorization.javascript`	A JavaScript code example demonstrating how to use the OpenAI client to perform categorization tasks as part of an evaluation workflow.	Exact payloads, commands, or snippets shown in A JavaScript code example demonstrating how to use the OpenAI client to perform categorization tasks as part of an ev...
`examples/working-with-evals-openai-evals-python-categorization.python`	A Python script demonstrating how to use the OpenAI client to categorize IT support tickets as part of an evaluation workflow.	Exact payloads, commands, or snippets shown in A Python script demonstrating how to use the OpenAI client to categorize IT support tickets as part of an evaluation...
`examples/working-with-evals-openai-evals-api-curl-request.bash`	A bash curl command demonstrating how to create a new evaluation via the OpenAI API with a custom datasource configuration.	Exact payloads, commands, or snippets shown in A bash curl command demonstrating how to create a new evaluation via the OpenAI API with a custom datasource configur...
`examples/working-with-evals-openai-evals-create.javascript`	A JavaScript code example demonstrating how to use the OpenAI SDK to create an evaluation object with custom datasource configuration.	Exact payloads, commands, or snippets shown in A JavaScript code example demonstrating how to use the OpenAI SDK to create an evaluation object with custom datasour...
`examples/working-with-evals-openai-evals-create.python`	A Python script demonstrating how to use the OpenAI client to create a custom evaluation with a datasource configuration and item schema.	Exact payloads, commands, or snippets shown in A Python script demonstrating how to use the OpenAI client to create a custom evaluation with a datasource configurat...
`examples/working-with-evals-openai-evals-custom-eval.json`	A JSON object defining a custom evaluation schema including item properties and required fields for an evaluation task.	Exact payloads, commands, or snippets shown in A JSON object defining a custom evaluation schema including item properties and required fields for an evaluation task.
`examples/working-with-evals-openai-evals.json`	A JSON object defining an evaluation schema with string match operations and input/reference mappings.	Exact payloads, commands, or snippets shown in A JSON object defining an evaluation schema with string match operations and input/reference mappings.
`examples/working-with-evals-openai-evals-custom-datasource.json`	A JSON configuration object defining a custom datasource and string-check testing criteria for an evaluation.	Exact payloads, commands, or snippets shown in A JSON configuration object defining a custom datasource and string-check testing criteria for an evaluation.
`examples/working-with-evals-openai-evals-dataset-item-examples.json`	A JSON array of evaluation dataset items containing ticket text and their corresponding correct labels.	Exact payloads, commands, or snippets shown in A JSON array of evaluation dataset items containing ticket text and their corresponding correct labels.
`examples/working-with-evals-openai-evals-upload-curl.bash`	A bash command using curl to upload a JSONL file for evaluation purposes to the OpenAI Files API.	Exact payloads, commands, or snippets shown in A bash command using curl to upload a JSONL file for evaluation purposes to the OpenAI Files API.
`examples/working-with-evals-openai-evals-javascript-upload.javascript`	A JavaScript code example demonstrating how to upload a JSONL file to OpenAI for use with the evals API.	Exact payloads, commands, or snippets shown in A JavaScript code example demonstrating how to upload a JSONL file to OpenAI for use with the evals API.
`examples/working-with-evals-openai-evals-python-upload.python`	A Python script demonstrating how to upload a JSONL file to OpenAI for use with the evals API.	Exact payloads, commands, or snippets shown in A Python script demonstrating how to upload a JSONL file to OpenAI for use with the evals API.
`examples/working-with-evals-openai-evals-tickets-dataset.json`	A JSONL formatted dataset containing ticket objects used for evaluating model performance.	Exact payloads, commands, or snippets shown in A JSONL formatted dataset containing ticket objects used for evaluating model performance.
`examples/working-with-evals-openai-evals-run-creation-curl.bash`	A curl command demonstrating how to create a new run for an existing evaluation using the OpenAI API.	Exact payloads, commands, or snippets shown in A curl command demonstrating how to create a new run for an existing evaluation using the OpenAI API.
`examples/working-with-evals-openai-evals-runs-create.javascript`	A JavaScript code example demonstrating how to create an evaluation run using the OpenAI SDK.	Exact payloads, commands, or snippets shown in A JavaScript code example demonstrating how to create an evaluation run using the OpenAI SDK.
`examples/working-with-evals-openai-evals-runs-create.python`	A Python script demonstrating how to create an evaluation run using the OpenAI client to categorize IT support tickets.	Exact payloads, commands, or snippets shown in A Python script demonstrating how to create an evaluation run using the OpenAI client to categorize IT support tickets.
`examples/working-with-evals-openai-evals-run-response-object.json`	A JSON object representing the response structure of an evaluation run, including run ID, status, and report URL.	Exact payloads, commands, or snippets shown in A JSON object representing the response structure of an evaluation run, including run ID, status, and report URL.
`examples/working-with-evals-openai-evals-run-status-curl-request.bash`	A curl command to retrieve the status of a specific evaluation run using the OpenAI API.	Exact payloads, commands, or snippets shown in A curl command to retrieve the status of a specific evaluation run using the OpenAI API.
`examples/working-with-evals-openai-evals-runs-retrieve.javascript`	A JavaScript code example demonstrating how to retrieve an evaluation run using the OpenAI client library.	Exact payloads, commands, or snippets shown in A JavaScript code example demonstrating how to retrieve an evaluation run using the OpenAI client library.
`examples/working-with-evals-openai-evals-runs-retrieve.python`	A Python script demonstrating how to use the OpenAI client to retrieve the status and details of an evaluation run.	Exact payloads, commands, or snippets shown in A Python script demonstrating how to use the OpenAI client to retrieve the status and details of an evaluation run.
`examples/working-with-evals-openai-evals-run-result-response.json`	A JSON object representing the completed status and result counts of an evaluation run.	Exact payloads, commands, or snippets shown in A JSON object representing the completed status and result counts of an evaluation run.
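
The last rows above describe run response objects that carry a status and result counts. As a hedged sketch of the "analyze the results" step, the helper below turns a run-shaped dictionary into a one-line summary. The keys `status`, `result_counts`, `total`, and `passed`, and the function name `summarize_run`, are assumptions for illustration; the authoritative field names are in the example JSON files listed in the table.

```python
def summarize_run(run):
    """Return a short summary string for a run-shaped dict.

    The key names used here are assumed, not taken from the source files.
    """
    status = run.get("status", "unknown")
    counts = run.get("result_counts") or {}
    total = counts.get("total", 0)
    passed = counts.get("passed", 0)

    # A pass rate is only meaningful once the run has finished
    # and at least one item was graded.
    if status != "completed" or total == 0:
        return f"status={status}; no pass rate available"
    return f"status={status}; passed {passed}/{total} ({passed / total:.0%})"


if __name__ == "__main__":
    # Hypothetical run objects, not real API responses.
    finished = {
        "status": "completed",
        "result_counts": {"total": 3, "passed": 2, "failed": 1, "errored": 0},
    }
    print(summarize_run(finished))
    print(summarize_run({"status": "queued"}))
```

A summary like this is convenient when comparing runs across prompt or model changes, since a drop in the pass rate flags a regression without opening each report.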