openai · OpenAI Platform Docs
Working with evals | OpenAI API
Teaches how to implement and manage evaluation frameworks to measure the performance and accuracy of LLM outputs.
Derived skill
Files assembled from official documentation
When To Use
Use when you need to implement a systematic process for testing, measuring, and improving the quality of model responses against specific benchmarks or criteria.
Reference Files
| File | Contains | Use For |
|---|---|---|
| SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
| docs/working-with-evals-openai-api-workflow-guide.md | A guide explaining how to create evaluations, test prompts, upload test data, and manage eval runs using the OpenAI API. | Questions about creating evaluations, testing prompts, uploading test data, and managing eval runs using the OpenAI API. |
| examples/working-with-evals-openai-api-openai-api-evals-curl-request.text | A curl command demonstrating how to send a request to the OpenAI API for evaluating model responses. | Exact payloads, commands, or snippets shown in a curl command demonstrating how to send a request to the OpenAI API for evaluating model responses. |
| examples/working-with-evals-openai-api-openai-evals-nodejs-categorization.text | A Node.js code example demonstrating how to use the OpenAI client to perform evaluations by categorizing IT support tickets. | Exact payloads, commands, or snippets shown in a Node.js code example demonstrating how to use the OpenAI client to perform evaluations by categorizing IT support t... |
| examples/working-with-evals-openai-api-openai-evals-python-categorization-prompt.text | A Python code snippet demonstrating how to structure instructions and input data for an evaluation task using the OpenAI client. | Exact payloads, commands, or snippets shown in a Python code snippet demonstrating how to structure instructions and input data for an evaluation task using the Ope... |
| examples/working-with-evals-openai-api-openai-api-evals-chat-completions-curl-req.text | A curl command demonstrating a chat completions request used for categorizing support tickets as part of an evaluation process. | Exact payloads, commands, or snippets shown in a curl command demonstrating a chat completions request used for categorizing support tickets as part of an evaluatio... |
| examples/working-with-evals-openai-api-openai-evals-nodejs-categorization-2.text | A Node.js code example demonstrating how to use the OpenAI client to perform evaluations by categorizing IT support tickets. | Exact payloads, commands, or snippets shown in a Node.js code example demonstrating how to use the OpenAI client to perform evaluations by categorizing IT support t... |
| examples/working-with-evals-openai-api-openai-evals-python-categorization-prompt-2.text | A Python code snippet demonstrating how to structure instructions and input data for an evaluation task using the OpenAI client. | Exact payloads, commands, or snippets shown in a Python code snippet demonstrating how to structure instructions and input data for an evaluation task using the Ope... |
| examples/working-with-evals-openai-api-openai-api-evals-create-curl-request.text | A curl command demonstrating how to create a new evaluation using the OpenAI API with a custom datasource configuration. | Exact payloads, commands, or snippets shown in a curl command demonstrating how to create a new evaluation using the OpenAI API with a custom datasource configuration. |
| examples/working-with-evals-openai-api-openai-evals-create-nodejs.text | A Node.js code example demonstrating how to create an evaluation object using the OpenAI SDK with a custom datasource configuration. | Exact payloads, commands, or snippets shown in a Node.js code example demonstrating how to create an evaluation object using the OpenAI SDK with a custom datasource... |
| examples/working-with-evals-openai-api-openai-evals-create-python-custom-datasour.text | A Python code example demonstrating how to create an evaluation object using a custom datasource schema via the OpenAI client. | Exact payloads, commands, or snippets shown in a Python code example demonstrating how to create an evaluation object using a custom datasource schema via the OpenA... |
| examples/working-with-evals-openai-api-openai-evals-custom-item.text | A text representation of a custom JSON schema used for defining item properties within an OpenAI evaluation. | Exact payloads, commands, or snippets shown in a text representation of a custom JSON schema used for defining item properties within an OpenAI evaluation. |
| examples/working-with-evals-openai-api-openai-evals-stringcheck.text | A text-based example of a stringcheck evaluation configuration using category string matching. | Exact payloads, commands, or snippets shown in a text-based example of a stringcheck evaluation configuration using category string matching. |
| examples/working-with-evals-openai-api-openai-evals-json-object.text | A JSON representation of an evaluation object including datasource configuration and testing criteria. | Exact payloads, commands, or snippets shown in a JSON representation of an evaluation object including datasource configuration and testing criteria. |
| examples/working-with-evals-openai-api-openai-evals-dataset-jsonl-format.text | A JSONL formatted dataset containing ticket text and correct labels for evaluation purposes. | Exact payloads, commands, or snippets shown in a JSONL formatted dataset containing ticket text and correct labels for evaluation purposes. |
| examples/working-with-evals-openai-api-openai-api-evals-upload-curl.text | A curl command demonstrating how to upload a JSONL file with the purpose set to evals using the OpenAI Files API. | Exact payloads, commands, or snippets shown in a curl command demonstrating how to upload a JSONL file with the purpose set to evals using the OpenAI Files API. |
| examples/working-with-evals-openai-api-openai-api-nodejs-evals-upload.text | A Node.js code snippet demonstrating how to upload a JSONL file to the OpenAI API for use with evals. | Exact payloads, commands, or snippets shown in a Node.js code snippet demonstrating how to upload a JSONL file to the OpenAI API for use with evals. |
| examples/working-with-evals-openai-api-openai-api-python-upload-evals.text | A Python script demonstrating how to use the OpenAI client to upload a JSONL file for evaluation purposes. | Exact payloads, commands, or snippets shown in a Python script demonstrating how to use the OpenAI client to upload a JSONL file for evaluation purposes. |
| examples/working-with-evals-openai-api-openai-evals-tickets-jsonl.text | A JSONL formatted file containing ticket data used as an input for OpenAI evaluation tasks. | Exact payloads, commands, or snippets shown in a JSONL formatted file containing ticket data used as an input for OpenAI evaluation tasks. |
| examples/working-with-evals-openai-api-openai-api-evals-run-curl-request.text | A curl command demonstrating how to create a new evaluation run using the OpenAI API. | Exact payloads, commands, or snippets shown in a curl command demonstrating how to create a new evaluation run using the OpenAI API. |
| examples/working-with-evals-openai-api-openai-evals-runs-create-nodejs.text | A Node.js code example demonstrating how to create an evaluation run using the OpenAI SDK. | Exact payloads, commands, or snippets shown in a Node.js code example demonstrating how to create an evaluation run using the OpenAI SDK. |
| examples/working-with-evals-openai-api-openai-evals-runs-create-python.text | A Python code example demonstrating how to create an evaluation run using the OpenAI client. | Exact payloads, commands, or snippets shown in a Python code example demonstrating how to create an evaluation run using the OpenAI client. |
| examples/working-with-evals-openai-api-openai-api-evals-run-curl-request-2.text | A curl command demonstrating how to create a new evaluation run using the OpenAI API. | Exact payloads, commands, or snippets shown in a curl command demonstrating how to create a new evaluation run using the OpenAI API. |
| examples/working-with-evals-openai-api-openai-evals-runs-create-nodejs-2.text | A Node.js code example demonstrating how to create an evaluation run using the OpenAI SDK. | Exact payloads, commands, or snippets shown in a Node.js code example demonstrating how to create an evaluation run using the OpenAI SDK. |
| examples/working-with-evals-openai-api-openai-evals-runs-create-python-2.text | A Python code example demonstrating how to create an evaluation run using the OpenAI client. | Exact payloads, commands, or snippets shown in a Python code example demonstrating how to create an evaluation run using the OpenAI client. |
| examples/working-with-evals-openai-api-openai-eval-run-object-response.text | A JSON-formatted representation of an eval run object containing the run ID, evaluation ID, report URL, and status. | Exact payloads, commands, or snippets shown in a JSON-formatted representation of an eval run object containing the run ID, evaluation ID, report URL, and status. |
| examples/working-with-evals-openai-api-openai-eval-run-response-object.text | A JSON representation of an eval run object containing the run ID, evaluation ID, report URL, and status. | Exact payloads, commands, or snippets shown in a JSON representation of an eval run object containing the run ID, evaluation ID, report URL, and status. |
| examples/working-with-evals-openai-api-openai-api-evals-run-retrieval-curl.text | A curl command demonstrating how to retrieve the status and results of a specific evaluation run using the OpenAI API. | Exact payloads, commands, or snippets shown in a curl command demonstrating how to retrieve the status and results of a specific evaluation run using the OpenAI API. |
| examples/working-with-evals-openai-api-openai-api-nodejs-retrieve-eval-run.text | A JavaScript code snippet using the OpenAI SDK to retrieve the details of a specific evaluation run. | Exact payloads, commands, or snippets shown in a JavaScript code snippet using the OpenAI SDK to retrieve the details of a specific evaluation run. |
| examples/working-with-evals-openai-api-openai-api-python-retrieve-eval-run.text | A Python code snippet demonstrating how to use the OpenAI client to retrieve the status and details of a specific evaluation run. | Exact payloads, commands, or snippets shown in a Python code snippet demonstrating how to use the OpenAI client to retrieve the status and details of a specific eva... |
| examples/working-with-evals-openai-api-openai-api-eval-run-object-response.text | A JSON representation of an eval.run object containing the evaluation run ID, evaluation ID, and report URL. | Exact payloads, commands, or snippets shown in a JSON representation of an eval.run object containing the evaluation run ID, evaluation ID, and report URL. |
| examples/working-with-evals-openai-api-openai-eval-run-object-response-2.text | A JSON representation of an eval.run object containing the evaluation run ID, evaluation ID, and report URL. | Exact payloads, commands, or snippets shown in a JSON representation of an eval.run object containing the evaluation run ID, evaluation ID, and report URL. |
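
Several of the files above describe a JSONL dataset of ticket text and correct labels. As a rough, offline sketch of what writing and reading such a file could look like, the Python below uses only the standard library. The `item` wrapper, the `ticket_text` and `correct_label` field names, the sample rows, and the helper names `write_jsonl` and `read_jsonl` are all assumptions made for illustration; the exact documented format is in the dataset example files listed in the table.

```python
import json

# Hypothetical sample rows. The "item" wrapper and both field names are
# assumptions for illustration, not taken from the attached source files.
rows = [
    {"item": {"ticket_text": "My monitor won't turn on!", "correct_label": "Hardware"}},
    {"item": {"ticket_text": "I'm in vim and I can't quit!", "correct_label": "Software"}},
    {"item": {"ticket_text": "Best restaurants in Cleveland?", "correct_label": "Other"}},
]


def write_jsonl(path, records):
    """Write one JSON object per line, which is the JSONL convention."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    write_jsonl("tickets.jsonl", rows)
    print(len(read_jsonl("tickets.jsonl")), "rows written")
```

Round-tripping the file through `read_jsonl` before uploading is a cheap way to catch malformed lines early.
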
What This Skill Covers
- Evaluations (often called evals) test model outputs to ensure they meet style and content criteria that you specify. Writing evals to understand how your LL...
- Main sections: Create an eval for a task, Test a prompt with your eval, Uploading test data, Creating an eval run, Analyze the results.
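
The sections listed above follow a loop: define a pass/fail criterion, run a prompt over labeled items, then analyze the results. The Python below is a minimal offline sketch of that loop and calls nothing in the OpenAI API. The helper names (`string_check`, `grade`, `pass_rate`, `keyword_classifier`) are hypothetical, the `ticket_text` and `correct_label` fields are assumed, and the keyword classifier is only a stand-in for a real model call. For the exact request payloads, use the files under examples/.

```python
from dataclasses import dataclass


@dataclass
class GradedItem:
    ticket_text: str
    expected: str
    actual: str
    passed: bool


def string_check(output: str, reference: str) -> bool:
    """Local stand-in for an exact-match criterion: pass only on equality."""
    return output == reference


def grade(items, classify):
    """Run `classify` on each item's text and compare against its label."""
    results = []
    for item in items:
        actual = classify(item["ticket_text"])
        expected = item["correct_label"]
        results.append(
            GradedItem(item["ticket_text"], expected, actual, string_check(actual, expected))
        )
    return results


def pass_rate(results) -> float:
    """Fraction of items that passed; 0.0 for an empty result list."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


def keyword_classifier(text: str) -> str:
    """Hypothetical stand-in for a model call that labels a support ticket."""
    lowered = text.lower()
    if "monitor" in lowered:
        return "Hardware"
    if "vim" in lowered:
        return "Software"
    return "Other"


if __name__ == "__main__":
    items = [
        {"ticket_text": "My monitor won't turn on!", "correct_label": "Hardware"},
        {"ticket_text": "I'm in vim and I can't quit!", "correct_label": "Software"},
        {"ticket_text": "Best restaurants in Cleveland?", "correct_label": "Other"},
    ]
    results = grade(items, keyword_classifier)
    for r in results:
        print("PASS" if r.passed else "FAIL", r.expected, r.actual)
    print("pass rate:", pass_rate(results))
```

Keeping the criterion in its own function makes it straightforward to swap in a different check while leaving the grading loop unchanged.
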
Workflow
- Open the most relevant file under `docs/` for the exact documented workflow and wording.
- Open `schemas/` files for exact structured contracts.
- Open `examples/` files for concrete requests, commands, snippets, and manifests.
- Do not add behavior or configuration that is not present in the attached source files.
Canonical source: https://developers.openai.com/api/docs/guides/evals
