Prompt Buddy

openai · OpenAI Platform Docs

Working with evals | OpenAI API

Teaches how to implement and manage evaluation frameworks to measure the performance and accuracy of LLM outputs.

Derived skill

Files assembled from official documentation

Viewing SKILL.md

When To Use

Use when you need to implement a systematic process for testing, measuring, and improving the quality of model responses against specific benchmarks or criteria.

Reference Files

File | Contains | Use For
SKILL.md | Entry point: scope, routing table, and workflow. | Start here.
docs/working-with-evals-openai-api-workflow-guide.md | A guide explaining how to create evaluations, test prompts, upload test data, and manage eval runs using the OpenAI API. | Questions about a guide explaining how to create evaluations, test prompts, upload test data, and manage eval runs using the OpenAI API.
examples/working-with-evals-openai-api-openai-api-evals-curl-request.text | A curl command demonstrating how to send a request to the OpenAI API for evaluating model responses. | Exact payloads, commands, or snippets shown in A curl command demonstrating how to send a request to the OpenAI API for evaluating model responses.
examples/working-with-evals-openai-api-openai-evals-nodejs-categorization.text | A Node.js code example demonstrating how to use the OpenAI client to perform evaluations by categorizing IT support tickets. | Exact payloads, commands, or snippets shown in A Node.js code example demonstrating how to use the OpenAI client to perform evaluations by categorizing IT support t...
examples/working-with-evals-openai-api-openai-evals-python-categorization-prompt.text | A Python code snippet demonstrating how to structure instructions and input data for an evaluation task using the OpenAI client. | Exact payloads, commands, or snippets shown in A Python code snippet demonstrating how to structure instructions and input data for an evaluation task using the Ope...
examples/working-with-evals-openai-api-openai-api-evals-chat-completions-curl-req.text | A curl command demonstrating a chat completions request used for categorizing support tickets as part of an evaluation process. | Exact payloads, commands, or snippets shown in A curl command demonstrating a chat completions request used for categorizing support tickets as part of an evaluatio...
examples/working-with-evals-openai-api-openai-evals-nodejs-categorization-2.text | A Node.js code example demonstrating how to use the OpenAI client to perform evaluations by categorizing IT support tickets. | Exact payloads, commands, or snippets shown in A Node.js code example demonstrating how to use the OpenAI client to perform evaluations by categorizing IT support t...
examples/working-with-evals-openai-api-openai-evals-python-categorization-prompt-2.text | A Python code snippet demonstrating how to structure instructions and input data for an evaluation task using the OpenAI client. | Exact payloads, commands, or snippets shown in A Python code snippet demonstrating how to structure instructions and input data for an evaluation task using the Ope...
examples/working-with-evals-openai-api-openai-api-evals-create-curl-request.text | A curl command demonstrating how to create a new evaluation using the OpenAI API with a custom datasource configuration. | Exact payloads, commands, or snippets shown in A curl command demonstrating how to create a new evaluation using the OpenAI API with a custom datasource configuration.
examples/working-with-evals-openai-api-openai-evals-create-nodejs.text | A Node.js code example demonstrating how to create an evaluation object using the OpenAI SDK with a custom datasource configuration. | Exact payloads, commands, or snippets shown in A Node.js code example demonstrating how to create an evaluation object using the OpenAI SDK with a custom datasource...
examples/working-with-evals-openai-api-openai-evals-create-python-custom-datasour.text | A Python code example demonstrating how to create an evaluation object using a custom datasource schema via the OpenAI client. | Exact payloads, commands, or snippets shown in A Python code example demonstrating how to create an evaluation object using a custom datasource schema via the OpenA...
examples/working-with-evals-openai-api-openai-evals-custom-item.text | A text representation of a custom JSON schema used for defining item properties within an OpenAI evaluation. | Exact payloads, commands, or snippets shown in A text representation of a custom JSON schema used for defining item properties within an OpenAI evaluation.
examples/working-with-evals-openai-api-openai-evals-stringcheck.text | A text-based example of a stringcheck evaluation configuration using category string matching. | Exact payloads, commands, or snippets shown in A text-based example of a stringcheck evaluation configuration using category string matching.
examples/working-with-evals-openai-api-openai-evals-json-object.text | A JSON representation of an evaluation object including datasource configuration and testing criteria. | Exact payloads, commands, or snippets shown in A JSON representation of an evaluation object including datasource configuration and testing criteria.
examples/working-with-evals-openai-api-openai-evals-dataset-jsonl-format.text | A JSONL formatted dataset containing ticket text and correct labels for evaluation purposes. | Exact payloads, commands, or snippets shown in A JSONL formatted dataset containing ticket text and correct labels for evaluation purposes.
examples/working-with-evals-openai-api-openai-api-evals-upload-curl.text | A curl command demonstrating how to upload a JSONL file with the purpose set to evals using the OpenAI Files API. | Exact payloads, commands, or snippets shown in A curl command demonstrating how to upload a JSONL file with the purpose set to evals using the OpenAI Files API.
examples/working-with-evals-openai-api-openai-api-nodejs-evals-upload.text | A Node.js code snippet demonstrating how to upload a JSONL file to the OpenAI API for use with evals. | Exact payloads, commands, or snippets shown in A Node.js code snippet demonstrating how to upload a JSONL file to the OpenAI API for use with evals.
examples/working-with-evals-openai-api-openai-api-python-upload-evals.text | A Python script demonstrating how to use the OpenAI client to upload a JSONL file for evaluation purposes. | Exact payloads, commands, or snippets shown in A Python script demonstrating how to use the OpenAI client to upload a JSONL file for evaluation purposes.
examples/working-with-evals-openai-api-openai-evals-tickets-jsonl.text | A JSONL formatted file containing ticket data used as an input for OpenAI evaluation tasks. | Exact payloads, commands, or snippets shown in A JSONL formatted file containing ticket data used as an input for OpenAI evaluation tasks.
examples/working-with-evals-openai-api-openai-api-evals-run-curl-request.text | A curl command demonstrating how to create a new evaluation run using the OpenAI API. | Exact payloads, commands, or snippets shown in A curl command demonstrating how to create a new evaluation run using the OpenAI API.
examples/working-with-evals-openai-api-openai-evals-runs-create-nodejs.text | A Node.js code example demonstrating how to create an evaluation run using the OpenAI SDK. | Exact payloads, commands, or snippets shown in A Node.js code example demonstrating how to create an evaluation run using the OpenAI SDK.
examples/working-with-evals-openai-api-openai-evals-runs-create-python.text | A Python code example demonstrating how to create an evaluation run using the OpenAI client. | Exact payloads, commands, or snippets shown in A Python code example demonstrating how to create an evaluation run using the OpenAI client.
examples/working-with-evals-openai-api-openai-api-evals-run-curl-request-2.text | A curl command demonstrating how to create a new evaluation run using the OpenAI API. | Exact payloads, commands, or snippets shown in A curl command demonstrating how to create a new evaluation run using the OpenAI API.
examples/working-with-evals-openai-api-openai-evals-runs-create-nodejs-2.text | A Node.js code example demonstrating how to create an evaluation run using the OpenAI SDK. | Exact payloads, commands, or snippets shown in A Node.js code example demonstrating how to create an evaluation run using the OpenAI SDK.
examples/working-with-evals-openai-api-openai-evals-runs-create-python-2.text | A Python code example demonstrating how to create an evaluation run using the OpenAI client. | Exact payloads, commands, or snippets shown in A Python code example demonstrating how to create an evaluation run using the OpenAI client.
examples/working-with-evals-openai-api-openai-eval-run-object-response.text | A JSON-formatted representation of an eval run object containing the run ID, evaluation ID, report URL, and status. | Exact payloads, commands, or snippets shown in A JSON-formatted representation of an eval run object containing the run ID, evaluation ID, report URL, and status.
examples/working-with-evals-openai-api-openai-eval-run-response-object.text | A JSON representation of an eval run object containing the run ID, evaluation ID, report URL, and status. | Exact payloads, commands, or snippets shown in A JSON representation of an eval run object containing the run ID, evaluation ID, report URL, and status.
examples/working-with-evals-openai-api-openai-api-evals-run-retrieval-curl.text | A curl command demonstrating how to retrieve the status and results of a specific evaluation run using the OpenAI API. | Exact payloads, commands, or snippets shown in A curl command demonstrating how to retrieve the status and results of a specific evaluation run using the OpenAI API.
examples/working-with-evals-openai-api-openai-api-nodejs-retrieve-eval-run.text | A JavaScript code snippet using the OpenAI SDK to retrieve the details of a specific evaluation run. | Exact payloads, commands, or snippets shown in A JavaScript code snippet using the OpenAI SDK to retrieve the details of a specific evaluation run.
examples/working-with-evals-openai-api-openai-api-python-retrieve-eval-run.text | A Python code snippet demonstrating how to use the OpenAI client to retrieve the status and details of a specific evaluation run. | Exact payloads, commands, or snippets shown in A Python code snippet demonstrating how to use the OpenAI client to retrieve the status and details of a specific eva...
examples/working-with-evals-openai-api-openai-api-eval-run-object-response.text | A JSON representation of an eval.run object containing the evaluation run ID, evaluation ID, and report URL. | Exact payloads, commands, or snippets shown in A JSON representation of an eval.run object containing the evaluation run ID, evaluation ID, and report URL.
examples/working-with-evals-openai-api-openai-eval-run-object-response-2.text | A JSON representation of an eval.run object containing the evaluation run ID, evaluation ID, and report URL. | Exact payloads, commands, or snippets shown in A JSON representation of an eval.run object containing the evaluation run ID, evaluation ID, and report URL.

What This Skill Covers

  • Evaluations (often called evals) test model outputs to ensure they meet style and content criteria that you specify. Writing evals to understand how your LL...
  • Main sections: Create an eval for a task, Test a prompt with your eval, Uploading test data, Creating an eval run, Analyze the results.
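A minimal local sketch of the idea behind these sections, assuming a ticket-categorization task like the one the example files describe. It builds a small JSONL test set and scores a stand-in classifier with an exact string match. The field names (`item`, `ticket_text`, `correct_label`), the category labels, the sample tickets, and the `keyword_model` stub are illustrative assumptions, not taken from the attached files; no OpenAI API call is made, so consult the files under examples/ for the exact documented payloads.

```python
import json

# Hypothetical test rows: ticket text plus the label we expect back.
# Field names and categories are assumptions made for illustration.
ROWS = [
    {"ticket_text": "My monitor won't turn on", "correct_label": "Hardware"},
    {"ticket_text": "I can't log in to the VPN client", "correct_label": "Software"},
]


def to_jsonl(rows):
    """Serialize rows as JSONL: one JSON object per line, each wrapped in an 'item' key."""
    return "\n".join(json.dumps({"item": row}) for row in rows)


def string_check(output, reference):
    """Local stand-in for an exact-match grader: pass only if the strings are equal."""
    return output.strip() == reference.strip()


def score(rows, predict):
    """Return the fraction of rows where predict(ticket_text) matches the label."""
    passed = sum(
        string_check(predict(row["ticket_text"]), row["correct_label"]) for row in rows
    )
    return passed / len(rows)


def keyword_model(text):
    """Hypothetical classifier stub used in place of a real model call."""
    return "Hardware" if "monitor" in text.lower() else "Software"


print(to_jsonl(ROWS))
print(score(ROWS, keyword_model))
```

Swapping `keyword_model` for a function that calls a real model keeps the scoring loop unchanged, which is the point of separating the test data, the grader, and the thing being graded.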

Workflow

  1. Open the most relevant file under docs/ for the exact documented workflow and wording.
  2. Open schemas/ files for exact structured contracts.
  3. Open examples/ files for concrete requests, commands, snippets, and manifests.
  4. Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://developers.openai.com/api/docs/guides/evals