openai · OpenAI Platform Docs

Evaluate agent workflows | OpenAI API

Teaches methodologies and implementation patterns for evaluating the performance, reliability, and accuracy of autonomous agent workflows.

Derived skill

Files assembled from official documentation

Use when you need to design a testing framework or implement metrics to measure the success and safety of an agentic workflow.

| File | Contains | Use For |
| --- | --- | --- |
| `SKILL.md` | Entry point: scope, routing table, and workflow. | Start here. |
| `docs/evaluate-agent-workflows-openai-api-workflow-guide.md` | A guide outlining evaluation strategies for agent workflows including traces, trace-grading, datasets, and eval runs using the OpenAI API. | Questions about a guide outlining evaluation strategies for agent workflows including traces, trace-grading, datasets, and eval runs... |

The OpenAI Platform offers a suite of evaluation tools to help you ensure your agents perform consistently and accurately.
Main sections: "Start with traces when you are still debugging behavior"; "Trace-grading workflow"; "Move to datasets and eval runs when you need repeatability"; "Related evaluation surfaces".

Open the most relevant file under docs/ for the exact documented workflow and wording.
Open schemas/ files for exact structured contracts.
Open examples/ files for concrete requests, commands, snippets, and manifests.
Do not add behavior or configuration that is not present in the attached source files.