openai · OpenAI Platform Docs
Evaluate agent workflows
A guide on selecting and implementing evaluation strategies for agent workflows, covering trace-grading for debugging and datasets/eval runs for repeatable benchmarking.
Derived skill
Files assembled from official documentation
When To Use
Use when you need to debug agent tool selection, verify handoff logic, or benchmark prompt changes using repeatable datasets and evaluation runs.
Reference Files
| File | Contains | Use For |
|---|---|---|
| SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
| docs/evaluate-agent-workflows-workflow-guide.md | A guide detailing evaluation surfaces and workflows for agent performance, including traces, trace-grading, and dataset-based eval runs. | Questions about a guide detailing evaluation surfaces and workflows for agent performance, including traces, trace-grading, and datas... |
What This Skill Covers
- The OpenAI Platform offers a suite of evaluation tools to help you ensure your agents perform consistently and accurately.
- Main sections:
  - Start with traces when you are still debugging behavior
  - Trace-grading workflow
  - Move to datasets and eval runs when you need repeatability
  - Related evaluation surfaces
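To illustrate the difference between grading a single trace and running a repeatable dataset eval, here is a minimal, self-contained sketch in plain Python. It does not call the OpenAI Platform or any SDK: `Trace`, `Case`, `fake_agent`, `grade_trace`, and `run_eval` are hypothetical names invented for this example, and the tool names and dataset rows are made up. Use the files under `docs/` for the actual documented workflow.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical record of one agent run: tools called and the final answer."""
    tool_calls: list[str] = field(default_factory=list)
    final_output: str = ""


@dataclass
class Case:
    """Hypothetical dataset row: an input plus what a correct run looks like."""
    prompt: str
    expected_tool: str
    expected_substring: str


def fake_agent(prompt: str) -> Trace:
    """Stand-in for a real agent; routes on a keyword so the run is deterministic."""
    if "weather" in prompt.lower():
        return Trace(tool_calls=["get_weather"], final_output="It is sunny in Paris.")
    return Trace(tool_calls=["search_docs"], final_output="See the refund policy page.")


def grade_trace(trace: Trace, case: Case) -> dict[str, bool]:
    """Trace-level grading: inspect one run to see which individual check failed."""
    return {
        "tool_selected": case.expected_tool in trace.tool_calls,
        "output_matches": case.expected_substring.lower() in trace.final_output.lower(),
    }


def run_eval(agent, dataset: list[Case]) -> float:
    """Dataset-level eval run: the same cases every time, reduced to one pass rate."""
    passed = 0
    for case in dataset:
        grades = grade_trace(agent(case.prompt), case)
        passed += all(grades.values())
    return passed / len(dataset)


# Made-up dataset; the third case is expected to fail on the output check.
DATASET = [
    Case("What's the weather in Paris?", "get_weather", "sunny"),
    Case("How do refunds work?", "search_docs", "refund"),
    Case("Weather in Oslo tomorrow?", "get_weather", "snow"),
]

if __name__ == "__main__":
    # Debugging view: per-check results for a single trace.
    print(grade_trace(fake_agent(DATASET[2].prompt), DATASET[2]))
    # Benchmarking view: one number to compare across prompt changes.
    print(f"pass rate: {run_eval(fake_agent, DATASET):.2f}")
```

The per-trace result shows which check failed on one run, which suits debugging tool selection or handoffs. The pass rate over a fixed dataset is the figure to compare before and after a prompt change.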
Workflow
- Open the most relevant file under `docs/` for the exact documented workflow and wording.
- Open `schemas/` files for exact structured contracts.
- Open `examples/` files for concrete requests, commands, snippets, and manifests.
- Do not add behavior or configuration that is not present in the attached source files.
Canonical source: https://developers.openai.com/api/docs/guides/agent-evals.md
