openai · OpenAI Platform Docs

Evaluate agent workflows

A guide to selecting and implementing evaluation strategies for agent workflows: trace grading for debugging, and datasets and eval runs for repeatable benchmarking.

Derived skill

Files assembled from official documentation

When To Use

Use when you need to debug agent tool selection, verify handoff logic, or benchmark prompt changes using repeatable datasets and evaluation runs.
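
As a rough illustration of the routing described above, the sketch below maps each goal to the evaluation surface this skill associates with it. The names `SURFACE_BY_GOAL` and `pick_surface` are hypothetical, the goal strings are paraphrased from this skill's text, and placing handoff verification under trace grading is an assumption; none of this is part of any OpenAI API.

```python
# Hypothetical mapping for illustration only. Goals and surface names are
# paraphrased from this skill's text; nothing here calls an OpenAI API.
SURFACE_BY_GOAL = {
    "debug tool selection": "trace grading",
    "verify handoff logic": "trace grading",  # assumption: handoffs are inspected via traces
    "benchmark prompt changes": "datasets and eval runs",
}


def pick_surface(goal: str) -> str:
    """Return the evaluation surface suggested for a known goal.

    Raises KeyError for goals this skill does not mention, rather than guessing.
    """
    return SURFACE_BY_GOAL[goal]
```

For example, `pick_surface("benchmark prompt changes")` returns `"datasets and eval runs"`. Consult the guide under docs/ for the documented criteria before relying on this mapping.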

Reference Files

File	Contains	Use For
`SKILL.md`	Entry point: scope, routing table, and workflow.	Start here.
`docs/evaluate-agent-workflows-workflow-guide.md`	A guide detailing evaluation surfaces and workflows for agent performance, including traces, trace-grading, and dataset-based eval runs.	Questions about a guide detailing evaluation surfaces and workflows for agent performance, including traces, trace-grading, and datas...

What This Skill Covers

The OpenAI Platform offers a suite of evaluation tools to help you ensure your agents perform consistently and accurately.
Main sections: Start with traces when you are still debugging behavior; Trace-grading workflow; Move to datasets and eval runs when you need repeatability; Related evaluation surfaces.

Workflow

Open the most relevant file under docs/ for the exact documented workflow and wording.
Open schemas/ files for exact structured contracts.
Open examples/ files for concrete requests, commands, snippets, and manifests.
Do not add behavior or configuration that is not present in the attached source files.
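
The lookup order in the steps above can be sketched as a small helper that lists the files of a local copy of this skill, taking docs/ first, then schemas/, then examples/. The function name `list_skill_files` and the idea of a local `skill_root` directory are assumptions made for illustration; only the three folder names come from this skill.

```python
from pathlib import Path

# Folder names come from the workflow steps; the order mirrors those steps.
# The helper itself is hypothetical and not part of the skill.
LOOKUP_ORDER = ("docs", "schemas", "examples")


def list_skill_files(skill_root: Path) -> list[Path]:
    """Return files under docs/, schemas/, and examples/ in workflow order.

    Folders that do not exist are skipped; files inside each folder are
    returned in sorted order.
    """
    found: list[Path] = []
    for name in LOOKUP_ORDER:
        folder = skill_root / name
        if folder.is_dir():
            found.extend(sorted(p for p in folder.rglob("*") if p.is_file()))
    return found
```

Calling `list_skill_files(Path("."))` from the skill's root directory would return the documentation files before any schema or example files, which matches the reading order the workflow recommends.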

Canonical source: https://developers.openai.com/api/docs/guides/agent-evals.md