openai · OpenAI Platform Docs
Evaluate agent workflows | OpenAI API
Teaches methodologies and implementation patterns for evaluating the performance, reliability, and accuracy of autonomous agent workflows.
Derived skill
Files assembled from official documentation
When To Use
Use when you need to design a testing framework or implement metrics to measure the success and safety of an agentic workflow.
Reference Files
| File | Contains | Use For |
|---|---|---|
| SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
| docs/evaluate-agent-workflows-openai-api-workflow-guide.md | A guide outlining evaluation strategies for agent workflows including traces, trace-grading, datasets, and eval runs using the OpenAI API. | Questions about a guide outlining evaluation strategies for agent workflows including traces, trace-grading, datasets, and eval runs... |
What This Skill Covers
- The OpenAI Platform offers a suite of evaluation tools to help you ensure your agents perform consistently and accurately.
- Main sections:
  - Start with traces when you are still debugging behavior
  - Trace-grading workflow
  - Move to datasets and eval runs when you need repeatability
  - Related evaluation surfaces
Workflow
- Open the most relevant file under docs/ for the exact documented workflow and wording.
- Open schemas/ files for exact structured contracts.
- Open examples/ files for concrete requests, commands, snippets, and manifests.
- Do not add behavior or configuration that is not present in the attached source files.
Canonical source: https://developers.openai.com/api/docs/guides/agent-evals
