openai · OpenAI Platform Docs

Evaluation best practices | OpenAI API

Provides strategic guidance and methodologies for designing, implementing, and interpreting evaluation frameworks for LLM-based applications.

Derived skill

Files assembled from official documentation

Use when designing an evaluation pipeline to measure model accuracy, reliability, or performance improvements in an LLM application.

File	Contains	Use For
`SKILL.md`	Entry point: scope, routing table, and workflow.	Start here.
`docs/evaluation-best-practices-openai-api-workflow-guide.md`	A guide outlining methodologies for testing generative AI systems, including types of evaluations and designing an evaluation process.	Questions about methodologies for testing generative AI systems, including types of evaluations and designing an ev...

Generative AI is variable. Models sometimes produce different output from the same input, which makes traditional software testing methods insufficient for A...
Main sections: What are evals?, Types of evals, How to read evals, Design your eval process, Example: Summarizing transcripts.
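
Because the same input can produce different outputs, a single exact-match assertion says little about reliability. One common alternative is to run the same prompt several times and report the fraction of outputs that pass a grader. The sketch below illustrates that idea only; it is not taken from the attached docs. `pass_rate`, `make_fake_model`, and `mentions_tuesday` are hypothetical names, and the fake model is a seeded random stand-in for a real model call.

```python
import random
from typing import Callable


def pass_rate(
    generate: Callable[[str], str],
    grade: Callable[[str], bool],
    prompt: str,
    trials: int = 20,
) -> float:
    """Run the same prompt `trials` times and return the fraction of passing outputs."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passes = sum(1 for _ in range(trials) if grade(generate(prompt)))
    return passes / trials


def make_fake_model(seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a model call (assumption, not a real API).

    It returns one of several phrasings at random to mimic output variability.
    """
    rng = random.Random(seed)
    variants = [
        "The meeting is on Tuesday.",
        "Tuesday is the meeting day.",
        "I am not sure.",
    ]

    def generate(prompt: str) -> str:
        return rng.choice(variants)

    return generate


def mentions_tuesday(output: str) -> bool:
    """Illustrative grader: checks for the expected fact, not an exact string."""
    return "tuesday" in output.lower()


rate = pass_rate(make_fake_model(seed=0), mentions_tuesday, "When is the meeting?", trials=30)
print(f"pass rate over 30 trials: {rate:.2f}")
```

In a real pipeline, `generate` would wrap the application's model call and `grade` would encode whatever criterion the eval targets. The grader here checks for a fact rather than an exact string, so differently worded correct answers still pass.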

Open the most relevant file under docs/ for the exact documented workflow and wording.
Open schemas/ files for exact structured contracts.
Open examples/ files for concrete requests, commands, snippets, and manifests.
Do not add behavior or configuration that is not present in the attached source files.