openai · OpenAI Platform Docs
Evaluation best practices
A guide on designing and implementing custom evaluation frameworks for LLM applications, covering objective definition, dataset collection, metric selection, and the integration of human feedback to mitigate non-deter...
Derived skill
Files assembled from official documentation
When To Use
Use when you need to design a structured testing process to measure the accuracy, reliability, and performance of a generative AI system using custom metrics and datasets.
Reference Files
| File | Contains | Use For |
|---|---|---|
| SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
| docs/evaluation-best-practices-workflow-guide.md | A guide outlining methodologies for designing and implementing evaluation processes for generative AI systems. | Questions about designing and implementing evaluation processes for generative AI systems. |
What This Skill Covers
- Generative AI is variable. Models sometimes produce different output from the same input, which makes traditional software testing methods insufficient for A...
- Main sections: What are evals?, Types of evals, How to read evals, Design your eval process, Example: Summarizing transcripts.
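Because the same input can yield different outputs, a single pass/fail run per test case can be misleading. The following is a minimal sketch, not taken from the source documentation, of one way to account for that: run each case several times and report a pass rate. Every name here (`run_eval`, `exact_match`, `make_flaky_model`, `DATASET`) is hypothetical, and the "model" is a seeded stand-in function rather than a real API call.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical two-item dataset; a real eval set would be far larger.
DATASET: List[Dict[str, str]] = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]


def exact_match(output: str, expected: str) -> bool:
    """Simple grader (assumption): case-insensitive exact string match."""
    return output.strip().lower() == expected.strip().lower()


def make_flaky_model(seed: int, error_rate: float) -> Callable[[str], str]:
    """Stand-in for a non-deterministic model; not a real API client."""
    rng = random.Random(seed)
    answers = {"2 + 2": "4", "capital of France": "Paris"}

    def model(prompt: str) -> str:
        # With probability error_rate, return a wrong answer.
        if rng.random() < error_rate:
            return "unsure"
        return answers.get(prompt, "unsure")

    return model


def run_eval(
    model: Callable[[str], str],
    dataset: List[Dict[str, str]],
    grader: Callable[[str, str], bool],
    trials: int = 5,
) -> Tuple[float, List[Dict[str, object]]]:
    """Run every case `trials` times; return mean pass rate and per-case rates."""
    per_item = []
    for item in dataset:
        passes = sum(
            grader(model(item["input"]), item["expected"]) for _ in range(trials)
        )
        per_item.append({"input": item["input"], "pass_rate": passes / trials})
    overall = sum(r["pass_rate"] for r in per_item) / len(per_item)
    return overall, per_item


if __name__ == "__main__":
    overall, per_item = run_eval(make_flaky_model(0, 0.3), DATASET, exact_match)
    print(f"overall pass rate: {overall:.2f}")
    for row in per_item:
        print(row)
```

Reporting a per-case pass rate instead of a single boolean makes flaky cases visible. In practice the grader would be whichever metric the eval design calls for, which may not be exact match.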
Workflow
- Open the most relevant file under `docs/` for the exact documented workflow and wording.
- Open `schemas/` files for exact structured contracts.
- Open `examples/` files for concrete requests, commands, snippets, and manifests.
- Do not add behavior or configuration that is not present in the attached source files.
Canonical source: https://developers.openai.com/api/docs/guides/evaluation-best-practices.md
