openai · OpenAI Platform Docs

Evaluation best practices

A guide on designing and implementing custom evaluation frameworks for LLM applications, covering objective definition, dataset collection, metric selection, and the integration of human feedback to mitigate non-deter...

Derived skill

Files assembled from official documentation

Evaluation best practices

When To Use

Use when you need to design a structured testing process to measure the accuracy, reliability, and performance of a generative AI system using custom metrics and datasets.

Reference Files

File	Contains	Use For
`SKILL.md`	Entry point: scope, routing table, and workflow.	Start here.
`docs/evaluation-best-practices-workflow-guide.md`	Methodologies for designing and implementing evaluation processes for generative AI systems.	Questions about designing and implementing evaluation processes for generative AI systems.

What This Skill Covers

Generative AI is variable. Models sometimes produce different output from the same input, which makes traditional software testing methods insufficient for A...
Main sections: What are evals?, Types of evals, How to read evals, Design your eval process, Example: Summarizing transcripts.
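
Because the same input can yield different outputs, an eval usually scores many samples and reports a rate instead of asserting one exact output. The following is a minimal sketch of that idea, not the method from the source guide: `fake_model`, `exact_match`, and `pass_rate` are hypothetical names, and `fake_model` is a random stand-in for a real model call.

```python
import random

def fake_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a model call; output varies between calls."""
    return rng.choice(["Paris", "Paris", "Paris", "paris.", "Lyon"])

def exact_match(output: str, expected: str) -> bool:
    """Hypothetical grader: case-insensitive match, ignoring a trailing period."""
    return output.strip().rstrip(".").lower() == expected.lower()

def pass_rate(prompt: str, expected: str, trials: int, seed: int = 0) -> float:
    """Sample the model `trials` times on one prompt and return the share graded correct."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    passes = sum(exact_match(fake_model(prompt, rng), expected) for _ in range(trials))
    return passes / trials

rate = pass_rate("What is the capital of France?", "Paris", trials=200)
print(f"pass rate over 200 trials: {rate:.2f}")
```

A single assertion on one output would pass or fail unpredictably here; a pass rate over repeated trials gives a number that can be tracked against a threshold.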

Workflow

Open the most relevant file under docs/ for the exact documented workflow and wording.
Open schemas/ files, if present, for exact structured contracts.
Open examples/ files, if present, for concrete requests, commands, snippets, and manifests.
Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://developers.openai.com/api/docs/guides/evaluation-best-practices.md