openai · OpenAI Platform Docs

Evaluation best practices

A guide on designing and implementing custom evaluation frameworks for LLM applications, covering objective definition, dataset collection, metric selection, and the integration of human feedback to mitigate non-deter...

Derived skill

Files assembled from official documentation

When To Use

Use when you need to design a structured testing process to measure the accuracy, reliability, and performance of a generative AI system using custom metrics and datasets.
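At its core, such a testing process runs a model over a dataset of inputs with known expected outputs and aggregates a metric over the results. The sketch below shows that loop under stated assumptions. `EvalCase`, `exact_match`, `run_eval`, and `toy_model` are hypothetical names, not part of any OpenAI API. `toy_model` is a deterministic stand-in for a real model call, and exact match is only one possible metric.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One dataset row: a model input and the answer we expect (hypothetical schema)."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the case- and whitespace-normalized output equals the expected answer."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(model: Callable[[str], str], cases: List[EvalCase],
             metric: Callable[[str, str], float]) -> float:
    """Return the mean metric score of `model` over every case in the dataset."""
    if not cases:
        raise ValueError("eval dataset is empty")
    scores = [metric(model(case.input), case.expected) for case in cases]
    return sum(scores) / len(scores)


# Hypothetical stand-in for a real model call; replace with your own client code.
def toy_model(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "unknown")


cases = [
    EvalCase("2+2", "4"),
    EvalCase("capital of France", "paris"),
    EvalCase("capital of Peru", "Lima"),
]
print(run_eval(toy_model, cases, exact_match))
```

Passing the metric in as a function keeps the dataset and scoring logic separate, so the same cases can be rescored with a custom metric (for example, a rubric or model-graded check) without changing the loop.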

Reference Files

| File | Contains | Use for |
| --- | --- | --- |
| SKILL.md | Entry point: scope, routing table, and workflow. | Start here. |
| docs/evaluation-best-practices-workflow-guide.md | A guide outlining methodologies for designing and implementing evaluation processes for generative AI systems. | Questions about how to design and implement evaluation processes for generative AI systems. |

What This Skill Covers

  • Generative AI is variable. Models sometimes produce different output from the same input, which makes traditional software testing methods insufficient for A...
  • Main sections: What are evals?, Types of evals, How to read evals, Design your eval process, Example: Summarizing transcripts.
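Because the same input can yield different outputs, a single pass/fail check says little. One common approach is to run the same input several times and summarize the spread. The sketch below shows this under stated assumptions. `repeated_trials` and `flaky_model` are hypothetical names, and a seeded random choice stands in for real model sampling. It reports the pass rate, the share of runs that agree on the most common output, and the number of distinct outputs.

```python
import random
from collections import Counter
from typing import Callable


def repeated_trials(model: Callable[[str], str], prompt: str,
                    check: Callable[[str], bool], n: int = 20) -> dict:
    """Call `model` n times on one prompt and summarize how its outputs vary."""
    if n <= 0:
        raise ValueError("n must be positive")
    outputs = [model(prompt) for _ in range(n)]
    pass_rate = sum(check(o) for o in outputs) / n
    # Share of runs that produced the single most common output.
    _, top_count = Counter(outputs).most_common(1)[0]
    return {
        "pass_rate": pass_rate,
        "agreement": top_count / n,
        "distinct_outputs": len(set(outputs)),
    }


# Hypothetical nondeterministic model: a seeded RNG stands in for sampling.
rng = random.Random(0)


def flaky_model(prompt: str) -> str:
    return rng.choice(["Paris", "Paris", "Paris", "paris.", "Lyon"])


stats = repeated_trials(flaky_model, "capital of France",
                        check=lambda o: "paris" in o.lower())
print(stats)
```

A low `agreement` with a high `pass_rate` suggests the output varies in form but not in substance. A low `pass_rate` points to a real accuracy problem rather than mere variability.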

Workflow

  1. Open the most relevant file under docs/ for the exact documented workflow and wording.
  2. Open files under schemas/, if present, for exact structured contracts.
  3. Open files under examples/, if present, for concrete requests, commands, snippets, and manifests.
  4. Do not add behavior or configuration that is not present in the attached source files.

Canonical source: https://developers.openai.com/api/docs/guides/evaluation-best-practices.md