Advanced · Engineering · Claude
Prompt Evaluations: Anthropic's Official Course
How to measure and improve prompt quality: code-graded evals, model-graded assessments, workbench tests, PromptFoo. Anthropic's methodology for production systems.
4 modules
8 lessons
240 min total time
Audience: AI engineers building reliable LLM pipelines
Module 1
Evaluation Fundamentals
What evals are, why they matter, and how they are structured.
Module 2
Code-Graded Evals
Automated programmatic evaluation — fast, scalable, objective.
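As a rough illustration of what a code-graded eval can look like, here is a minimal sketch, not the course's own code. The test cases, the `exact_match_grader` and `run_eval` names, and the `fake_model` stand-in are all hypothetical; in practice `get_completion` would wrap a real model call.

```python
from typing import Callable

# Hypothetical test cases: each pairs a prompt input with the expected answer.
TEST_CASES = [
    {"input": "How many legs does a spider have?", "expected": "8"},
    {"input": "How many legs does a cat have?", "expected": "4"},
]


def exact_match_grader(output: str, expected: str) -> bool:
    """Pass only if the normalized output equals the expected answer."""
    return output.strip().lower() == expected.strip().lower()


def run_eval(get_completion: Callable[[str], str], cases: list[dict]) -> float:
    """Run every case through the model function and return the pass rate."""
    passed = sum(
        exact_match_grader(get_completion(case["input"]), case["expected"])
        for case in cases
    )
    return passed / len(cases)


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (an assumption) so the sketch runs offline."""
    return "8" if "spider" in prompt else "five"


score = run_eval(fake_model, TEST_CASES)
print(f"Pass rate: {score:.0%}")  # prints "Pass rate: 50%"
```

Because the grader is plain code, the same cases can be rerun after every prompt change and the pass rates compared directly.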
Module 3
PromptFoo: Scalable Evals
The PromptFoo framework for automated evals with a dashboard, CSV tests, and custom graders.
Module 4
Model-Graded Evals
LLM-as-judge: for cases where code-graded evals fall short and the criteria are subjective.
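One possible shape for a model-graded eval is sketched below, under stated assumptions: the rubric wording, the 1-5 scale, the `<score>` tag format, and every function name are hypothetical, and `fake_judge` stands in for a real call to a judge model.

```python
import re
from typing import Callable, Optional

# Hypothetical rubric prompt; the wording and the 1-5 scale are assumptions.
JUDGE_TEMPLATE = """You are grading a response for tone.
Rubric: the response should be polite and empathetic.
Response to grade:
<response>{response}</response>
Reply with a score from 1 to 5 inside <score></score> tags."""


def parse_score(judge_reply: str) -> Optional[int]:
    """Extract the integer score from the judge's reply, or None if absent."""
    match = re.search(r"<score>\s*([1-5])\s*</score>", judge_reply)
    return int(match.group(1)) if match else None


def model_grade(
    response: str, judge: Callable[[str], str], threshold: int = 4
) -> bool:
    """Ask the judge to score the response; pass if it meets the threshold."""
    score = parse_score(judge(JUDGE_TEMPLATE.format(response=response)))
    return score is not None and score >= threshold


def fake_judge(prompt: str) -> str:
    """Stand-in for a real judge-model call (an assumption), keyed on one word."""
    return "<score>5</score>" if "sorry" in prompt.lower() else "<score>2</score>"


print(model_grade("I'm sorry for the delay, let me fix that.", fake_judge))
print(model_grade("Not my problem.", fake_judge))
```

Asking the judge for a tagged score keeps the subjective judgment in the model while leaving the pass/fail decision to ordinary code, so an unparseable reply counts as a failure rather than being silently accepted.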