Prompt Evaluations: Anthropic's Official Course · Lesson 3
Code-Graded Eval: From Zero to Baseline
Building a code-graded eval from scratch. Animal leg-counting test. Prompt v1 vs v2 vs v3 (chain-of-thought). Extracting answers from <answer> tags.
Building a code-graded eval from scratch. Animal leg-counting test. Prompt v1 vs v2 vs v3 (chain-of-thought). Extracting answers from <answer> tags.