Prompt Evaluations: Anthropic's Official Course · Lesson 2

Workbench Evals: Rapid Prototyping

This lesson uses Anthropic Workbench for manual prompt testing: running evals across multiple test cases, comparing prompt versions (v1 vs v2), and scoring outputs by hand on a 1-5 scale.

30 min read · 3-question quiz · Ready-to-use prompt included

Practical exercise

What to do after this lesson

Open Anthropic Workbench. Create a prompt with two variables. Add 3 test cases, score results (1-5), improve the prompt, and compare versions via Add Comparison.

Ready-to-use prompt

Template for this lesson

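Copy and adapt to your context. Supply your own values for the variables in double curly braces ({{SOURCE_CODE}}, {{SOURCE_LANGUAGE}}); keep the XML tags in angle brackets as written.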

You are a skilled programmer translating code to Python.

<source_code>
{{SOURCE_CODE}}
</source_code>

Source language: {{SOURCE_LANGUAGE}}

Translate to Python. Format:

<python_code>
[translation here]
</python_code>

Only output the <python_code> tags, no preamble or explanation.
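
Outside the Workbench, the same template can be filled in and its output parsed with a few lines of Python. The following is a minimal sketch under stated assumptions, not something the lesson prescribes: `fill_template` and `extract_python_code` are hypothetical helper names, and the actual model call is left out.

```python
import re
from typing import Dict, Optional

# The lesson's template, copied verbatim; {{NAME}} marks a variable.
PROMPT_TEMPLATE = """You are a skilled programmer translating code to Python.

<source_code>
{{SOURCE_CODE}}
</source_code>

Source language: {{SOURCE_LANGUAGE}}

Translate to Python. Format:

<python_code>
[translation here]
</python_code>

Only output the <python_code> tags, no preamble or explanation."""


def fill_template(template: str, variables: Dict[str, str]) -> str:
    """Replace every {{NAME}} placeholder; raise if a value is missing.

    Hypothetical helper, not part of any Anthropic tooling.
    """
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for variable {name}")
        return variables[name]

    return re.sub(r"\{\{([A-Z_]+)\}\}", substitute, template)


def extract_python_code(response: str) -> Optional[str]:
    """Return the text inside <python_code>...</python_code>, or None."""
    match = re.search(r"<python_code>(.*?)</python_code>", response, re.DOTALL)
    return match.group(1).strip() if match else None


# One test case: values for the two variables.
prompt = fill_template(
    PROMPT_TEMPLATE,
    {"SOURCE_CODE": "console.log('hi');", "SOURCE_LANGUAGE": "JavaScript"},
)
```

Raising on a missing variable is a deliberate choice here: a prompt sent with a literal `{{SOURCE_CODE}}` still in it would produce a misleading test result.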

Common mistakes

What people get wrong

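Evaluating only one test case: a single output is not representative of how the prompt behaves across inputs.
Not versioning prompts: without saved versions, it is easy to forget exactly what changed between runs.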
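
Both mistakes can be avoided with very little bookkeeping. As a rough sketch, human scores can be recorded per prompt version and per test case, then averaged and compared. The version labels, test-case names, and scores below are made-up placeholders for illustration, not results from the lesson, and `summarize` and `per_case_delta` are hypothetical helper names.

```python
from statistics import mean
from typing import Dict

# Placeholder human scores (1-5), keyed by prompt version, then test case.
scores: Dict[str, Dict[str, int]] = {
    "v1": {"case_1": 3, "case_2": 2, "case_3": 4},
    "v2": {"case_1": 4, "case_2": 4, "case_3": 5},
}


def summarize(all_scores: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Return the mean score per prompt version, checking the 1-5 range."""
    summary = {}
    for version, by_case in all_scores.items():
        for case, score in by_case.items():
            if not 1 <= score <= 5:
                raise ValueError(f"{version}/{case}: score {score} is outside 1-5")
        summary[version] = mean(by_case.values())
    return summary


def per_case_delta(
    all_scores: Dict[str, Dict[str, int]], old: str, new: str
) -> Dict[str, int]:
    """Score change per test case, for cases scored under both versions."""
    shared = all_scores[old].keys() & all_scores[new].keys()
    return {case: all_scores[new][case] - all_scores[old][case] for case in sorted(shared)}


summary = summarize(scores)
deltas = per_case_delta(scores, "v1", "v2")
```

Looking at the per-case differences alongside the averages helps show whether a new version improved everywhere or only on some inputs.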

Anthropic Workbench

Prompt with Variables

Iteration Process

Problems with the First Prompt Version