To get started with the Evaluation tool:

Ensure your prompt includes at least one dynamic variable, written in the double-brace syntax: {{variable}}. This is required for creating eval test sets.
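To make the substitution concrete, here is a minimal sketch of what filling in a {{variable}} amounts to. The Console performs this substitution for you; the template text and the fill_template helper below are illustrative, not part of the Console or the SDK:

```python
import re

# Illustrative prompt template with one dynamic variable, {{support_request}}.
PROMPT_TEMPLATE = (
    "Classify the following customer support request as "
    "'billing', 'bug', or 'other':\n\n{{support_request}}"
)

def fill_template(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with its value from `variables`."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], template)

print(fill_template(PROMPT_TEMPLATE, {"support_request": "I was charged twice."}))
```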
The Console offers a built-in prompt generator powered by Claude Opus 4.1:
Click 'Generate Prompt'
Clicking the 'Generate Prompt' helper tool opens a modal where you can enter your task information.
Describe your task
Describe your desired task (e.g., "Triage inbound customer support requests") in as much or as little detail as you like. The more context you include, the more closely Claude can tailor the generated prompt to your needs.
Generate your prompt
Clicking the orange 'Generate Prompt' button at the bottom has Claude generate a high-quality prompt for you. You can then refine that prompt further using the Evaluation screen in the Console.
This feature makes it easier to create prompts with the appropriate variable syntax for evaluation.
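Once you have a generated prompt, you can also exercise it outside the Console. Below is a hedged sketch using the Anthropic Python SDK; the model ID and prompt text are placeholders, not values the generator produced:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder prompt standing in for one the generator might produce,
# with its {{support_request}} variable already filled in.
prompt = (
    "Triage the following inbound customer support request as "
    "'billing', 'bug', or 'other':\n\n"
    "I can't log in after the latest update."
)

message = client.messages.create(
    model="claude-opus-4-1",  # placeholder model ID; use any model you can access
    max_tokens=256,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```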

When you access the Evaluation screen, you have several options to create test cases:
To use the 'Generate Test Case' feature:
Click on 'Generate Test Case'
Claude generates one test case (one row) each time you click the button.
Edit generation logic (optional)
You can also edit the test case generation logic. Click the arrow dropdown to the right of the 'Generate Test Case' button, then click 'Show generation logic' at the top of the Variables window that pops up. You may have to click 'Generate' at the top right of this window to populate the initial generation logic.
Editing this logic lets you customize and fine-tune the test cases Claude generates with greater precision and specificity.
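Conceptually, each test case is one assignment of values to your prompt's {{variables}}. The rows below are an illustrative stand-in for what the Evaluation screen generates; the variable name and values are made up:

```python
# One dict per test case; each key matches a {{variable}} in the prompt.
test_cases = [
    {"support_request": "My invoice shows a duplicate charge for May."},
    {"support_request": "The app crashes when I open the settings page."},
    {"support_request": "Do you offer an on-premise deployment option?"},
]
```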
Here's an example of a populated Evaluation screen with several test cases:

If you update your original prompt text, you can re-run the entire eval suite against the new prompt to see how changes affect performance across all test cases.
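To see what re-running the suite amounts to, here is a minimal sketch that applies two prompt versions to every test case and prints the outputs side by side. It reuses client, fill_template, and test_cases from the earlier sketches, and both prompt strings are placeholders:

```python
# Two illustrative versions of the same prompt template.
OLD_PROMPT = "Triage this request: {{support_request}}"
NEW_PROMPT = (
    "Triage this request as 'billing', 'bug', or 'other'. "
    "Answer with one word.\n\n{{support_request}}"
)

def call_model(prompt: str) -> str:
    """Send a filled-in prompt to the model and return its text reply."""
    message = client.messages.create(
        model="claude-opus-4-1",  # placeholder model ID
        max_tokens=64,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text

for case in test_cases:
    before = call_model(fill_template(OLD_PROMPT, case))
    after = call_model(fill_template(NEW_PROMPT, case))
    print(case["support_request"], "\n  before:", before, "\n  after: ", after)
```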
The Evaluation tool offers several features to help you refine your prompts.
By reviewing results across test cases and comparing different prompt versions, you can spot patterns and make informed adjustments to your prompt more efficiently.
Start evaluating your prompts today to build more robust AI applications with Claude!