
The evaluators client methods are currently in ALPHA. The API may change without notice. A one-time warning is emitted on first use.
Create and manage LLM-as-judge evaluators and their versions programmatically. Evaluators use prompt templates with {{variable}} placeholders that reference span or trace attributes to automatically score your LLM application’s outputs.

Key Capabilities

  • Create template-based LLM-as-judge evaluators within a space
  • Version evaluators with commit messages (versions are immutable once created)
  • Retrieve evaluators with their latest or a specific version
  • List, update, and delete evaluators
  • List and retrieve individual evaluator versions

List Evaluators

List all evaluators you have access to, with optional filtering by space.
resp = client.evaluators.list(
    space="your-space-name-or-id",  # optional
    name="Relevance",               # optional substring filter
    limit=50,
)

for evaluator in resp.evaluators:
    print(evaluator.id, evaluator.name)
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.

Create a Template (LLM-as-Judge) Evaluator

Create a new template evaluator with an initial version. Evaluator names must be unique within the target space.
from arize.evaluators.types import TemplateConfig, EvaluatorLlmConfig

evaluator = client.evaluators.create_template_evaluator(
    name="Relevance",
    space="your-space-name-or-id",
    commit_message="Initial version",
    description="Scores whether the response is relevant to the query",
    template_config=TemplateConfig(
        name="Relevance",
        template="Is the following response relevant to the query?\nQuery: {{input.value}}\nResponse: {{output.value}}",
        include_explanations=True,
        use_function_calling_if_available=True,
        classification_choices={"relevant": 1, "irrelevant": 0},
        direction="maximize",
        llm_config=EvaluatorLlmConfig(
            ai_integration_id="your-ai-integration-id",
            model_name="gpt-4o",
            invocation_parameters={"temperature": 0},
        ),
    ),
)

print(evaluator.id, evaluator.name)

Create a Code Evaluator

Create a new code evaluator with an initial version. Use ManagedCodeConfig for built-in checks (JSONParseable, Regex, KeywordMatch, ExactMatch) or CustomCodeConfig for user-supplied Python.
from arize.evaluators.types import CodeConfig, ManagedCodeConfig

evaluator = client.evaluators.create_code_evaluator(
    name="JSON Parseable",
    space="your-space-name-or-id",
    commit_message="Initial version",
    code_config=CodeConfig(
        ManagedCodeConfig(
            name="json_parseable",
            managed_evaluator="JSONParseable",
            variables=["output"],
        ),
    ),
)

print(evaluator.id, evaluator.name)
Evaluator names must match the regex ^[a-zA-Z0-9_\s\-&()]+$.
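The name constraint can be checked locally before calling the API. A minimal sketch using Python's re module, with the pattern copied from the note above (the helper name is illustrative, not part of the SDK):

```python
import re

# Allowed characters: letters, digits, underscores, whitespace,
# hyphens, ampersands, and parentheses.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_\s\-&()]+$")

def is_valid_evaluator_name(name: str) -> bool:
    """Return True if the name satisfies the documented pattern."""
    return bool(NAME_PATTERN.match(name))

print(is_valid_evaluator_name("Relevance (v2)"))  # True
print(is_valid_evaluator_name("bad/name"))        # False
```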

Template Variables

Template strings use {{variable}} placeholders that reference span or trace attributes (e.g., {{input.value}}, {{output.value}}, {{attributes.my_custom_attr}}).
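To illustrate how placeholder resolution works conceptually, here is a hypothetical local sketch (not the server's actual implementation) that renders a template against a flat dict of span attributes:

```python
import re

def render_template(template: str, attributes: dict) -> str:
    """Substitute {{path}} placeholders with values from a flat attribute dict.

    Placeholders with no matching attribute are left untouched.
    """
    def replace(match):
        key = match.group(1).strip()
        return str(attributes.get(key, match.group(0)))
    return re.sub(r"\{\{(.*?)\}\}", replace, template)

attributes = {
    "input.value": "What is span-level evaluation?",
    "output.value": "Scoring each span of a trace individually.",
}
template = "Query: {{input.value}}\nResponse: {{output.value}}"
print(render_template(template, attributes))
```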

Classification vs. Freeform Output

  • Classification — Provide classification_choices as a dict[str, float] mapping label → numeric score (e.g., {"relevant": 1, "irrelevant": 0}). The evaluator outputs one of these labels along with its score.
  • Freeform — Omit classification_choices. The evaluator produces a numeric score without predefined labels.
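Conceptually, classification mode maps the label returned by the judge LLM onto the configured numeric score. A small illustrative sketch (the helper is hypothetical, not an SDK function):

```python
def score_label(label: str, choices: dict) -> float:
    """Map a judge-returned label to its configured numeric score."""
    if label not in choices:
        raise ValueError(f"Judge returned unknown label: {label!r}")
    return choices[label]

# Mirrors classification_choices={"relevant": 1, "irrelevant": 0} above.
choices = {"relevant": 1.0, "irrelevant": 0.0}
print(score_label("relevant", choices))    # 1.0
print(score_label("irrelevant", choices))  # 0.0
```

With direction="maximize", higher scores indicate better outputs, so here "relevant" is the preferred label.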

Get an Evaluator

Retrieve an evaluator by name or ID. By default the latest version is returned. When using a name, provide space to disambiguate.
evaluator = client.evaluators.get(evaluator="your-evaluator-name-or-id")

print(evaluator.id, evaluator.name)
print(evaluator.version)

Get a Specific Version

evaluator = client.evaluators.get(
    evaluator="your-evaluator-name-or-id",
    version_id="specific-version-id",
)

Update an Evaluator

Update an evaluator’s metadata (name and/or description). To change the template configuration, create a new version instead.
evaluator = client.evaluators.update(
    evaluator="your-evaluator-name-or-id",
    name="Relevance v2",
    description="Updated description",
)

print(evaluator)

Delete an Evaluator

Delete an evaluator and all of its versions. This operation is irreversible and returns no response.
client.evaluators.delete(evaluator="your-evaluator-name-or-id")

print("Evaluator deleted successfully")

Manage Versions

Evaluator versions are immutable once created. To change the template configuration, create a new version — it becomes the latest version immediately.

List Versions

List all versions for an evaluator.
resp = client.evaluators.list_versions(
    evaluator="your-evaluator-name-or-id",
    limit=50,
)

for version in resp.evaluator_versions:
    print(version.id, version.commit_message)
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.

Get a Version

Retrieve a specific evaluator version by its ID.
version = client.evaluators.get_version(version_id="your-version-id")

print(version.id, version.commit_message)

Create a New Template Version

Add a new template version to an existing template evaluator. The new version becomes the latest immediately.
from arize.evaluators.types import TemplateConfig, EvaluatorLlmConfig

version = client.evaluators.create_template_version(
    evaluator="your-evaluator-name-or-id",
    space="your-space-name-or-id",  # required when resolving by evaluator name
    commit_message="Improved prompt for edge cases",
    template_config=TemplateConfig(
        name="Relevance",
        template="Rate the relevance of the response on a scale of 0 to 1.\nQuery: {{input.value}}\nResponse: {{output.value}}",
        include_explanations=True,
        use_function_calling_if_available=True,
        classification_choices={"relevant": 1, "irrelevant": 0},
        direction="maximize",
        llm_config=EvaluatorLlmConfig(
            ai_integration_id="your-ai-integration-id",
            model_name="gpt-4o",
            invocation_parameters={"temperature": 0},
        ),
    ),
)

print(version.id)

Create a New Code Version

Add a new code version to an existing code evaluator.
from arize.evaluators.types import CodeConfig, ManagedCodeConfig

version = client.evaluators.create_code_version(
    evaluator="your-evaluator-name-or-id",
    space="your-space-name-or-id",  # required when resolving by evaluator name
    commit_message="Updated managed evaluator",
    code_config=CodeConfig(
        ManagedCodeConfig(
            name="json_parseable",
            managed_evaluator="JSONParseable",
            variables=["output"],
        ),
    ),
)

print(version.id)
Learn more: Online Evaluations Documentation