Log pre-computed experiment results to Arize. Use this when you’ve already executed your experiment elsewhere and want to record the results. Unlike run(), this does not execute the task - it only logs existing results.
from arize.experiments import (
    ExperimentTaskFieldNames,
    EvaluationResultFieldNames,
)

experiment_runs = [
    {
        "example_id": "ex-1",
        "output": "Paris is the capital of France",
        "latency_ms": 245,
        "correctness_score": 1.0,
        "correctness_label": "correct",
    },
    {
        "example_id": "ex-2",
        "output": "William Shakespeare wrote Romeo and Juliet",
        "latency_ms": 198,
        "correctness_score": 1.0,
        "correctness_label": "correct",
    },
]

task_fields = ExperimentTaskFieldNames(
    example_id="example_id",
    output="output",
)

evaluator_columns = {
    "Correctness": EvaluationResultFieldNames(
        score="correctness_score",
        label="correctness_label",
    )
}

experiment = client.experiments.create(
    name="pre-computed-experiment",
    dataset="dataset-name-or-id",
    experiment_runs=experiment_runs,
    task_fields=task_fields,
    evaluator_columns=evaluator_columns,
)
Delete an experiment by name or ID. This operation is irreversible. There is no response from this call.
client.experiments.delete(
    experiment="experiment-name-or-id",
    dataset="dataset-name-or-id",  # required when using a name
)
print("Experiment deleted successfully")
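When you already have the experiment ID, the dataset argument can be dropped; a minimal sketch, assuming "experiment-id" stands in for a real ID and the same client as above:

# Deleting by ID; dataset is only needed when resolving the experiment by name
client.experiments.delete(experiment="experiment-id")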
Execute your experiment locally without logging results to Arize. Use this to test your task and evaluators before committing to a full run.
experiment, experiment_df = client.experiments.run(
    ...,
    dry_run=True,      # Test locally without logging
    dry_run_count=10,  # Only run on first 10 examples
)
# Note: experiment is None in dry-run mode
print(f"Results DataFrame shape: {experiment_df.shape}")
Retrieve individual runs from an experiment with pagination support. Pass all=True to fetch all runs via Flight (ignores limit).
resp = client.experiments.list_runs(
    experiment="experiment-name-or-id",
    dataset="dataset-name-or-id",  # required when using a name
    limit=100,
)
for run in resp.experiment_runs:
    print(run)
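To fetch every run in one call, use the all=True path described above; a minimal sketch, assuming the same client and that resp.experiment_runs behaves like a list as in the previous example:

# all=True fetches all runs via Flight; limit is ignored
resp = client.experiments.list_runs(
    experiment="experiment-name-or-id",
    dataset="dataset-name-or-id",
    all=True,
)
print(f"Fetched {len(resp.experiment_runs)} runs")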
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.
The annotate_runs method is currently in ALPHA. The API may change without notice.
Write human annotations to a batch of runs in an experiment. Annotations are upserted by annotation config name for each run; submitting the same name for the same run overwrites the previous value. Up to 1000 runs may be annotated per request.
from arize.experiments.types import AnnotateRecordInput, AnnotationInput

result = client.experiments.annotate_runs(
    experiment="experiment-name-or-id",
    dataset="dataset-name-or-id",  # optional, used to resolve experiment by name
    space="your-space-name-or-id",  # optional, used to resolve dataset by name
    annotations=[
        AnnotateRecordInput(
            record_id="your-run-id",
            values=[
                AnnotationInput(name="accuracy", label="correct", score=1.0),
                AnnotationInput(name="notes", text="Well-structured output"),
            ],
        ),
    ],
)
print(result)
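Because annotations are upserted by annotation config name, submitting the same name for the same run again replaces the earlier value rather than adding a second annotation; a minimal sketch reusing the call above:

# Re-submitting "accuracy" for the same run overwrites the previous score and label
client.experiments.annotate_runs(
    experiment="experiment-name-or-id",
    annotations=[
        AnnotateRecordInput(
            record_id="your-run-id",
            values=[
                AnnotationInput(name="accuracy", label="incorrect", score=0.0),
            ],
        ),
    ],
)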