After testing in the playground, evaluate your Agents across multiple test cases with test runs to ensure consistent performance.
1. Create a Dataset

Add test cases by creating a Dataset. For this example, we’ll use a Dataset of product images to generate descriptions.

[Screenshot: Dataset with product images for testing]
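Conceptually, each test case pairs an input (here, a product image) with any reference values your Evaluators will need later. The sketch below illustrates that shape; the column names ("image_url", "expected_description", "language") are hypothetical, so use whatever columns your Dataset actually defines.

```python
# Illustrative test cases for a product-description Dataset.
# All field names here are examples, not the platform's required schema.
test_cases = [
    {
        "image_url": "https://example.com/images/red-sneaker.jpg",
        "expected_description": "Lightweight red running sneaker with mesh upper.",
        "language": "en",
    },
    {
        "image_url": "https://example.com/images/leather-bag.jpg",
        "expected_description": "Brown full-grain leather messenger bag.",
        "language": "de",
    },
]
```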
2. Build your Agent

Create an Agent that processes your test examples. In this case, the Agent generates product descriptions, translates them into multiple languages, and formats them to match specific requirements.

[Screenshot: Agent for product description generation]
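In effect, the Agent runs a small pipeline over each test case: generate a description from the image, translate it, then reformat it. A minimal sketch of that flow, using a hypothetical `call_model()` helper in place of the platform's actual model invocation:

```python
def call_model(prompt: str) -> str:
    """Hypothetical stand-in for the platform's model call."""
    raise NotImplementedError

def describe_product(image_url: str, language: str) -> str:
    # 1. Generate a product description from the image.
    description = call_model(f"Describe the product shown at {image_url}.")
    # 2. Translate it into the target language.
    translated = call_model(f"Translate into {language}: {description}")
    # 3. Format it to match the required style (e.g., length, tone).
    return call_model(f"Rewrite as a two-sentence product listing: {translated}")
```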
3. Start a test run

Open the test configuration by clicking Test in the top right corner.
4. Select your dataset

Select your dataset from the dropdown.

[Screenshot: Test configuration with dataset and evaluator options]
5. Configure your test

Select Evaluators to measure the quality of outputs, and map the evaluator variables to the dataset columns. You can read more about mapping evaluator variables here.

[Screenshot: Test configuration with dataset and evaluator options]

You can create and use Presets for your test runs to save time and avoid repeating the same configuration.
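The mapping step simply tells each Evaluator which dataset column feeds which of its input variables. Expressed as data, with hypothetical evaluator and column names, the configuration amounts to something like this:

```python
# Illustrative mapping of evaluator variables to Dataset columns.
# Evaluator names and column names are examples, not fixed identifiers.
evaluator_config = {
    "evaluators": ["semantic_similarity", "language_check"],
    "variable_mapping": {
        "output": "agent_output",             # the Agent's generated description
        "reference": "expected_description",  # ground-truth column in the Dataset
        "target_language": "language",        # column read by the language check
    },
}
```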
6. Review results

Monitor the test run to analyze the performance of your Agent across all inputs.

[Screenshot: Test run results showing performance metrics]
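If you export the run results, a quick way to surface weak test cases is to aggregate evaluator scores per input. A minimal sketch, assuming exported rows with hypothetical `test_case`, `evaluator`, and `score` fields:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical exported rows: one entry per (test case, evaluator) pair.
results = [
    {"test_case": "red-sneaker", "evaluator": "semantic_similarity", "score": 0.91},
    {"test_case": "red-sneaker", "evaluator": "language_check", "score": 1.0},
    {"test_case": "leather-bag", "evaluator": "semantic_similarity", "score": 0.62},
]

scores = defaultdict(list)
for row in results:
    scores[row["test_case"]].append(row["score"])

# Print the average score per test case, lowest first, to highlight failures.
for case, vals in sorted(scores.items(), key=lambda kv: mean(kv[1])):
    print(f"{case}: {mean(vals):.2f}")
```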