Merge pull request #1823 from Agenta-AI/mmabrouk/docs/AGE-367-new-cookbook-sdk

docs(tool): AGE-367 add doc for using sdk
mmabrouk authored Jun 27, 2024
2 parents dfe9ef9 + fe9e1f2 commit beaae10
Showing 2 changed files with 114 additions and 0 deletions.
107 changes: 107 additions & 0 deletions docs/guides/evaluation_from_sdk.mdx
@@ -0,0 +1,107 @@
---
title: "Running Evaluations with SDK"
---

<Note>
This guide is also available as a [Jupyter
Notebook](https://github.com/Agenta-AI/agenta/blob/main/cookbook/evaluations_with_sdk.ipynb).
</Note>

## Introduction

In this guide, we'll demonstrate how to interact programmatically with evaluations in the Agenta platform using the SDK (or the raw API). This will include:

- Creating a test set
- Configuring an evaluator
- Running an evaluation
- Retrieving the results of evaluations

This guide assumes that you have already created an LLM application and at least one variant in Agenta.

## Architectural Overview

Evaluations are executed on the Agenta backend. Specifically, Agenta invokes the LLM application for each row in the test set and processes the output using the designated evaluator. Operations are managed through Celery tasks. The interactions with the LLM application are asynchronous, batched, and include retry mechanisms. The batching configuration can be adjusted to avoid exceeding rate limits imposed by the LLM provider.

## Setup

### Installation

Ensure that the Agenta SDK is installed and up-to-date in your development environment:

```bash
pip install -U agenta
```

### Configuration

After setting up your environment, you need to configure the SDK:

```python
from agenta.client.backend.client import AgentaApi

# Set up your application ID and API key
app_id = "your_app_id"
api_key = "your_api_key"
host = "https://cloud.agenta.ai"

# Initialize the client
client = AgentaApi(base_url=host + "/api", api_key=api_key)
```
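
If you are unsure which IDs to use, you can look them up programmatically. The sketch below is an assumption about the client surface (it presumes `apps.list_apps()` and `apps.list_app_variants()` methods and the attribute names shown); check your SDK version if anything differs.

```python
# List your applications and their variants to find the ids used in this guide.
# (Method and attribute names here are assumptions; adjust them to your SDK version.)
for app in client.apps.list_apps():
    print(app.app_id, app.app_name)

for variant in client.apps.list_app_variants(app_id=app_id):
    print(variant.variant_id, variant.variant_name)
```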

## Create a Test Set

You can create and update test sets programmatically. To create a new test set:

```python
from agenta.client.backend.types.new_testset import NewTestset

# Example data for the test set
csvdata = [
    {"country": "France", "capital": "Paris"},
    {"country": "Germany", "capital": "Berlin"},
]

# Create a new test set
response = client.testsets.create_testset(
    app_id=app_id,
    request=NewTestset(name="Test Set", csvdata=csvdata),
)
test_set_id = response.id
```
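
To update an existing test set later, the same payload type can be reused. This is a sketch under the assumption that the client exposes an `update_testset` method mirroring `create_testset`; verify the name and signature in your SDK version.

```python
# Add a row and push the updated data back to the same test set.
# (update_testset mirroring create_testset is an assumption; check your SDK version.)
csvdata.append({"country": "Spain", "capital": "Madrid"})

client.testsets.update_testset(
    testset_id=test_set_id,
    request=NewTestset(name="Test Set", csvdata=csvdata),
)
```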

## Create Evaluators

Set up evaluators that will score the application's outputs against specific criteria:

```python
# Create an exact match evaluator
response = client.evaluators.create_new_evaluator_config(
    app_id=app_id,
    name="Capital Evaluator",
    evaluator_key="auto_exact_match",
    settings_values={"correct_answer_key": "capital"},
)
exact_match_eval_id = response.id
```
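
You can register several evaluator configurations and pass them together when running an evaluation. As an illustration, here is a similarity-based evaluator; the `auto_similarity_match` key and its `similarity_threshold` setting are assumptions, so confirm the evaluator keys available in your Agenta version.

```python
# Create a similarity evaluator that tolerates near matches.
# (The evaluator key and its settings are assumptions; confirm them for your Agenta version.)
response = client.evaluators.create_new_evaluator_config(
    app_id=app_id,
    name="Similarity Evaluator",
    evaluator_key="auto_similarity_match",
    settings_values={"similarity_threshold": 0.7, "correct_answer_key": "capital"},
)
similarity_eval_id = response.id
```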

## Run an Evaluation

Execute an evaluation using the previously defined test set and evaluators:

```python
from agenta.client.backend.types.llm_run_rate_limit import LlmRunRateLimit

response = client.evaluations.create_evaluation(
    app_id=app_id,
    variant_ids=["your_variant_id"],
    testset_id=test_set_id,
    evaluators_configs=[exact_match_eval_id],
    # batch_size is the number of rows called in parallel; the retry settings and
    # the delay between batches help you stay under your LLM provider's rate limits
    rate_limit=LlmRunRateLimit(batch_size=10, max_retries=3, retry_delay=2, delay_between_batches=5),
)
```
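
Evaluations run asynchronously on the backend, so they can take a while to complete. Before fetching results, you can check on the run's progress; the snippet below assumes a `fetch_evaluation_status` method and simply prints whatever it returns, which you would poll until it reports completion.

```python
# Check on the evaluation before asking for results.
# ("your_evaluation_id" is the id returned when the evaluation was created;
# fetch_evaluation_status and the shape of its response are assumptions, so
# inspect the object you get back and poll until it reports completion.)
status = client.evaluations.fetch_evaluation_status("your_evaluation_id")
print(status)
```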

## Retrieve Results

After running the evaluation, fetch the results to see how well the model performed against the test set:

```python
results = client.evaluations.fetch_evaluation_results("your_evaluation_id")
print(results)
```

## Conclusion

This guide covers the basic steps for managing evaluations with the SDK: creating a test set, configuring evaluators, running an evaluation, and retrieving its results. For a complete, runnable walkthrough, see the Jupyter notebook linked at the top of this page.
7 changes: 7 additions & 0 deletions docs/mint.json
@@ -343,7 +343,14 @@
"guides/tutorials/deploy-mistral-model",
"guides/extract_job_information"
]
},
{
"group": "Cookbooks",
"pages": [
"guides/evaluation_from_sdk"
]
}

],
"api": {
"baseUrl": "http://localhost/api"
