Merge pull request #1823 from Agenta-AI/mmabrouk/docs/AGE-367-new-coo…kbook-sdk

docs(tool): AGE-367 add doc for using sdk
Showing 2 changed files with 114 additions and 0 deletions.
@@ -0,0 +1,107 @@
---
title: "Running Evaluations with SDK"
---

<Note>
This guide is also available as a [Jupyter Notebook](https://github.com/Agenta-AI/agenta/blob/main/cookbook/evaluations_with_sdk.ipynb).
</Note>

## Introduction

In this guide, we'll demonstrate how to interact programmatically with evaluations in the Agenta platform using the SDK (or the raw API). This will include:

- Creating a test set
- Configuring an evaluator
- Running an evaluation
- Retrieving the results of evaluations

This assumes that you have already created an LLM application and variants in Agenta.

## Architectural Overview

Evaluations are executed on the Agenta backend. Specifically, Agenta invokes the LLM application for each row in the test set and processes the output using the designated evaluator. Operations are managed through Celery tasks. The interactions with the LLM application are asynchronous, batched, and include retry mechanisms. The batching configuration can be adjusted to avoid exceeding rate limits imposed by the LLM provider.

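As a preview of that batching configuration, the sketch below constructs the `LlmRunRateLimit` object that is passed to `create_evaluation` in the "Run an Evaluation" section later in this guide. The values are illustrative, and the comments are our reading of what each field controls based on its name, not official documentation:

```python
from agenta.client.backend.types.llm_run_rate_limit import LlmRunRateLimit

# Illustrative rate-limit settings; adjust them to your provider's actual limits.
rate_limit = LlmRunRateLimit(
    batch_size=10,            # rows from the test set sent to the LLM app per batch
    max_retries=3,            # retries per failed LLM invocation
    retry_delay=2,            # seconds to wait before retrying a failed invocation
    delay_between_batches=5,  # seconds to pause between consecutive batches
)
```
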
## Setup

### Installation

Ensure that the Agenta SDK is installed and up to date in your development environment:

```bash
pip install -U agenta
```

### Configuration

After setting up your environment, you need to configure the SDK:

```python
from agenta.client.backend.client import AgentaApi

# Set up your application ID and API key
app_id = "your_app_id"
api_key = "your_api_key"
host = "https://cloud.agenta.ai"

# Initialize the client
client = AgentaApi(base_url=host + "/api", api_key=api_key)
```

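If you don't have your application ID at hand, you can list your applications through the client. This is a sketch that assumes the backend client exposes an `apps.list_apps()` method returning objects with `app_id` and `app_name` fields; verify the exact names against your SDK version:

```python
# Assumption: the client exposes an `apps` resource with a `list_apps()` method,
# and each returned app carries `app_id` and `app_name` attributes.
apps = client.apps.list_apps()
for app in apps:
    print(app.app_id, app.app_name)

# Use the ID of the application you want to evaluate
app_id = apps[0].app_id
```
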
## Create a Test Set

You can create and update test sets as follows:

```python
from agenta.client.backend.types.new_testset import NewTestset

# Example data for the test set
csvdata = [
    {"country": "France", "capital": "Paris"},
    {"country": "Germany", "capital": "Berlin"},
]

# Create a new test set
response = client.testsets.create_testset(
    app_id=app_id, request=NewTestset(name="Test Set", csvdata=csvdata)
)
test_set_id = response.id
```

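To update the test set created above (for example, to add more rows), the client also exposes an update call. The sketch below assumes a `testsets.update_testset(testset_id=..., request=...)` signature mirroring `create_testset`; check it against your SDK version:

```python
# Add a new row, then push the full data back to the same test set.
# Assumption: `update_testset` takes the test set ID plus a NewTestset payload.
csvdata.append({"country": "Spain", "capital": "Madrid"})

client.testsets.update_testset(
    testset_id=test_set_id,
    request=NewTestset(name="Test Set", csvdata=csvdata),
)
```
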
## Create Evaluators

Set up evaluators that will assess the performance based on specific criteria:

```python
# Create an exact match evaluator that compares the application's output
# against the "capital" column of the test set (via correct_answer_key)
response = client.evaluators.create_new_evaluator_config(
    app_id=app_id,
    name="Capital Evaluator",
    evaluator_key="auto_exact_match",
    settings_values={"correct_answer_key": "capital"},
)
exact_match_eval_id = response.id
```

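Exact match is just one of the built-in evaluators. As a hypothetical second example, the sketch below configures a similarity-based evaluator; the `auto_similarity_match` key and its `similarity_threshold` setting are assumptions, so confirm the available evaluator keys and settings in your Agenta deployment:

```python
# Assumption: "auto_similarity_match" is an available evaluator key and accepts
# a "similarity_threshold" setting alongside "correct_answer_key".
response = client.evaluators.create_new_evaluator_config(
    app_id=app_id,
    name="Capital Similarity Evaluator",
    evaluator_key="auto_similarity_match",
    settings_values={"correct_answer_key": "capital", "similarity_threshold": 0.7},
)
similarity_eval_id = response.id
```
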
## Run an Evaluation

Execute an evaluation using the previously defined test set and evaluators:

```python
from agenta.client.backend.types.llm_run_rate_limit import LlmRunRateLimit

response = client.evaluations.create_evaluation(
    app_id=app_id,
    variant_ids=["your_variant_id"],
    testset_id=test_set_id,
    evaluators_configs=[exact_match_eval_id],
    rate_limit=LlmRunRateLimit(
        batch_size=10, max_retries=3, retry_delay=2, delay_between_batches=5
    ),
)
```

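Evaluations run asynchronously on the backend, so you may want to wait for the run to finish before fetching results. The polling sketch below assumes an `evaluations.fetch_evaluation_status()` method and compares against the stringified response because the exact response shape may vary between SDK versions; treat both as assumptions to verify:

```python
import time

# Placeholder ID, as in the "Retrieve Results" section below
evaluation_id = "your_evaluation_id"

# Assumption: `fetch_evaluation_status` exists and reports the run's current state.
while True:
    status = client.evaluations.fetch_evaluation_status(evaluation_id)
    print("Current status:", status)
    if any(word in str(status).lower() for word in ("finished", "failed", "error")):
        break
    time.sleep(5)
```
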
## Retrieve Results

After running the evaluation, fetch the results to see how well the model performed against the test set:

```python
results = client.evaluations.fetch_evaluation_results("your_evaluation_id")
print(results)
```

## Conclusion

This guide covers the basic steps for using the SDK to manage evaluations within Agenta.