Merge pull request #1823 from Agenta-AI/mmabrouk/docs/AGE-367-new-cookbook-sdk

docs(tool): AGE-367 add doc for using sdk
mmabrouk authored Jun 27, 2024
2 parents dfe9ef9 + fe9e1f2 commit beaae10
Showing 2 changed files with 114 additions and 0 deletions.
107 changes: 107 additions & 0 deletions docs/guides/evaluation_from_sdk.mdx
@@ -0,0 +1,107 @@
---
title: "Running Evaluations with SDK"
---

<Note>
This guide is also available as a [Jupyter
Notebook](https://github.com/Agenta-AI/agenta/blob/main/cookbook/evaluations_with_sdk.ipynb).
</Note>

## Introduction

In this guide, we'll demonstrate how to interact programmatically with evaluations in the Agenta platform using the SDK (or the raw API). This will include:

- Creating a test set
- Configuring an evaluator
- Running an evaluation
- Retrieving the results of evaluations

This guide assumes that you have already created an LLM application and at least one variant in Agenta.

## Architectural Overview

Evaluations are executed on the Agenta backend. Specifically, Agenta invokes the LLM application for each row in the test set and processes the output using the designated evaluator. Operations are managed through Celery tasks. The interactions with the LLM application are asynchronous, batched, and include retry mechanisms. The batching configuration can be adjusted to avoid exceeding rate limits imposed by the LLM provider.

## Setup

### Installation

Ensure that the Agenta SDK is installed and up-to-date in your development environment:

```bash
pip install -U agenta
```

### Configuration

After setting up your environment, you need to configure the SDK:

```python
from agenta.client.backend.client import AgentaApi

# Set up your application ID and API key
app_id = "your_app_id"
api_key = "your_api_key"
host = "https://cloud.agenta.ai"

# Initialize the client
client = AgentaApi(base_url=host + "/api", api_key=api_key)
```
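
If you are unsure which IDs to use, you can look them up programmatically. The sketch below is an assumption about the client surface (it presumes `apps.list_apps()` and `apps.list_app_variants()` methods and the attribute names shown); check your SDK version if anything differs.

```python
# List your applications and their variants to find the ids used in this guide.
# (Method and attribute names here are assumptions; adjust them to your SDK version.)
for app in client.apps.list_apps():
    print(app.app_id, app.app_name)

for variant in client.apps.list_app_variants(app_id=app_id):
    print(variant.variant_id, variant.variant_name)
```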

## Create a Test Set

You can create and update test sets programmatically. To create a new test set:

```python
from agenta.client.backend.types.new_testset import NewTestset

# Example data for the test set
csvdata = [
    {"country": "France", "capital": "Paris"},
    {"country": "Germany", "capital": "Berlin"},
]

# Create a new test set
response = client.testsets.create_testset(
    app_id=app_id,
    request=NewTestset(name="Test Set", csvdata=csvdata),
)
test_set_id = response.id
```
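
To update an existing test set later, the same payload type can be reused. This is a sketch under the assumption that the client exposes an `update_testset` method mirroring `create_testset`; verify the name and signature in your SDK version.

```python
# Add a row and push the updated data back to the same test set.
# (update_testset mirroring create_testset is an assumption; check your SDK version.)
csvdata.append({"country": "Spain", "capital": "Madrid"})

client.testsets.update_testset(
    testset_id=test_set_id,
    request=NewTestset(name="Test Set", csvdata=csvdata),
)
```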

## Create Evaluators

Set up evaluators that will score the application's outputs against specific criteria:

```python
# Create an exact match evaluator
response = client.evaluators.create_new_evaluator_config(
    app_id=app_id,
    name="Capital Evaluator",
    evaluator_key="auto_exact_match",
    settings_values={"correct_answer_key": "capital"},
)
exact_match_eval_id = response.id
```
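
You can register several evaluator configurations and pass them together when running an evaluation. As an illustration, here is a similarity-based evaluator; the `auto_similarity_match` key and its `similarity_threshold` setting are assumptions, so confirm the evaluator keys available in your Agenta version.

```python
# Create a similarity evaluator that tolerates near matches.
# (The evaluator key and its settings are assumptions; confirm them for your Agenta version.)
response = client.evaluators.create_new_evaluator_config(
    app_id=app_id,
    name="Similarity Evaluator",
    evaluator_key="auto_similarity_match",
    settings_values={"similarity_threshold": 0.7, "correct_answer_key": "capital"},
)
similarity_eval_id = response.id
```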

## Run an Evaluation

Execute an evaluation using the previously defined test set and evaluators:

```python
from agenta.client.backend.types.llm_run_rate_limit import LlmRunRateLimit

response = client.evaluations.create_evaluation(
    app_id=app_id,
    variant_ids=["your_variant_id"],
    testset_id=test_set_id,
    evaluators_configs=[exact_match_eval_id],
    # batch_size is the number of rows called in parallel; the retry settings and
    # the delay between batches help you stay under your LLM provider's rate limits
    rate_limit=LlmRunRateLimit(batch_size=10, max_retries=3, retry_delay=2, delay_between_batches=5),
)
```
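
Evaluations run asynchronously on the backend, so they can take a while to complete. Before fetching results, you can check on the run's progress; the snippet below assumes a `fetch_evaluation_status` method and simply prints whatever it returns, which you would poll until it reports completion.

```python
# Check on the evaluation before asking for results.
# ("your_evaluation_id" is the id returned when the evaluation was created;
# fetch_evaluation_status and the shape of its response are assumptions, so
# inspect the object you get back and poll until it reports completion.)
status = client.evaluations.fetch_evaluation_status("your_evaluation_id")
print(status)
```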

## Retrieve Results

After running the evaluation, fetch the results to see how well the model performed against the test set:

```python
results = client.evaluations.fetch_evaluation_results("your_evaluation_id")
print(results)
```

## Conclusion

This guide covers the basic steps for managing evaluations with the SDK: creating a test set, configuring evaluators, running an evaluation, and retrieving its results. For a complete, runnable walkthrough, see the Jupyter notebook linked at the top of this page.
7 changes: 7 additions & 0 deletions docs/mint.json
@@ -343,7 +343,14 @@
"guides/tutorials/deploy-mistral-model",
"guides/extract_job_information"
]
},
{
"group": "Cookbooks",
"pages": [
"guides/evaluation_from_sdk"
]
}

],
"api": {
"baseUrl": "http://localhost/api"
