feat: automation to generate markdown files from cookbook
ashrafchowdury committed Aug 21, 2024
1 parent 4046553 commit 31a9610
Showing 8 changed files with 349 additions and 378 deletions.
42 changes: 42 additions & 0 deletions cookbook/README.md
@@ -0,0 +1,42 @@
# Convert Cookbook (`.ipynb`) to Markdown Files

## Prerequisites

1. **Python Environment**: Ensure you have Python 3.x installed.

2. **Install Required Packages**:
```bash
pip install nbconvert
```

## Conversion Instructions

You can convert `.ipynb` files to markdown format using the provided script. Below are two methods: converting all cookbooks at once, or converting a single notebook.


### 1. Navigate to the `website` directory
```bash
cd ./website
```

### 2. Convert all cookbooks

To convert all files inside the `cookbook` directory, run the following command:
```bash
make generate_cookbook_docs
```

### 3. Convert a single cookbook file

To convert a single cookbook file, pass the filename to the `make` command:
```bash
make generate_cookbook_docs file=example.ipynb
```

> The converted markdown files are stored in the `docs/guides/cookbooks/` directory.

## Notes

The output markdown files include a note linking back to the original Jupyter notebook on GitHub.

256 changes: 20 additions & 236 deletions cookbook/evaluations_with_sdk.ipynb

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions website/Makefile
@@ -0,0 +1,3 @@
generate_cookbook_docs:
	mkdir -p ./docs/guides/cookbooks
	python3 scripts/generate_cookbooks.py $(file)
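
The conversion script `scripts/generate_cookbooks.py` is not rendered in this view. As a reference, here is a minimal sketch of what such a script could look like using `nbconvert`'s Python API; the paths, title derivation, and frontmatter/note format are assumptions based on the README above and the generated `.mdx` file below:

```python
# Hypothetical sketch of scripts/generate_cookbooks.py -- the real script is not
# shown in this diff, so paths and frontmatter details here are assumptions.
import sys
from pathlib import Path

from nbconvert import MarkdownExporter

COOKBOOK_DIR = Path("../cookbook")            # assumed location, relative to ./website
OUTPUT_DIR = Path("./docs/guides/cookbooks")
REPO_URL = "https://github.com/Agenta-AI/agenta/blob/main/cookbook"


def convert(notebook: Path) -> None:
    # Convert the notebook body to markdown
    body, _ = MarkdownExporter().from_filename(str(notebook))
    title = notebook.stem.replace("_", " ").capitalize()
    # Prepend Docusaurus frontmatter and a note linking back to the source notebook
    header = (
        f'---\ntitle: "{title}"\n---\n\n'
        f":::note\nThis guide is also available as a "
        f"[Jupyter Notebook]({REPO_URL}/{notebook.name}).\n:::\n\n"
    )
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    (OUTPUT_DIR / f"{notebook.stem}.mdx").write_text(header + body)


if __name__ == "__main__":
    # With an argument, convert that single notebook; otherwise convert them all
    targets = [COOKBOOK_DIR / sys.argv[1]] if len(sys.argv) > 1 else COOKBOOK_DIR.glob("*.ipynb")
    for nb in targets:
        convert(nb)
```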
45 changes: 11 additions & 34 deletions website/README.md
@@ -1,41 +1,18 @@
-# Website
+# Agenta Documentation
 
-This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.
+This doc is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.
 
-### Installation
+## Installation
 
-```
-$ yarn
-```
-
-### Local Development
-
-```
-$ yarn start
-```
-
-This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.
-
-### Build
-
-```
-$ yarn build
-```
-
-This command generates static content into the `build` directory and can be served using any static contents hosting service.
-
-### Deployment
-
-Using SSH:
-
-```
-$ USE_SSH=true yarn deploy
-```
-
-Not using SSH:
-
-```
-$ GIT_USER=<Your GitHub username> yarn deploy
-```
-
-If you are using GitHub pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch.
+1. Install dependencies
+```bash
+npm install
+```
+2. Start local server
+```bash
+npm run start
+```
+3. Run build to ensure everything is working
+```bash
+npm run build
+```
166 changes: 166 additions & 0 deletions website/docs/guides/cookbooks/evaluations_with_sdk.mdx
@@ -0,0 +1,166 @@
---
title: "Evaluations with sdk"
---



:::note
This guide is also available as a [Jupyter Notebook](https://github.com/Agenta-AI/agenta/blob/main/cookbook/evaluations_with_sdk.ipynb).
:::

# Using evaluations with the SDK

In this cookbook we show how to interact with evaluations in Agenta programmatically, using either the SDK or the raw API.

We will do the following:

- Create a test set
- Create and configure an evaluator
- Run an evaluation
- Retrieve the results of evaluations

We assume that you have already created an LLM application and variants in Agenta.


### Architectural overview
In this scenario, evaluations are executed on the Agenta backend. Specifically, Agenta invokes the LLM application for each row in the test set and subsequently processes the output using the designated evaluator.
This operation is managed through Celery tasks. The interactions with the LLM application are asynchronous, batched, and include retry mechanisms. Additionally, the batching configuration can be adjusted to avoid exceeding the rate limits imposed by the LLM provider.
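
To make these knobs concrete, here is a purely illustrative, sequential approximation of what batching, retries, and the inter-batch delay control. This is not Agenta's actual backend code (which fans calls out asynchronously through Celery); it is only a sketch of the semantics:

```python
# Illustrative only -- NOT Agenta's backend implementation.
import time
from typing import Callable, List


def run_batched(
    rows: List[dict],
    call_llm_app: Callable[[dict], str],
    batch_size: int = 10,
    max_retries: int = 3,
    retry_delay: int = 2,
    delay_between_batches: int = 5,
) -> List[str]:
    outputs = []
    for start in range(0, len(rows), batch_size):
        for row in rows[start : start + batch_size]:  # the real backend runs these in parallel
            for attempt in range(max_retries):
                try:
                    outputs.append(call_llm_app(row))
                    break
                except Exception:
                    time.sleep(retry_delay)  # back off before retrying a failed call
        time.sleep(delay_between_batches)  # pause to stay under provider rate limits
    return outputs
```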


## Setup


```python
! pip install -U agenta
```

## Configuration Setup



```python
# Assuming an application has already been created through the user interface, you
# will need to obtain the application ID. In this example we use the default
# template single_prompt, which has the prompt "Determine the capital of {country}".
#
# You can find the application ID in the URL. For example, in the URL
# https://cloud.agenta.ai/apps/666dde95962bbaffdb0072b5/playground?variant=app.default,
# the application ID is 666dde95962bbaffdb0072b5.
from agenta.client.backend.client import AgentaApi
```


```python
app_id = "667d8cfad1812781f7e375d9"

# You can create the API key under the settings page. If you are using the OSS version, keep this as an empty string.
api_key = "EUqJGOUu.xxxx"

# Host
host = "https://cloud.agenta.ai"

# Initialize the client
client = AgentaApi(base_url=host + "/api", api_key=api_key)

# With the client initialized, we can list the applications to find the application ID
client.apps.list_apps()
```
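
If you are self-hosting the OSS version, the host points at your own instance instead. A hedged example; the localhost URL is an assumption about a default self-hosted setup:

```python
# Self-hosted OSS setup (assumed defaults -- adjust to your deployment)
host = "http://localhost"
api_key = ""  # per the note above, the OSS version does not require an API key
client = AgentaApi(base_url=host + "/api", api_key=api_key)
```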

## Create a test set


```python
from agenta.client.backend.types.new_testset import NewTestset

csvdata = [
    {"country": "france", "capital": "Paris"},
    {"country": "Germany", "capital": "paris"},
]

response = client.testsets.create_testset(
    app_id=app_id, request=NewTestset(name="test set", csvdata=csvdata)
)
test_set_id = response.id

# Let's now update it to fix the wrong capital for Germany
csvdata = [
    {"country": "france", "capital": "Paris"},
    {"country": "Germany", "capital": "Berlin"},
]

client.testsets.update_testset(
    testset_id=test_set_id, request=NewTestset(name="test set", csvdata=csvdata)
)
```

## Create evaluators


```python
# Create an evaluator that performs an exact match comparison on the 'capital' column.
# You can find the list of evaluator keys, evaluators, and their configurations in
# https://github.com/Agenta-AI/agenta/blob/main/agenta-backend/agenta_backend/resources/evaluators/evaluators.py
response = client.evaluators.create_new_evaluator_config(
    app_id=app_id,
    name="capital_evaluator",
    evaluator_key="auto_exact_match",
    settings_values={"correct_answer_key": "capital"},
)
exact_match_eval_id = response.id

code_snippet = """
from typing import Dict

def evaluate(
    app_params: Dict[str, str],
    inputs: Dict[str, str],
    output: str,  # output of the llm app
    datapoint: Dict[str, str]  # contains the testset row
) -> float:
    if output and output[0].isupper():
        return 1.0
    else:
        return 0.0
"""

response = client.evaluators.create_new_evaluator_config(
    app_id=app_id,
    name="capital_letter_evaluator",
    evaluator_key="auto_custom_code_run",
    settings_values={"code": code_snippet},
)
letter_match_eval_id = response.id
```
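
Since `auto_custom_code_run` executes the submitted code on the backend, it can help to sanity-check the evaluator logic locally before registering it. A minimal sketch; the sample arguments are made up:

```python
# Local sanity check of the custom evaluator logic. The sample arguments below are
# made up; the backend supplies real app params, inputs, outputs, and testset rows.
from typing import Dict


def evaluate(
    app_params: Dict[str, str],
    inputs: Dict[str, str],
    output: str,
    datapoint: Dict[str, str],
) -> float:
    return 1.0 if output and output[0].isupper() else 0.0


assert evaluate({}, {"country": "france"}, "Paris", {"capital": "Paris"}) == 1.0
assert evaluate({}, {"country": "france"}, "paris", {"capital": "Paris"}) == 0.0
```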


```python
# get list of all evaluators
client.evaluators.get_evaluator_configs(app_id=app_id)
```

## Run an evaluation


```python
response = client.apps.list_app_variants(app_id=app_id)
print(response)
myvariant_id = response[0].variant_id
```


```python
# Run an evaluation
from agenta.client.backend.types.llm_run_rate_limit import LlmRunRateLimit

response = client.evaluations.create_evaluation(
    app_id=app_id,
    variant_ids=[myvariant_id],
    testset_id=test_set_id,
    evaluators_configs=[exact_match_eval_id, letter_match_eval_id],
    rate_limit=LlmRunRateLimit(
        batch_size=10,            # number of rows to call in parallel
        max_retries=3,            # max number of times to retry a failed llm call
        retry_delay=2,            # delay before retrying a failed llm call
        delay_between_batches=5,  # delay between batches
    ),
)
print(response)
```


```python
# Check the status, using the evaluation id returned by create_evaluation
# (hard-coded here from an earlier run)
client.evaluations.fetch_evaluation_status('667d98fbd1812781f7e3761a')
```
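
Because the evaluation runs asynchronously on the backend, you will typically poll the status until it reaches a terminal state. A minimal sketch; the response shape and status strings here are assumptions, so verify them against your own API responses:

```python
import time


def wait_for_evaluation(client, evaluation_id: str, poll_interval: int = 5, timeout: int = 600) -> str:
    """Poll until the evaluation leaves the running state (assumed status values)."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        response = client.evaluations.fetch_evaluation_status(evaluation_id)
        status = response["status"]  # assumed response shape -- verify before relying on it
        if status != "EVALUATION_STARTED":
            return status  # terminal state, e.g. finished or failed
        time.sleep(poll_interval)
    raise TimeoutError(f"Evaluation {evaluation_id} did not finish within {timeout}s")
```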


```python
# Fetch the overall results, using the same evaluation id
response = client.evaluations.fetch_evaluation_results('667d98fbd1812781f7e3761a')

results = [
    (evaluator["evaluator_config"]["name"], evaluator["result"])
    for evaluator in response["results"]
]
print(results)
```


```python
# fetch the detailed results
client.evaluations.fetch_evaluation_scenarios(evaluations_ids='667d98fbd1812781f7e3761a')
```
107 changes: 0 additions & 107 deletions website/docs/guides/evaluation_from_sdk.mdx

This file was deleted.

