feat: automation to generate markdown files from cookbook
ashrafchowdury committed Aug 21, 2024
1 parent 4046553 commit 31a9610
Showing 8 changed files with 349 additions and 378 deletions.
42 changes: 42 additions & 0 deletions cookbook/README.md
@@ -0,0 +1,42 @@
# Convert Cookbook (`.ipynb`) to Markdown Files

## Prerequisites

1. **Python Environment**: Ensure you have Python 3.x installed.

2. **Install Required Packages**:
```bash
pip install nbconvert
```

## Conversion Instructions

You can convert `.ipynb` files to markdown format using the provided script. Below are two methods: converting all cookbooks at once, or converting a single notebook.


### 1. Navigate to the `website` directory
```bash
cd ./website
```

### 2. Convert all cookbooks

To convert all files inside the `cookbook` directory, run the following command:
```bash
make generate_cookbook_docs
```

### 3. Convert a single cookbook file

To convert a single cookbook file, pass the filename to the `make` command:
```bash
make generate_cookbook_docs file=example.ipynb
```

> The converted markdown files are stored in the `docs/guides/cookbooks/` directory.

## Notes

The output markdown files include a note linking back to the original Jupyter notebook on GitHub.

256 changes: 20 additions & 236 deletions cookbook/evaluations_with_sdk.ipynb

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions website/Makefile
@@ -0,0 +1,3 @@
generate_cookbook_docs:
	mkdir -p ./docs/guides/cookbooks
	python3 scripts/generate_cookbooks.py $(file)
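
The conversion script `scripts/generate_cookbooks.py` is not rendered in this view. As a reference, here is a minimal sketch of what such a script could look like using `nbconvert`'s Python API; the paths, title derivation, and frontmatter/note format are assumptions based on the README above and the generated `.mdx` file below:

```python
# Hypothetical sketch of scripts/generate_cookbooks.py -- the real script is not
# shown in this diff, so paths and frontmatter details here are assumptions.
import sys
from pathlib import Path

from nbconvert import MarkdownExporter

COOKBOOK_DIR = Path("../cookbook")            # assumed location, relative to ./website
OUTPUT_DIR = Path("./docs/guides/cookbooks")
REPO_URL = "https://github.com/Agenta-AI/agenta/blob/main/cookbook"


def convert(notebook: Path) -> None:
    # Convert the notebook body to markdown
    body, _ = MarkdownExporter().from_filename(str(notebook))
    title = notebook.stem.replace("_", " ").capitalize()
    # Prepend Docusaurus frontmatter and a note linking back to the source notebook
    header = (
        f'---\ntitle: "{title}"\n---\n\n'
        f":::note\nThis guide is also available as a "
        f"[Jupyter Notebook]({REPO_URL}/{notebook.name}).\n:::\n\n"
    )
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    (OUTPUT_DIR / f"{notebook.stem}.mdx").write_text(header + body)


if __name__ == "__main__":
    # With an argument, convert that single notebook; otherwise convert them all
    targets = [COOKBOOK_DIR / sys.argv[1]] if len(sys.argv) > 1 else COOKBOOK_DIR.glob("*.ipynb")
    for nb in targets:
        convert(nb)
```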
45 changes: 11 additions & 34 deletions website/README.md
@@ -1,41 +1,18 @@
-# Website
+# Agenta Documentation
 
-This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.
+This doc is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.
 
-### Installation
+## Installation
 
-```
-$ yarn
-```
-
-### Local Development
-
-```
-$ yarn start
-```
-
-This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.
-
-### Build
-
-```
-$ yarn build
-```
-
-This command generates static content into the `build` directory and can be served using any static contents hosting service.
-
-### Deployment
-
-Using SSH:
-
-```
-$ USE_SSH=true yarn deploy
-```
-
-Not using SSH:
-
-```
-$ GIT_USER=<Your GitHub username> yarn deploy
-```
-
-If you are using GitHub pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch.
+1. Install dependencies
+```bash
+npm install
+```
+2. Start local server
+```bash
+npm run start
+```
+3. Run build to ensure everything is working
+```bash
+npm run build
+```
166 changes: 166 additions & 0 deletions website/docs/guides/cookbooks/evaluations_with_sdk.mdx
@@ -0,0 +1,166 @@
---
title: "Evaluations with sdk"
---



:::note
This guide is also available as a [Jupyter Notebook](https://github.com/Agenta-AI/agenta/blob/main/cookbook/evaluations_with_sdk.ipynb).
:::

# Using evaluations with the SDK

In this cookbook we show how to interact with evaluations in Agenta programmatically, using either the SDK or the raw API.

We will do the following:

- Create a test set
- Create and configure an evaluator
- Run an evaluation
- Retrieve the results of evaluations

We assume that you have already created an LLM application and variants in Agenta.


### Architectural overview
In this scenario, evaluations are executed on the Agenta backend. Specifically, Agenta invokes the LLM application for each row in the test set and subsequently processes the output using the designated evaluator.
This operation is managed through Celery tasks. The interactions with the LLM application are asynchronous, batched, and include retry mechanisms. Additionally, the batching configuration can be adjusted to avoid exceeding the rate limits imposed by the LLM provider.
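
To make these knobs concrete, here is a purely illustrative, sequential approximation of what batching, retries, and the inter-batch delay control. This is not Agenta's actual backend code (which fans calls out asynchronously through Celery); it is only a sketch of the semantics:

```python
# Illustrative only -- NOT Agenta's backend implementation.
import time
from typing import Callable, List


def run_batched(
    rows: List[dict],
    call_llm_app: Callable[[dict], str],
    batch_size: int = 10,
    max_retries: int = 3,
    retry_delay: int = 2,
    delay_between_batches: int = 5,
) -> List[str]:
    outputs = []
    for start in range(0, len(rows), batch_size):
        for row in rows[start : start + batch_size]:  # the real backend runs these in parallel
            for attempt in range(max_retries):
                try:
                    outputs.append(call_llm_app(row))
                    break
                except Exception:
                    time.sleep(retry_delay)  # back off before retrying a failed call
        time.sleep(delay_between_batches)  # pause to stay under provider rate limits
    return outputs
```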


## Setup


```python
! pip install -U agenta
```

## Configuration Setup



```python
# Assuming an application has already been created through the user interface, you
# will need to obtain the application ID. In this example we use the default
# template single_prompt, which has the prompt "Determine the capital of {country}".
#
# You can find the application ID in the URL. For example, in the URL
# https://cloud.agenta.ai/apps/666dde95962bbaffdb0072b5/playground?variant=app.default,
# the application ID is 666dde95962bbaffdb0072b5.
from agenta.client.backend.client import AgentaApi
```


```python
app_id = "667d8cfad1812781f7e375d9"

# You can create the API key under the settings page. If you are using the OSS version, keep this as an empty string.
api_key = "EUqJGOUu.xxxx"

# Host
host = "https://cloud.agenta.ai"

# Initialize the client
client = AgentaApi(base_url=host + "/api", api_key=api_key)

# With the client initialized, we can list the applications to find the application ID
client.apps.list_apps()
```
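
If you are self-hosting the OSS version, the host points at your own instance instead. A hedged example; the localhost URL is an assumption about a default self-hosted setup:

```python
# Self-hosted OSS setup (assumed defaults -- adjust to your deployment)
host = "http://localhost"
api_key = ""  # per the note above, the OSS version does not require an API key
client = AgentaApi(base_url=host + "/api", api_key=api_key)
```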

## Create a test set


```python
from agenta.client.backend.types.new_testset import NewTestset

csvdata = [
    {"country": "france", "capital": "Paris"},
    {"country": "Germany", "capital": "paris"},
]

response = client.testsets.create_testset(
    app_id=app_id, request=NewTestset(name="test set", csvdata=csvdata)
)
test_set_id = response.id

# Let's now update it to fix the wrong capital for Germany
csvdata = [
    {"country": "france", "capital": "Paris"},
    {"country": "Germany", "capital": "Berlin"},
]

client.testsets.update_testset(
    testset_id=test_set_id, request=NewTestset(name="test set", csvdata=csvdata)
)
```

## Create evaluators


```python
# Create an evaluator that performs an exact match comparison on the 'capital' column.
# You can find the list of evaluator keys, evaluators, and their configurations in
# https://github.com/Agenta-AI/agenta/blob/main/agenta-backend/agenta_backend/resources/evaluators/evaluators.py
response = client.evaluators.create_new_evaluator_config(
    app_id=app_id,
    name="capital_evaluator",
    evaluator_key="auto_exact_match",
    settings_values={"correct_answer_key": "capital"},
)
exact_match_eval_id = response.id

code_snippet = """
from typing import Dict

def evaluate(
    app_params: Dict[str, str],
    inputs: Dict[str, str],
    output: str,  # output of the llm app
    datapoint: Dict[str, str]  # contains the testset row
) -> float:
    if output and output[0].isupper():
        return 1.0
    else:
        return 0.0
"""

response = client.evaluators.create_new_evaluator_config(
    app_id=app_id,
    name="capital_letter_evaluator",
    evaluator_key="auto_custom_code_run",
    settings_values={"code": code_snippet},
)
letter_match_eval_id = response.id
```
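
Since `auto_custom_code_run` executes the submitted code on the backend, it can help to sanity-check the evaluator logic locally before registering it. A minimal sketch; the sample arguments are made up:

```python
# Local sanity check of the custom evaluator logic. The sample arguments below are
# made up; the backend supplies real app params, inputs, outputs, and testset rows.
from typing import Dict


def evaluate(
    app_params: Dict[str, str],
    inputs: Dict[str, str],
    output: str,
    datapoint: Dict[str, str],
) -> float:
    return 1.0 if output and output[0].isupper() else 0.0


assert evaluate({}, {"country": "france"}, "Paris", {"capital": "Paris"}) == 1.0
assert evaluate({}, {"country": "france"}, "paris", {"capital": "Paris"}) == 0.0
```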


```python
# get list of all evaluators
client.evaluators.get_evaluator_configs(app_id=app_id)
```

## Run an evaluation


```python
response = client.apps.list_app_variants(app_id=app_id)
print(response)
myvariant_id = response[0].variant_id
```


```python
# Run an evaluation
from agenta.client.backend.types.llm_run_rate_limit import LlmRunRateLimit

response = client.evaluations.create_evaluation(
    app_id=app_id,
    variant_ids=[myvariant_id],
    testset_id=test_set_id,
    evaluators_configs=[exact_match_eval_id, letter_match_eval_id],
    rate_limit=LlmRunRateLimit(
        batch_size=10,            # number of rows to call in parallel
        max_retries=3,            # max number of times to retry a failed llm call
        retry_delay=2,            # delay before retrying a failed llm call
        delay_between_batches=5,  # delay between batches
    ),
)
print(response)
```


```python
# Check the status, using the evaluation id returned by create_evaluation
# (hard-coded here from an earlier run)
client.evaluations.fetch_evaluation_status('667d98fbd1812781f7e3761a')
```
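
Because the evaluation runs asynchronously on the backend, you will typically poll the status until it reaches a terminal state. A minimal sketch; the response shape and status strings here are assumptions, so verify them against your own API responses:

```python
import time


def wait_for_evaluation(client, evaluation_id: str, poll_interval: int = 5, timeout: int = 600) -> str:
    """Poll until the evaluation leaves the running state (assumed status values)."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        response = client.evaluations.fetch_evaluation_status(evaluation_id)
        status = response["status"]  # assumed response shape -- verify before relying on it
        if status != "EVALUATION_STARTED":
            return status  # terminal state, e.g. finished or failed
        time.sleep(poll_interval)
    raise TimeoutError(f"Evaluation {evaluation_id} did not finish within {timeout}s")
```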


```python
# Fetch the overall results, using the same evaluation id
response = client.evaluations.fetch_evaluation_results('667d98fbd1812781f7e3761a')

results = [
    (evaluator["evaluator_config"]["name"], evaluator["result"])
    for evaluator in response["results"]
]
print(results)
```


```python
# fetch the detailed results
client.evaluations.fetch_evaluation_scenarios(evaluations_ids='667d98fbd1812781f7e3761a')
```
107 changes: 0 additions & 107 deletions website/docs/guides/evaluation_from_sdk.mdx

This file was deleted.

