Merge pull request #2112 from Agenta-AI/mmabrouk/docs/various
docs(app): Prompt management + Restructuring + Miscs
mmabrouk authored Oct 10, 2024
2 parents 666c57f + 6994545 commit d6473bd
Showing 48 changed files with 523 additions and 363 deletions.
1 change: 1 addition & 0 deletions docs/README.md
@@ -52,3 +52,4 @@ By inserting to these formatting conventions, you'll maintain the integrity and
## Notes

- Do not update any libraries or packages as this could disrupt the template structure and cause it to break.
- Please use kebab-case (this-way) instead of snake_case for naming files and folders
62 changes: 62 additions & 0 deletions docs/docs/concepts/01-concepts.mdx
@@ -0,0 +1,62 @@
---
title: "Core Concepts"
---

Below are descriptions of the main terms and concepts used in Agenta.

<img
style={{ display: "block", margin: "0 auto" }}
src="/images/prompt_management/taxonomy-concepts.png"
alt="Taxonomy of concepts in Agenta"
loading="lazy"
/>

### Templates

**Templates** are the workflows used by LLM-powered applications. Agenta comes with two default templates:

- **Completion Application Template:** For single-prompt applications that generate text completions.
- **Chat Application Template:** For applications that handle conversational interactions.

Agenta also allows you to create custom templates for your workflows using our SDK (see the sketch after this list). Examples include:

- Retrieval-Augmented Generation (RAG) Applications
- Chains of Multiple Prompts
- Agents Interacting with External APIs
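
For example, a custom workflow can be registered with the SDK so that it shows up in the playground. The sketch below follows the pattern from Agenta's custom-application tutorials; the parameter helpers (`ag.TextParam`, `ag.FloatParam`), the `summarize` entry point, and the OpenAI call are illustrative assumptions rather than a definitive implementation:

```python
import agenta as ag
from openai import OpenAI

ag.init()

# Declare the configuration the playground exposes for this template
# (parameter helper names assumed from Agenta's custom-app tutorials).
ag.config.default(
    prompt_template=ag.TextParam("Summarize the following text:\n{text}"),
    temperature=ag.FloatParam(0.3),
)


@ag.entrypoint
def summarize(text: str) -> str:
    # Build the prompt from the current configuration and call the model.
    prompt = ag.config.prompt_template.format(text=text)
    completion = OpenAI().chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=float(ag.config.temperature),
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```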

After creating a template, you can interact with it in the playground, run no-code evaluations, and deploy versions all from the web UI.

### Applications

An **application** uses a **template** to solve a specific use case. For instance, an **application** could use the single-prompt **template** for tasks like:

- **Tweet Generation:** Crafting engaging tweets based on input topics.
- **Article Summarization:** Condensing long articles into key points.

### Variants

Within each application, you can create **variants**. **Variants** are different configurations of the application, allowing you to experiment with and compare multiple approaches. For example, for the "tweet generation" application, you might create **variants** that:

- Use different prompt phrasings.
- Adjust model parameters like temperature or maximum tokens.
- Incorporate different styles or tones (e.g., professional vs. casual).

### Versions

Every **variant** is **versioned** and each **version** is immutable. When you make changes to a **variant**, a new **version** is created. Each **version** has a **commit id** that uniquely identifies it.

### Environments

**Environments** are the interfaces where your deployed variants are accessible. You can deploy a **version** of a **variant** to an **environment**. Each **environment** has a user-defined environment name (e.g. development, staging, production) that specifies its context or stage.

You can then integrate the **environment** into your codebase to fetch the configuration deployed to that **environment**. Additionally, you can directly invoke the **environment**'s endpoints to call the application running with that configuration.
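
As a rough illustration of that integration, the snippet below fetches the configuration currently deployed to the `production` **environment** over the REST API. The endpoint path and query parameters are assumptions for illustration only; check the prompt-management reference for the exact call:

```python
import os

import requests

host = "https://cloud.agenta.ai"
headers = {"Authorization": os.environ["AGENTA_API_KEY"]}

# Assumed endpoint: ask the backend for the config deployed to "production".
response = requests.get(
    f"{host}/api/configs",
    params={"app_id": "<your-app-id>", "environment_name": "production"},
    headers=headers,
    timeout=10,
)
response.raise_for_status()

config = response.json()  # e.g. the prompt template, model, and parameters
print(config)
```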

By default, applications come with three predefined **environments**:

- **Development:** For initial testing and experimentation.
- **Staging:** For pre-production testing and quality assurance.
- **Production:** For live use with real users.

:::warning
When deploying a **variant** to an **environment**, the latest **version** of that **variant** gets deployed. Each **environment** points to a specific **version** of a **variant** (a certain **commit**). Updating the **variant** after deploying does not automatically update the **environment**.
:::
@@ -14,7 +14,7 @@ While some tools exist that help doing the first point via a user interface, the

## How does Agenta solve this problem?

Agenta creates a playground in the UI from your LLM applications, regardless of the workflow (RAG, chain-of-prompts, custom logic) or the framework (Langchain, Llama_index, OpenAI calls) in use.
Agenta creates a playground in the web UI from your LLM applications, regardless of the workflow (RAG, chain-of-prompts, custom logic) or the framework (Langchain, Llama_index, OpenAI calls) in use.

This enables the entire team to collaborate on prompt engineering and experimentation with the application parameters (prompts, models, chunk size, etc.). It also allows them to manage all aspects of the app development lifecycle from the UI: comparing different configurations, evaluating the application, deploying it, and more.

@@ -30,9 +30,10 @@ Agenta separates the application logic from the configuration. The application l

## Agenta architecture

<img width="700" src="/images/apps_and_configurations_light.png" />
Agenta decouples the configuration (prompts, model) from the application logic. The configuration is managed by the backend.
The configuration then can be modified both from the UI (in the playground) or from the CLI
<img src="/images/apps_and_configurations_light.png" />
Agenta decouples the configuration (prompts, model) from the application logic. The configuration is managed by the backend and can then be modified either from the UI (in the playground) or from the CLI.

### The Application

6 changes: 3 additions & 3 deletions docs/docs/evaluation/01-overview.mdx
@@ -47,8 +47,8 @@ The key to building production-ready LLM applications is to have a tight feedbac
<DocCard
item={{
type: "link",
href: "/evaluation/overview",
label: "Run Evaluations from the UI",
href: "/evaluation/no-code-evaluation",
label: "Run Evaluations from the web UI",
description: "Learn about the evaluation process in Agenta",
}}
/>
@@ -58,7 +58,7 @@ The key to building production-ready LLM applications is to have a tight feedbac
<DocCard
item={{
type: "link",
href: "/evaluation/overview",
href: "/evaluation/sdk-evaluation",
label: "Run Evaluations with the SDK",
description: "Learn about the evaluation process in Agenta",
}}
13 changes: 10 additions & 3 deletions docs/docs/evaluation/03-configure-evaluators.mdx
@@ -18,7 +18,11 @@ Evaluators typically take as input:

Evaluators return either a float or a boolean value.

<img style={{ width: "70%" }} src="/images/evaluation/evaluators.png" />
<img
src="/images/evaluation/evaluators-inout.png"
alt="Figure showing the inputs and outputs of an evaluator."
loading="lazy"
/>
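
A custom code evaluator is simply a function over these inputs that returns a score. The sketch below reuses the signature from the SDK cookbook included in this change set; the length-based rule is only an illustrative assumption:

```python
from typing import Dict


def evaluate(
    app_params: Dict[str, str],  # the variant configuration (prompt, model, ...)
    inputs: Dict[str, str],      # the inputs sent to the LLM application
    output: str,                 # the output produced by the LLM application
    datapoint: Dict[str, str],   # the test set row, including reference columns
) -> float:
    # Illustrative rule: score 1.0 when the output stays under 280 characters.
    return 1.0 if len(output) <= 280 else 0.0
```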

### Configuring evaluators

@@ -70,8 +74,11 @@ Evaluators need to know which parts of the data contain the output and the refe

For more sophisticated evaluators, such as `RAG evaluators` (_available only in cloud and enterprise versions_), you need to define more complex mappings (see figure below).

![Figure showing how RAGAS faithfulness evaluator maps to an example LLM
generation.](/images/evaluation/evaluator_config_mapping.png)
<img
src="/images/evaluation/evaluator-mapping.png"
alt="Figure showing how the RAGAS faithfulness evaluator is configured in Agenta."
loading="lazy"
/>

Configuring the evaluator is done by mapping the evaluator inputs to the generation data:

23 changes: 19 additions & 4 deletions docs/docs/evaluation/04-no-code-evaluation.mdx
@@ -11,7 +11,10 @@ Before you get started, make sure that you have [created a test set](/evaluation

To start an evaluation, navigate to the Evaluations page and click the `Start new evaluation` button. A modal will appear, allowing you to set up the evaluation.

<img src="/images/evaluation/start-new-evaluation.png" />
<img
src="/images/evaluation/start-new-evaluation.png"
alt="Start new evaluation"
/>

### Setting Up Evaluation Parameters

@@ -21,7 +24,11 @@ In the modal, specify the following:
- <b>Variants:</b> Choose one or more variants to evaluate.
- <b>Evaluators:</b> Pick one or more evaluators for assessment.

<img src="/images/evaluation/new-evaluation-modal.png" />
<img
src="/images/evaluation/new-evaluation-modal.png"
alt="New evaluation modal"
style={{ width: "70%", display: "block", margin: "auto" }}
/>

#### Advanced Configuration

@@ -40,12 +47,20 @@ The main view offers an aggregated summary of results. Each column displays the

For a detailed view of an evaluation, click on a completed evaluation row.

<img src="/images/evaluation/detailed-evaluation-results.png" />
<img
src="/images/evaluation/detailed-evaluation-results.png"
alt="Detailed evaluation results"
style={{ width: "100%" }}
/>

The evaluation table columns show inputs, reference answers used by evaluators, LLM application output, evaluator results, cost, and latency.

## Comparing Evaluations

Once evaluations are marked "completed," you can compare two or more evaluations <b>from the same test set</b>. Click the `Compare` button to access the Evaluation comparison view, where you can analyze outputs from multiple evaluations side by side.

<img src="/images/evaluation/comparing-evaluations.gif" />
<img
src="/images/evaluation/comparing-evaluations.gif"
style={{ width: "100%" }}
alt="Animation showing how to compare evaluations in Agenta"
/>
8 changes: 6 additions & 2 deletions docs/docs/evaluation/05-sdk-evaluation.mdx
@@ -1,5 +1,5 @@
---
title: "Evaluation from SDK"
title: "Evaluate from SDK"
description: "Run evaluation programmatically from the SDK."
---

@@ -20,7 +20,11 @@ In **agenta**, evaluation is a **fully managed service** . It takes place entire

Our evaluation service takes a set of test sets, evaluators, and app variants and runs asynchronous jobs for evaluation.

<img src="/images/evaluation/evaluation-sdk-fig.png" />
<img
src="/images/evaluation/evaluate-sdk.png"
alt="Figure showing the LLM app evaluation infrastructure in Agenta."
loading="lazy"
/>
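
As a condensed sketch of that flow (drawn from the cookbook included in this change set; the placeholder IDs and any keyword arguments not visible there are assumptions):

```python
from agenta.client.backend.client import AgentaApi
from agenta.client.backend.types.llm_run_rate_limit import LlmRunRateLimit

client = AgentaApi(base_url="https://cloud.agenta.ai/api", api_key="<your-api-key>")

# Launch an asynchronous evaluation job for one variant against a test set.
response = client.evaluations.create_evaluation(
    app_id="<app-id>",
    variant_ids=["<variant-id>"],
    testset_id="<testset-id>",
    evaluators_configs=["<evaluator-config-id>"],
    rate_limit=LlmRunRateLimit(
        batch_size=10,            # requests per batch
        max_retries=3,            # retries per failed request
        retry_delay=2,            # seconds between retries
        delay_between_batches=5,  # seconds between batches
    ),
)
print(response)

# Later: check the job status and fetch the aggregated results.
client.evaluations.fetch_evaluation_status("<evaluation-id>")
client.evaluations.fetch_evaluation_results("<evaluation-id>")
```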

:::info
You can open this guide in a [jupyter notebook](https://github.com/Agenta-AI/agenta/blob/main/cookbook/evaluations_with_sdk.ipynb)
@@ -6,9 +6,15 @@ id: introduction
sidebar_position: 0
---

<img className="dark:hidden" src="images/agenta_mockup_whitebg.png" />
<img className="hidden dark:block" src="/images/agenta_mockup_blackbg.png" />
Agenta is an open-source platform that helps **developers** and **product teams** build robust AI applications powered by LLMs. It offers all the tools for **prompt management and evaluation**.
<img
style={{ display: "block", margin: "0 auto" }}
src="/images/agenta-cover.png"
alt="Screenshots of Agenta LLMOPS platform"
loading="lazy"
/>
Agenta is an open-source platform that helps **developers** and **product teams**
build robust AI applications powered by LLMs. It offers all the tools for **prompt
management and evaluation**.

### With Agenta, you can:

File renamed without changes.
34 changes: 10 additions & 24 deletions docs/docs/guides/cookbooks/evaluations_with_sdk.mdx
@@ -1,15 +1,14 @@
---
title: "Evaluations with sdk"
title: "Evaluations with SDK"
---



:::note
This guide is also available as a [Jupyter Notebook](https://github.com/Agenta-AI/agenta/blob/main/cookbook/evaluations_with_sdk.ipynb).
:::

# Using evaluations with the SDK
In this cookbook, we will show how to interact with evaluations in Agenta programmatically, using either the SDK or the raw API.

We will do the following:

@@ -18,25 +17,21 @@ We will do the following:
- Run an evaluation
- Retrieve the results of evaluations

We assume that you have already created an LLM application and variants in agenta.

### Architectural Overview:
In this scenario, evaluations are executed on the Agenta backend. Specifically, Agenta invokes the LLM application for each row in the test set and subsequently processes the output using the designated evaluator.
This operation is managed through Celery tasks. The interactions with the LLM application are asynchronous, batched, and include retry mechanisms. Additionally, the batching configuration can be adjusted to avoid exceeding the rate limits imposed by the LLM provider.

## Setup

```python
! pip install -U agenta
```

## Configuration Setup



```python
# Assuming an application has already been created through the user interface, you will need to obtain the application ID.
# In this example we will use the default template single_prompt which has the prompt "Determine the capital of {country}"
# @@ -47,15 +42,14 @@
from agenta.client.backend.client import AgentaApi
client.apps.list_apps()
```
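
If you do not know the application ID, you can read it off the listing above. The attribute names below (`app_id`, `app_name`) are assumed from the response model, and `"my-app"` is a placeholder:

```python
# Pick the application to work with from the listing above.
apps = client.apps.list_apps()
app_id = next(app.app_id for app in apps if app.app_name == "my-app")
print(app_id)
```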


```python

app_id = "667d8cfad1812781f7e375d9"

# You can create the API key under the settings page. If you are using the OSS version, you should keep this as an empty string
api_key = "EUqJGOUu.xxxx"

# Host.
host = "https://cloud.agenta.ai"

# Initialize the client
# @@ -65,7 +59,6 @@
client = AgentaApi(base_url=host + "/api", api_key=api_key)
```

## Create a test set


```python
from agenta.client.backend.types.new_testset import NewTestset

# @@ -89,7 +82,6 @@
client.testsets.update_testset(testset_id=test_set_id, request=NewTestset(name="
```
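
For reference, a complete call of this shape might look as follows; the rows, the test set name, and the `csvdata` field are assumptions rather than the original notebook contents:

```python
# Hedged sketch of the full update call (field names and rows are assumptions).
csvdata = [
    {"country": "France", "capital": "Paris"},
    {"country": "Germany", "capital": "Berlin"},
]

client.testsets.update_testset(
    testset_id=test_set_id,
    request=NewTestset(name="capitals-test-set", csvdata=csvdata),
)
```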

# Create evaluators


```python
# Create an evaluator that performs an exact match comparison on the 'capital' column
# You can find the list of evaluator keys and evaluators and their configurations in https://github.com/Agenta-AI/agenta/blob/main/agenta-backend/agenta_backend/resources/evaluators/evaluators.py
# @@ -103,7 +95,7 @@
def evaluate(
app_params: Dict[str, str],
inputs: Dict[str, str],
output: str, # output of the llm app
    datapoint: Dict[str, str] # contains the testset row
) -> float:
if output and output[0].isupper():
return 1.0
# @@ -115,22 +107,19 @@
response = client.evaluators.create_new_evaluator_config(app_id=app_id, name="ca
letter_match_eval_id = response.id
```


```python
# get list of all evaluators
client.evaluators.get_evaluator_configs(app_id=app_id)
```

# Run an evaluation


```python
response = client.apps.list_app_variants(app_id=app_id)
print(response)
myvariant_id = response[0].variant_id
```


```python
# Run an evaluation
from agenta.client.backend.types.llm_run_rate_limit import LlmRunRateLimit
# @@ -144,13 +133,11 @@
response = client.evaluations.create_evaluation(app_id=app_id, variant_ids=[myva
print(response)
```


```python
# check the status
client.evaluations.fetch_evaluation_status('667d98fbd1812781f7e3761a')
```
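
In practice you would poll until the evaluation reaches a terminal state. A rough sketch follows; the response shape and status values are assumptions, so inspect the returned object and adapt the check:

```python
import time

evaluation_id = "667d98fbd1812781f7e3761a"

# Poll until the evaluation job reaches a terminal state (assumed status values).
while True:
    status = client.evaluations.fetch_evaluation_status(evaluation_id)
    print(status)
    if any(word in str(status).upper() for word in ("FINISHED", "FAILED", "ERROR")):
        break
    time.sleep(5)
```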


```python
# fetch the overall results
response = client.evaluations.fetch_evaluation_results('667d98fbd1812781f7e3761a')
# @@ -159,7 +146,6 @@
results = [(evaluator["evaluator_config"]["name"], evaluator["result"]) for eval
# End of Selection
```


```python
# fetch the detailed results
client.evaluations.fetch_evaluation_scenarios(evaluations_ids='667d98fbd1812781f7e3761a')
```
