diff --git a/docs/docs/tutorials/cookbooks/RAG-QA-docs.mdx b/docs/docs/tutorials/cookbooks/RAG-QA-docs.mdx new file mode 100644 index 0000000000..93207d2e6d --- /dev/null +++ b/docs/docs/tutorials/cookbooks/RAG-QA-docs.mdx @@ -0,0 +1,582 @@ +--- +title: "RAG Q&A over Documentation" +description: "Build a Q&A system for your documentation using RAG with Agenta, Litellm and Qdrant. Evaluate it using Ragas Context relevancy and LLM-as-a-judge. Deploy it as an API endpoint." +--- + +:::info Open in Github +The code for this tutorial is available [here](https://github.com/Agenta-AI/agenta/tree/main/examples/custom_workflows/rag-docs-qa). +::: + +```mdx-code-block +import Image from "@theme/IdealImage"; +``` + +In this tutorial, we'll build a Q&A system for our documentation using RAG (Retrieval-Augmented Generation). Our AI assistant will answer user queries by retrieving relevant sections from our documentation and using them as context when calling an LLM. + +At the end, we will have: + +- A **playground** for testing different embeddings, adjusting top_k values (number of context chunks to include), and experimenting with various prompts and models +- **LLM-as-a-judge** and **RAG context relevancy** evaluations for our Q&A application +- **Observability** with Agenta to debug and monitor our application +- A **deployment** that we can either [directly invoke](/prompt-management/integration/proxy-calls) **or** [fetch the configuration](/reference/sdk/configuration-management#get_from_registry) to run elsewhere + +You can try our playground by creating a free account at [https://cloud.agenta.ai](https://cloud.agenta.ai) and opening the demo. + +Playground for testing the RAG + +## Our stack + +- **Agenta** for playground, evaluation, observability, and deployment. +- **[LiteLLM](https://github.com/BerriAI/litellm)** for interacting with language models and embeddings. +- **[Qdrant](https://qdrant.tech/)** as our vector database for storing and querying document embeddings. + +## Ingestion pipeline + +The first step is to process our documentation and store it in a vector database for retrieval. Let's start by looking at how we ingest our documentation into Qdrant. + +```python title="ingest.py" + +OPENAI_EMBEDDING_DIM = 1536 # For text-embedding-ada-002 +COHERE_EMBEDDING_DIM = 1024 # For embed-english-v3.0 + +qdrant_client = QdrantClient( + url=os.getenv("QDRANT_URL"), api_key=os.getenv("QDRANT_API_KEY") +) + + +def chunk_text(text: str, max_chunk_size: int = 1500) -> List[str]: + """ + Split text into chunks based on paragraphs and size. + Tries to maintain context by keeping paragraphs together when possible. + """ + # Split by double newlines to preserve paragraph structure + paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()] + + chunks = [] + current_chunk = [] + current_size = 0 + + for paragraph in paragraphs: + paragraph_size = len(paragraph) + + # If a single paragraph is too large, split it by sentences + if paragraph_size > max_chunk_size: + sentences = [s.strip() + "." 
for s in paragraph.split(".") if s.strip()] + for sentence in sentences: + if len(sentence) > max_chunk_size: + # If even a sentence is too long, split it by chunks + for i in range(0, len(sentence), max_chunk_size): + chunks.append(sentence[i : i + max_chunk_size]) + elif current_size + len(sentence) > max_chunk_size: + # Start new chunk + chunks.append(" ".join(current_chunk)) + current_chunk = [sentence] + current_size = len(sentence) + else: + current_chunk.append(sentence) + current_size += len(sentence) + # If adding this paragraph would exceed the limit, start a new chunk + elif current_size + paragraph_size > max_chunk_size: + chunks.append(" ".join(current_chunk)) + current_chunk = [paragraph] + current_size = paragraph_size + else: + current_chunk.append(paragraph) + current_size += paragraph_size + + # Add the last chunk if it exists + if current_chunk: + chunks.append(" ".join(current_chunk)) + + return chunks + + +def process_doc(file_path: str, docs_path: str, docs_base_url: str) -> List[Dict]: + """Process a single document into chunks with metadata.""" + with open(file_path, "r", encoding="utf-8") as f: + # Parse frontmatter and content + post = frontmatter.load(f) + content = post.content + + # Calculate document hash + doc_hash = calculate_doc_hash(content) + + # Get document URL + doc_url = get_doc_url(file_path, docs_path, docs_base_url) + + # Create base metadata + metadata = { + "title": post.get("title", ""), + "url": doc_url, + "file_path": file_path, + "last_updated": datetime.utcnow().isoformat(), + "doc_hash": doc_hash, + } + + # Chunk the content + chunks = chunk_text(content) + + return [ + {"content": chunk, "metadata": metadata, "doc_hash": doc_hash} + for chunk in chunks + ] + + +def get_embeddings(text: str) -> Dict[str, List[float]]: + """Get embeddings using both OpenAI and Cohere models via LiteLLM.""" + # Get OpenAI embedding + openai_response = embedding(model="text-embedding-ada-002", input=[text]) + openai_embedding = openai_response["data"][0]["embedding"] + + # Get Cohere embedding + cohere_response = embedding( + model="cohere/embed-english-v3.0", + input=[text], + input_type="search_document", # Specific to Cohere v3 models + ) + cohere_embedding = cohere_response["data"][0]["embedding"] + + return {"openai": openai_embedding, "cohere": cohere_embedding} + + +def setup_qdrant_collection(): + """Create or recreate the vector collection.""" + # Delete if exists + try: + qdrant_client.delete_collection(COLLECTION_NAME) + except Exception: + pass + + # Create collection with two vector types + qdrant_client.create_collection( + collection_name=COLLECTION_NAME, + vectors_config={ + "openai": models.VectorParams( + size=OPENAI_EMBEDDING_DIM, distance=models.Distance.COSINE + ), + "cohere": models.VectorParams( + size=COHERE_EMBEDDING_DIM, distance=models.Distance.COSINE + ), + }, + ) + + +def upsert_chunks(chunks: List[Dict]): + """Upsert document chunks to the vector store.""" + for i, chunk in enumerate(chunks): + # Get both embeddings using LiteLLM + embeddings = get_embeddings(chunk["content"]) + + # Create payload + payload = {**chunk["metadata"], "content": chunk["content"], "chunk_index": i} + + # Upsert to Qdrant + qdrant_client.upsert( + collection_name=COLLECTION_NAME, + points=[ + models.PointStruct( + id=f"{chunk['doc_hash']}", + payload=payload, + vector=embeddings, # Contains both 'openai' and 'cohere' embeddings + ) + ], + ) + + +def main(): + # Get environment variables + docs_path = os.getenv("DOCS_PATH") + docs_base_url = 
os.getenv("DOCS_BASE_URL")
+
+    if not docs_path or not docs_base_url:
+        raise ValueError("DOCS_PATH and DOCS_BASE_URL must be set in .env file")
+
+    # Create fresh collection
+    setup_qdrant_collection()
+
+    # Process all documents
+    all_docs = get_all_docs(docs_path)
+    for doc_path in tqdm.tqdm(all_docs):
+        print(f"Processing {doc_path}")
+        chunks = process_doc(doc_path, docs_path, docs_base_url)
+        upsert_chunks(chunks)
+```
+
+This script performs the following steps:
+
+1. **Loads documentation files:** Reads all `.mdx` files from the documentation directory.
+2. **Processes documents:** Chunks the text and adds metadata (for example, the URL where the page can be found).
+3. **Generates embeddings:** Creates embeddings for each chunk using both OpenAI and Cohere models, so that we can compare them later in the playground.
+4. **Stores embeddings in Qdrant:** Upserts the embeddings into a Qdrant collection for later retrieval. We use named vectors to save multiple embeddings for the same document.
+
+To run the ingestion pipeline, you first need a running Qdrant instance (the script creates the collection itself) and the following environment variables:
+
+- `QDRANT_URL`: The URL of your Qdrant instance.
+- `QDRANT_API_KEY`: The API key for your Qdrant instance.
+- `DOCS_PATH`: The folder containing the documentation (in our case it's under `agenta/docs/docs`).
+- `DOCS_BASE_URL`: The base URL where the documentation can be found (in our case it's `https://docs.agenta.ai`).
+
+:::info
+The complete ingestion script with a setup readme is [available in Github](https://github.com/Agenta-AI/agenta/tree/main/examples/custom_workflows/rag-docs-qa).
+:::
+
+## The query RAG workflow
+
+Now that we have ingested the documentation into the Qdrant vector database, let's create the query logic for our assistant. The parts related to the Agenta integration are highlighted.
+
+```python title="query.py"
+#highlight-start
+import agenta as ag
+from pydantic import BaseModel, Field
+from typing import Annotated
+from agenta.sdk.assets import supported_llm_models
+#highlight-end
+
+system_prompt = """
+You are a helpful assistant that answers questions based on the documentation.
+""" +user_prompt = """ +Here is the query: {query} + +Here is the context: {context} +""" +#highlight-start +ag.init() +#highlight-end + +#highlight-start +litellm.callbacks = [ag.callbacks.litellm_handler()] +#highlight-end + +# Initialize Qdrant client +qdrant_client = QdrantClient( + url=os.getenv("QDRANT_URL"), api_key=os.getenv("QDRANT_API_KEY") +) + +#highlight-start +# We define here the configuration that will be used by the playground +class Config(BaseModel): + system_prompt: str = Field(default=system_prompt) + user_prompt: str = Field(default=user_prompt) + embedding_model: Annotated[str, ag.MultipleChoice(["openai", "cohere"])] = Field( + default="openai" + ) + llm_model: Annotated[str, ag.MultipleChoice(choices=supported_llm_models)] = Field( + default="gpt-3.5-turbo" + ) + top_k: int = Field(default=10, ge=1, le=25) + rerank_top_k: int = Field(default=3, ge=1, le=10) + use_rerank: bool = Field(default=True) +#highlight-end + + +def get_embeddings(text: str, model: str) -> Dict[str, List[float]]: + """Get embeddings using both OpenAI and Cohere models via LiteLLM.""" + if model == "openai": + return embedding(model="text-embedding-ada-002", input=[text])["data"][0]["embedding"] + elif model == "cohere": + return embedding( + model="cohere/embed-english-v3.0", + input=[text], + input_type="search_query", # Use search_query for queries + )["data"][0]["embedding"] + + raise ValueError(f"Unknown model: {model}") + +#highlight-next-line +@ag.instrument() +def search_docs( + query: str, collection_name: str = os.getenv("COLLECTION_NAME", "docs_collection") +) -> List[Dict]: + """ + Search the documentation using embeddings. + + Args: + query: The search query + collection_name: Name of the Qdrant collection to search + + Returns: + List of dictionaries containing matched documents and their metadata + """ + + #highlight-start + # Get embeddings for the query + config = ag.ConfigManager.get_from_route(Config) + #highlight-end + # Search using embeddings + results = qdrant_client.query_points( + collection_name=collection_name, + query=get_embeddings(query, config.embedding_model), + using=config.embedding_model, + limit=config.top_k, + ) + # Format results + formatted_results = [] + for result in results.points: + formatted_result = { + "content": result.payload["content"], + "metadata": { + "title": result.payload["title"], + "url": result.payload["url"], + "score": result.score, + }, + } + formatted_results.append(formatted_result) + + return formatted_results + +#highlight-next-line +@ag.instrument() +def llm(query: str, results: List[Dict]): + #highlight-next-line + config = ag.ConfigManager.get_from_route(Config) + context = [] + for i, result in enumerate(results, 1): + score = result["metadata"].get("rerank_score", result["metadata"]["score"]) + item = f"Result {i} (Score: {score:.3f})\n" + item += f"Title: {result['metadata']['title']}\n" + item += f"URL: {result['metadata']['url']}\n" + item += f"Content: {result['content']}\n" + item += "-" * 80 + "\n" + context.append(item) + #highlight-start + # We store the context in the trace so that it can be used for evaluation + ag.tracing.store_internals({"context": context}) + #highlight-end + response = completion( + model=config.llm_model, + messages=[ + {"role": "system", "content": config.system_prompt}, + { + "role": "user", + "content": config.user_prompt.format( + query=query, context="".join(context) + ), + }, + ], + ) + return response.choices[0].message.content + +#highlight-next-line +@ag.instrument() +def 
rerank_results(query: str, results: List[Dict]) -> List[Dict]: + """Rerank the search results using Cohere's reranker.""" + #highlight-start + config = ag.ConfigManager.get_from_route(Config) + #highlight-end + # Format documents for reranking + documents = [result["content"] for result in results] + + # Perform reranking + reranked = rerank( + model="cohere/rerank-english-v3.0", + query=query, + documents=documents, + top_n=config.rerank_top_k, + ) + # Reorder the original results based on reranking + reranked_results = [] + for item in reranked.results: + # The rerank function returns dictionaries with 'document' and 'index' keys + reranked_results.append(results[item["index"]]) + # Add rerank score to metadata + reranked_results[-1]["metadata"]["rerank_score"] = item["relevance_score"] + + return reranked_results + +#highlight-start +# We define here the route that will be used by the playground +@ag.route("/", config_schema=Config) +@ag.instrument() +#highlight-end +def generate(query: str): + #highlight-next-line + config = ag.ConfigManager.get_from_route(Config) + results = search_docs(query) + if config.use_rerank: + reranked_results = rerank_results(query, results) + return llm(query, reranked_results) + else: + return llm(query, results) +``` + +Our system uses a standard RAG workflow consisting of three main steps: + +1. **Searching the documentation:** Uses the query to retrieve relevant documents from Qdrant. +2. **Optionally reranking results:** Improves the relevance of results using Cohere's reranker. +3. **Generating the answer:** Constructs a prompt with the query and context, then calls the LLM to generate the final answer. + +To integrate this script with Agenta, we need to make two main adjustments: + +1. **Instrumentation:** Use `@ag.instrument()` decorator to trace inputs, outputs, and internal variables. +2. **Integration with the Playground:** Use `ag.route()` to define a route and later create a service that will be used to test the app in the playground. + +We'll discuss these in more detail in the next sections. + +## Instrumentation + +Tracing captures the inputs and outputs of all functions and LLM calls in our app. This helps us debug multi-step workflows (for example, determining whether an incorrect response stems from the LLM call or from incorrect context) and monitor usage over time. + +```python +@ag.instrument() +def generate(query: str): + ... +``` + +Instrumenting code in Agenta is straightforward. The `@ag.instrument()` decorator lets you capture function inputs and outputs to create a trace tree. + +Agenta also provides auto-instrumentation for most frameworks and libraries. Since we're using litellm, we'll use Agenta's callback function to automatically instrument its calls. + +For RAG evaluation of our applications, we need to evaluate the relevancy of retrieved context for each query. Since context isn't part of any function's input or output, we'll add it manually to a span using `ag.tracing.store_internals({"context": context})`, which stores internal variables in the ongoing span. + +Trace view of the RAG Q&A assistant + +## Playground integration + +Agenta provides a custom playground for testing application parameters. Here, we can experiment with different embeddings, top_k values, and LLM models. + +Using the Agenta SDK, we'll define a configuration schema for our application and create an endpoint to enable playground communication. Then, we'll deploy the application to Agenta Cloud using the Agenta CLI for testing. 
Agenta handles all infrastructure work needed to create our application service. + +### Defining the configuration + +Let's define the configuration schema for our application. This schema will determine what elements appear in the playground UI and what parameters we can experiment with. + +Our configuration includes: + +- **System prompt:** The system prompt template +- **User prompt:** The user prompt template +- **Embedding model:** Choice between OpenAI and Cohere +- **LLM model:** Selection from supported language models +- **Top_k value:** Number of document chunks to retrieve from the vector database +- **Use rerank:** Toggle for Cohere's reranking feature +- **Rerank top_k value:** Number of chunks the reranker should return (used for both reordering and filtering) + +```python +from pydantic import BaseModel, Field +from typing import Annotated +import agenta as ag +from agenta.sdk.assets import supported_llm_models + +class Config(BaseModel): + system_prompt: str = Field(default=system_prompt) + user_prompt: str = Field(default=user_prompt) + embedding_model: Annotated[str, ag.MultipleChoice(["openai", "cohere"])] = Field( + default="openai" + ) + llm_model: Annotated[str, ag.MultipleChoice(choices=supported_llm_models)] = Field( + default="gpt-3.5-turbo" + ) + top_k: int = Field(default=10, ge=1, le=25) + rerank_top_k: int = Field(default=3, ge=1, le=10) + use_rerank: bool = Field(default=True) +``` + +We implement this using a standard `Config` Pydantic class that inherits from BaseModel. The fields use simple types (str or int). Agenta requires each field to have a default value. For multiple-choice fields, we use `Annotated[str, ag.MultipleChoice(choices=["choice1", "choice2"])]` to specify the available options. + +:::info +`supported_llm_models` is a helper variable provided by Agenta that contains the list available in LiteLLM. +::: + +### Creating the endpoint and using the configuration + +Next, we'll create an endpoint to enable communication between the playground and our application. + +```python +@ag.route("/", config_schema=Config) +def generate(query: str): + config = ag.ConfigManager.get_from_route(Config) + ... +``` + +[The decorator `@ag.route("/", config_schema=Config)`](https://www.notion.so/reference/sdk/custom-workflow#agroute-decorator) registers the `generate` function as an endpoint and uses the `Config` class to define the configuration schema. This creates a `POST /playground/run` endpoint that accepts the configuration as a parameter and runs the workflow. The playground uses this endpoint to interact with the service. + +To get the configuration from the request, we use `ag.ConfigManager.get_from_route(Config)`, which returns a Config object containing the values provided by the playground. + +We can use these configuration values throughout our workflow. For instance, we can use `config.use_rerank` in the `generate` function to control the reranking feature. + +Note that `ag.ConfigManager.get_from_route(Config)` is accessible in any function called within the generate function's execution path, as the configuration is preserved in the context. + +### Deploying the application to Agenta + +Now that we have everything ready to deploy our application to Agenta, let's proceed. First, add the `requirements.txt` file to the same folder as your project files and populate the `.env` file with your environment variables. 
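As a reference, here is what the `.env` file can look like. The keys mirror the `.env.example` shipped with the example repository; all values below are placeholders:

+```bash title=".env"
+# Qdrant instance used for retrieval
+QDRANT_URL=https://your-cluster.qdrant.io
+QDRANT_API_KEY=your-qdrant-api-key
+COLLECTION_NAME=docs_collection
+
+# Model providers
+OPENAI_API_KEY=sk-...
+COHERE_API_KEY=your-cohere-api-key
+
+# Only needed by the ingestion and test-set scripts
+DOCS_PATH=/path/to/agenta/docs/docs
+DOCS_BASE_URL=https://docs.agenta.ai
+```
+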
Then run these commands:
+
+```bash
+agenta init
+
+agenta variant serve query.py
+```
+
+The first command creates a new application in Agenta, while the second command serves the application and creates a playground for testing.
+
+:::info
+Under the hood, `agenta variant serve` creates a Docker image of your application and sets up a service for it in Agenta Cloud.
+:::
+
+Once complete, you can access the playground and begin testing your application.
+
+Playground for testing the RAG Q&A assistant
+
+## Evaluating the assistant
+
+To ensure our assistant provides accurate and relevant answers, we'll use evaluators to assess its performance. We will create two evaluators:
+
+1. RAG Relevancy Evaluator: Measures how relevant the assistant's answers are with respect to the retrieved context.
+2. LLM-as-a-Judge Evaluator: Rates the quality of the assistant's responses.
+
+For the first, we use the RAG Relevancy evaluator as described in [Agenta's evaluation documentation](/evaluation/evaluators/rag-evaluators).
+
+**Configuration:**
+
+- **Question key:** `trace.generate.inputs.query`
+- **Answer key:** `trace.generate.outputs`
+- **Contexts key:** `trace.generate.llm.internals.context`
+
+Note that the contexts key points to `trace.generate.llm.internals.context`, the value we stored in the span earlier, which lets the evaluator read the retrieved context from the trace.
+
+You can use the evaluator playground to configure the evaluator and identify the correct trace data to use in your configuration (see image below).
+
+Configuration of the RAG Relevancy evaluator
+
+We set up and test an LLM-as-a-Judge evaluator in the same way to rate the quality of the assistant's responses. More details on setting up LLM-as-a-Judge evaluators can be found [here](/evaluation/evaluators/llm-as-a-judge).
+
+## Deploying the assistant
+
+After iterating on prompts and parameters and evaluating their performance, we can deploy the configuration we are satisfied with as an API endpoint using Agenta.
+
+To do this, simply click the `Deploy` button in the playground.
+
+Agenta provides us with [two endpoints](/prompt-management/integration/how-to-integrate-with-agenta) to interact with our deployed application:
+
+- The first allows us to directly invoke the deployed application with the production configuration.
+- The second allows us to fetch the deployed configuration as a JSON object and use it in our self-deployed application (a sketch of this is shown after the conclusion).
+
+## Conclusion
+
+In this tutorial, we built a documentation Q&A system using RAG, but more importantly, we created a comprehensive LLMOps workflow that includes:
+
+- A **playground** for testing different embeddings, prompts, and retrieval parameters in real time
+- **Observability tools** for debugging multi-step RAG workflows and monitoring production performance
+- **Evaluation pipelines** for assessing both RAG relevancy and response quality
+- **Deployment capabilities** for smoothly transitioning from experimentation to production
+
+This workflow shows how to evolve beyond a basic RAG implementation to build a production-ready system with robust testing, monitoring, and iteration capabilities.
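+
+### Appendix: Fetching the deployed configuration
+
+As a reference, here is a minimal sketch of the second option: fetching the deployed configuration with the Agenta SDK and running the workflow in your own code. The method name comes from the configuration-management reference linked above, but the exact parameters (`app_slug`, `environment_slug`) and the app slug value are assumptions, so check that reference for the current signature.
+
+```python
+import agenta as ag
+
+# Assumes AGENTA_API_KEY (and AGENTA_HOST for self-hosted setups) are set in the environment
+ag.init()
+
+# Hypothetical call: fetch the configuration currently deployed to production
+config = ag.ConfigManager.get_from_registry(
+    app_slug="docs-qa",             # assumed application slug
+    environment_slug="production",  # deployment environment to read from
+)
+
+# The returned configuration mirrors our Config schema: prompts, llm_model, top_k, use_rerank, ...
+print(config)
+```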
diff --git a/docs/static/images/cookbooks/rag-qa-eval-config.png b/docs/static/images/cookbooks/rag-qa-eval-config.png new file mode 100644 index 0000000000..4efbeb08cd Binary files /dev/null and b/docs/static/images/cookbooks/rag-qa-eval-config.png differ diff --git a/docs/static/images/cookbooks/rag-qa-playground.png b/docs/static/images/cookbooks/rag-qa-playground.png new file mode 100644 index 0000000000..00a596b3b5 Binary files /dev/null and b/docs/static/images/cookbooks/rag-qa-playground.png differ diff --git a/docs/static/images/cookbooks/rag-qa-tracing.png b/docs/static/images/cookbooks/rag-qa-tracing.png new file mode 100644 index 0000000000..600166c747 Binary files /dev/null and b/docs/static/images/cookbooks/rag-qa-tracing.png differ diff --git a/examples/custom_workflows/rag-docs-qa/.env.example b/examples/custom_workflows/rag-docs-qa/.env.example new file mode 100644 index 0000000000..e6f3d6b076 --- /dev/null +++ b/examples/custom_workflows/rag-docs-qa/.env.example @@ -0,0 +1,13 @@ +DOCS_PATH= +DOCS_BASE_URL= +OPENAI_API_KEY= +COHERE_API_KEY= +COLLECTION_NAME= +QDRANT_URL= +QDRANT_API_KEY= + +# optional +MISTRAL_API_KEY= +ANTHROPIC_API_KEY= +GEMINI_API_KEY= +GROQ_API_KEY= \ No newline at end of file diff --git a/examples/custom_workflows/rag-docs-qa/README.md b/examples/custom_workflows/rag-docs-qa/README.md new file mode 100644 index 0000000000..e39fe44989 --- /dev/null +++ b/examples/custom_workflows/rag-docs-qa/README.md @@ -0,0 +1,62 @@ +# RAG Q&A Documentation System + +This project implements a RAG system for documentation Q&A. The documentation is expected to be in mdx format (we use for our tutorial our documentation using Docusaurus). + +The stack used: + +- Qdrant for vector database +- Cohere for embedding +- OpenAI for LLM and embedding + +## Requirements + +- Qdrant database set up +- Cohere API key +- OpenAI API key + +## Setup + +1. Set up virtual environment and install dependencies: + +```bash +uv venv +source .venv/bin/activate # On Unix/macOS +# or +.venv\scripts\activate # On Windows + +uv pip compile requirements.in --output-file requirements.txt + +uv pip sync requirements.txt +``` + +2. Copy `.env.example` to `.env` and fill in your configuration: + +```bash +cp .env.example .env + +DOCS_PATH= The path to your documentation folder containing the mdx files +DOCS_BASE_URL= This is the base url of your documentation site. This will be used to generate the links in the citations. +OPENAI_API_KEY= Your OpenAI API key +COHERE_API_KEY= Your Cohere API key +COLLECTION_NAME= The name of the collection in Qdrant to store the embeddings +QDRANT_URL= The url of your Qdrant server +QDRANT_API_KEY= The API key of your Qdrant server +AGENTA_API_KEY= Your Agenta API key +``` + +3. Run the ingestion script: + +```bash +python ingest.py +``` + +4. Serve the application to Agenta: + +```bash +agenta init +agenta variant serve query.py +``` + +## Notes: + +- `generate_test_set.py` is used to generate a test set of questions based on the documentation for evaluation. 
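+
+To (re)generate the test set from your own documentation, run the script directly. It reads `DOCS_PATH` from `.env`, asks an OpenAI model for five questions per page, and writes `test_set.csv` to the current directory:
+
+```bash
+python generate_test_set.py
+# -> test_set.csv with a `query` column that can be uploaded to Agenta as a test set
+```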
diff --git a/examples/custom_workflows/rag-docs-qa/generate_test_set.py b/examples/custom_workflows/rag-docs-qa/generate_test_set.py new file mode 100644 index 0000000000..3210a8c244 --- /dev/null +++ b/examples/custom_workflows/rag-docs-qa/generate_test_set.py @@ -0,0 +1,99 @@ +import os +import glob +from pathlib import Path +import pandas as pd +from dotenv import load_dotenv +from litellm import completion +import frontmatter +import tqdm +import json + +# Load environment variables +load_dotenv() + + +def get_files(docs_path): + """Get all markdown files recursively.""" + return + + +def extract_content(file_path): + """Extract content from markdown file.""" + with open(file_path, "r", encoding="utf-8") as f: + post = frontmatter.load(f) + # Get title from frontmatter or filename + title = post.get("title", Path(file_path).stem) + # Get content without frontmatter + content = post.content + return title, content + + +def generate_questions(title, content): + """Generate questions using OpenAI.""" + system_prompt = """You are a helpful assistant that generates questions based on documentation content. + Generate 5 questions that could be answered using the provided documentation. + Your response must be a JSON object with a single key "questions" containing an array of strings.""" + + user_prompt = f""" + Title: {title} + + Content: {content} # Limit content length to avoid token limits + + Generate 5 questions about this documentation. Put yourself in the shoes of a user attempting to 1) figure how to use the product for a use case 2) troubleshoot an issue 3) learn about the features of the product. + The user in this case is a technical user (AI engineer) who is trying to build an llm application. + The user would write the questions they would ask in a chat with a human. Therefore, not all questions will be clear and well written. 
+ """ + + try: + response = completion( + model="gpt-3.5-turbo-0125", # Using the latest model that supports JSON mode + messages=[ + {"role": "system", "content": system_prompt}, + {"role": "user", "content": user_prompt}, + ], + response_format={"type": "json_object"}, + ) + + # Check if the response was complete + if response.choices[0].finish_reason == "length": + print(f"Warning: Response was truncated for {title}") + return [] + + # Parse JSON response - no need for eval() + result = json.loads(response.choices[0].message.content) + return result["questions"] + + except Exception as e: + print(f"Error generating questions for {title}: {str(e)}") + return [] + + +def main(): + docs_path = os.getenv("DOCS_PATH") + if not docs_path: + raise ValueError("DOCS_PATH environment variable not set") + + # Get all files + files = glob.glob(os.path.join(docs_path, "**/*.mdx"), recursive=True) + all_questions = [] + # Process each file + for file_path in tqdm.tqdm(files, desc="Processing documentation files"): + if "/reference/api" in file_path: + # skip api docs + continue + try: + title, content = extract_content(file_path) + questions = generate_questions(title, content) + all_questions.extend(questions) + except Exception as e: + print(f"Error processing {file_path}: {str(e)}") + continue + + # Save to CSV + df = pd.DataFrame({"query": all_questions}) + df.to_csv("test_set.csv", index=False, lineterminator="\n") + print(f"Generated {len(all_questions)} questions and saved to test_set.csv") + + +if __name__ == "__main__": + main() diff --git a/examples/custom_workflows/rag-docs-qa/ingest.py b/examples/custom_workflows/rag-docs-qa/ingest.py new file mode 100644 index 0000000000..5df1c3dd49 --- /dev/null +++ b/examples/custom_workflows/rag-docs-qa/ingest.py @@ -0,0 +1,204 @@ +import os +import glob +from typing import List, Dict +import hashlib +from datetime import datetime +import frontmatter +from dotenv import load_dotenv +from qdrant_client import QdrantClient +from qdrant_client.http import models +from litellm import embedding +import tqdm + +# Load environment variables +load_dotenv() + +# Constants +OPENAI_EMBEDDING_DIM = 1536 # For text-embedding-ada-002 +COHERE_EMBEDDING_DIM = 1024 # For embed-english-v3.0 +COLLECTION_NAME = "docs_collection" + +# Initialize Qdrant client +qdrant_client = QdrantClient( + url=os.getenv("QDRANT_URL"), api_key=os.getenv("QDRANT_API_KEY") +) + + +def get_all_docs(docs_path: str) -> List[str]: + """Get all MDX files in the docs directory.""" + return glob.glob(os.path.join(docs_path, "**/*.mdx"), recursive=True) + + +def calculate_doc_hash(content: str) -> str: + """Calculate a hash for the document content.""" + return hashlib.md5(content.encode()).hexdigest() + + +def get_doc_url(file_path: str, docs_path: str, docs_base_url: str) -> str: + """Convert file path to documentation URL.""" + relative_path = os.path.relpath(file_path, docs_path) + # Remove .mdx extension and convert to URL path + url_path = os.path.splitext(relative_path)[0] + return f"{docs_base_url}/{url_path}" + + +def chunk_text(text: str, max_chunk_size: int = 1500) -> List[str]: + """ + Split text into chunks based on paragraphs and size. + Tries to maintain context by keeping paragraphs together when possible. 
+ """ + # Split by double newlines to preserve paragraph structure + paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()] + + chunks = [] + current_chunk = [] + current_size = 0 + + for paragraph in paragraphs: + paragraph_size = len(paragraph) + + # If a single paragraph is too large, split it by sentences + if paragraph_size > max_chunk_size: + sentences = [s.strip() + "." for s in paragraph.split(".") if s.strip()] + for sentence in sentences: + if len(sentence) > max_chunk_size: + # If even a sentence is too long, split it by chunks + for i in range(0, len(sentence), max_chunk_size): + chunks.append(sentence[i : i + max_chunk_size]) + elif current_size + len(sentence) > max_chunk_size: + # Start new chunk + chunks.append(" ".join(current_chunk)) + current_chunk = [sentence] + current_size = len(sentence) + else: + current_chunk.append(sentence) + current_size += len(sentence) + # If adding this paragraph would exceed the limit, start a new chunk + elif current_size + paragraph_size > max_chunk_size: + chunks.append(" ".join(current_chunk)) + current_chunk = [paragraph] + current_size = paragraph_size + else: + current_chunk.append(paragraph) + current_size += paragraph_size + + # Add the last chunk if it exists + if current_chunk: + chunks.append(" ".join(current_chunk)) + + return chunks + + +def process_doc(file_path: str, docs_path: str, docs_base_url: str) -> List[Dict]: + """Process a single document into chunks with metadata.""" + with open(file_path, "r", encoding="utf-8") as f: + # Parse frontmatter and content + post = frontmatter.load(f) + content = post.content + + # Calculate document hash + doc_hash = calculate_doc_hash(content) + + # Get document URL + doc_url = get_doc_url(file_path, docs_path, docs_base_url) + + # Create base metadata + metadata = { + "title": post.get("title", ""), + "url": doc_url, + "file_path": file_path, + "last_updated": datetime.utcnow().isoformat(), + "doc_hash": doc_hash, + } + + # Chunk the content + chunks = chunk_text(content) + + return [ + {"content": chunk, "metadata": metadata, "doc_hash": doc_hash} + for chunk in chunks + ] + + +def get_embeddings(text: str) -> Dict[str, List[float]]: + """Get embeddings using both OpenAI and Cohere models via LiteLLM.""" + # Get OpenAI embedding + openai_response = embedding(model="text-embedding-ada-002", input=[text]) + openai_embedding = openai_response["data"][0]["embedding"] + + # Get Cohere embedding + cohere_response = embedding( + model="cohere/embed-english-v3.0", + input=[text], + input_type="search_document", # Specific to Cohere v3 models + ) + cohere_embedding = cohere_response["data"][0]["embedding"] + + return {"openai": openai_embedding, "cohere": cohere_embedding} + + +def setup_qdrant_collection(): + """Create or recreate the vector collection.""" + # Delete if exists + try: + qdrant_client.delete_collection(COLLECTION_NAME) + except Exception: + pass + + # Create collection with two vector types + qdrant_client.create_collection( + collection_name=COLLECTION_NAME, + vectors_config={ + "openai": models.VectorParams( + size=OPENAI_EMBEDDING_DIM, distance=models.Distance.COSINE + ), + "cohere": models.VectorParams( + size=COHERE_EMBEDDING_DIM, distance=models.Distance.COSINE + ), + }, + ) + + +def upsert_chunks(chunks: List[Dict]): + """Upsert document chunks to the vector store.""" + for i, chunk in enumerate(chunks): + # Get both embeddings using LiteLLM + embeddings = get_embeddings(chunk["content"]) + + # Create payload + payload = {**chunk["metadata"], "content": 
chunk["content"], "chunk_index": i} + + # Upsert to Qdrant + qdrant_client.upsert( + collection_name=COLLECTION_NAME, + points=[ + models.PointStruct( + id=f"{chunk['doc_hash']}", + payload=payload, + vector=embeddings, # Contains both 'openai' and 'cohere' embeddings + ) + ], + ) + + +def main(): + # Get environment variables + docs_path = os.getenv("DOCS_PATH") + docs_base_url = os.getenv("DOCS_BASE_URL") + + if not docs_path or not docs_base_url: + raise ValueError("DOCS_PATH and DOCS_BASE_URL must be set in .env file") + + # Create fresh collection + setup_qdrant_collection() + + # Process all documents + all_docs = get_all_docs(docs_path) + for doc_path in tqdm.tqdm(all_docs): + print(f"Processing {doc_path}") + chunks = process_doc(doc_path, docs_path, docs_base_url) + upsert_chunks(chunks) + + +if __name__ == "__main__": + main() diff --git a/examples/custom_workflows/rag-docs-qa/query.py b/examples/custom_workflows/rag-docs-qa/query.py new file mode 100644 index 0000000000..31ed37f888 --- /dev/null +++ b/examples/custom_workflows/rag-docs-qa/query.py @@ -0,0 +1,165 @@ +import os +from typing import List, Dict +from dotenv import load_dotenv +from qdrant_client import QdrantClient +import litellm +from litellm import embedding, completion, rerank +import agenta as ag +from pydantic import BaseModel, Field +from typing import Annotated +from agenta.sdk.assets import supported_llm_models + +system_prompt = """ + You are a helpful assistant that answers questions based on the documentation. + """ +user_prompt = """ + Here is the query: {query} + + Here is the context: {context} + """ +ag.init() + +# litellm.callbacks = [ag.callbacks.litellm_handler()] + +# Initialize Qdrant client +qdrant_client = QdrantClient( + url=os.getenv("QDRANT_URL"), api_key=os.getenv("QDRANT_API_KEY") +) + + +class Config(BaseModel): + system_prompt: str = Field(default=system_prompt) + user_prompt: str = Field(default=user_prompt) + embedding_model: Annotated[str, ag.MultipleChoice(["openai", "cohere"])] = Field( + default="openai" + ) + llm_model: Annotated[str, ag.MultipleChoice(choices=supported_llm_models)] = Field( + default="gpt-3.5-turbo" + ) + top_k: int = Field(default=10, ge=1, le=25) + rerank_top_k: int = Field(default=3, ge=1, le=10) + use_rerank: bool = Field(default=True) + + +def get_embeddings(text: str, model: str) -> Dict[str, List[float]]: + """Get embeddings using both OpenAI and Cohere models via LiteLLM.""" + if model == "openai": + return embedding(model="text-embedding-ada-002", input=[text])["data"][0][ + "embedding" + ] + elif model == "cohere": + return embedding( + model="cohere/embed-english-v3.0", + input=[text], + input_type="search_query", # Use search_query for queries + )["data"][0]["embedding"] + + raise ValueError(f"Unknown model: {model}") + + +@ag.instrument() +def search_docs( + query: str, collection_name: str = os.getenv("COLLECTION_NAME", "docs_collection") +) -> List[Dict]: + """ + Search the documentation using both OpenAI and Cohere embeddings. 
+ + Args: + query: The search query + limit: Maximum number of results to return + score_threshold: Minimum similarity score (0-1) for results + collection_name: Name of the Qdrant collection to search + + Returns: + List of dictionaries containing matched documents and their metadata + """ + # Get embeddings for the query + config = ag.ConfigManager.get_from_route(Config) + + # Search using both embeddings + results = qdrant_client.query_points( + collection_name=collection_name, + query=get_embeddings(query, config.embedding_model), + using=config.embedding_model, + limit=config.top_k, + ) + # Format results + formatted_results = [] + for result in results.points: + formatted_result = { + "content": result.payload["content"], + "metadata": { + "title": result.payload["title"], + "url": result.payload["url"], + "score": result.score, + }, + } + formatted_results.append(formatted_result) + + return formatted_results + + +@ag.instrument() +def llm(query: str, results: List[Dict]): + config = ag.ConfigManager.get_from_route(Config) + context = [] + for i, result in enumerate(results, 1): + score = result["metadata"].get("rerank_score", result["metadata"]["score"]) + item = f"Result {i} (Score: {score:.3f})\n" + item += f"Title: {result['metadata']['title']}\n" + item += f"URL: {result['metadata']['url']}\n" + item += f"Content: {result['content']}\n" + item += "-" * 80 + "\n" + context.append(item) + + ag.tracing.store_internals({"context": context}) + response = completion( + model=config.llm_model, + messages=[ + {"role": "system", "content": config.system_prompt}, + { + "role": "user", + "content": config.user_prompt.format( + query=query, context="".join(context) + ), + }, + ], + ) + return response.choices[0].message.content + + +@ag.instrument() +def rerank_results(query: str, results: List[Dict]) -> List[Dict]: + """Rerank the search results using Cohere's reranker.""" + config = ag.ConfigManager.get_from_route(Config) + # Format documents for reranking + documents = [result["content"] for result in results] + + # Perform reranking + reranked = rerank( + model="cohere/rerank-english-v3.0", + query=query, + documents=documents, + top_n=config.rerank_top_k, + ) + # Reorder the original results based on reranking + reranked_results = [] + for item in reranked.results: + # The rerank function returns dictionaries with 'document' and 'index' keys + reranked_results.append(results[item["index"]]) + # Add rerank score to metadata + reranked_results[-1]["metadata"]["rerank_score"] = item["relevance_score"] + + return reranked_results + + +@ag.route("/", config_schema=Config) +@ag.instrument() +def generate(query: str): + config = ag.ConfigManager.get_from_route(Config) + results = search_docs(query) + if config.use_rerank: + reranked_results = rerank_results(query, results) + return llm(query, reranked_results) + else: + return llm(query, results) diff --git a/examples/custom_workflows/rag-docs-qa/requirements.in b/examples/custom_workflows/rag-docs-qa/requirements.in new file mode 100644 index 0000000000..e06a4699f7 --- /dev/null +++ b/examples/custom_workflows/rag-docs-qa/requirements.in @@ -0,0 +1,6 @@ +qdrant-client +python-dotenv +litellm +python-frontmatter +tiktoken +agenta \ No newline at end of file diff --git a/examples/custom_workflows/rag-docs-qa/test_set.csv b/examples/custom_workflows/rag-docs-qa/test_set.csv new file mode 100644 index 0000000000..9c7d6028b4 --- /dev/null +++ b/examples/custom_workflows/rag-docs-qa/test_set.csv @@ -0,0 +1,311 @@ +query,correct_answer +How can I 
deploy Agenta locally in production mode?, +What command should I run to update Agenta to the latest version?, +How do I start Agenta in development mode?, +What steps should I take to troubleshoot if I encounter a port conflict issue?, +Where can I seek help or report issues related to Agenta?, +How can I deploy Agenta on Kubernetes for my AI model hosting use case?, +"I'm having trouble accessing the Kubernetes deployment feature in Agenta Enterprise, how can I troubleshoot this?", +Can Agenta Enterprise on Kubernetes support scaling my AI models automatically based on demand?, +What tools are provided by Agenta Enterprise to manage multiple users and teams?, +Can you provide more details about Agenta Enterprise's early access stage for select partners?, +How can I deploy Agenta on AWS EC2 using Terraform?, +What are the open ports created by the Terraform module for Agenta on AWS EC2?, +How do I serve a variant to an instance after hosting Agenta?, +What steps are involved in SSH-ing into the instance where Agenta is hosted?, +How can I delete all the resources created by Terraform for Agenta on AWS?, +How can I deploy Agenta on Google Cloud Engine using Terraform?, +What are the prerequisites for deploying Agenta on Google Cloud Engine?, +How do I SSH into the instance after deploying Agenta on Google Cloud Engine?, +What security considerations should I keep in mind when allowing SSH access to the instance?, +How can I delete all the resources created by Terraform for Agenta on Google Cloud Engine?, +How can I host Agenta on a remote server like an AWS EC2 instance?, +What are the prerequisites for deploying Agenta on a remote server?, +How do I obtain the public IP of my AWS EC2 instance?, +What environment variables do I need to set before launching the Agenta server?, +"After starting the Agenta server, how can I verify that it is running correctly on the remote server?", +How can I apply schema migrations to my PostgreSQL database using the provided instructions?, +What command should I run to ensure that Alembic looks for the configuration file and other necessary files in the specified directory before executing?, +"If I need to revert a schema migration due to encountered issues, how can I do that according to the documentation?", +"Is there a way to automate the application of migrations without manually running the upgrade command? If so, how can I achieve this?", +"After completing the migration, what should I do to verify data integrity in PostgreSQL and ensure that everything works fine?", +How can I back up my MongoDB database before running the migration script?, +What Docker command should I use to start the local instance of Agenta for migration?, +"After initiating the migration, how can I access the running docker containers?", +How can I verify the data integrity in PostgreSQL post migration?, +"In case I encounter issues during migration, how can I revert the migration and what should I do next?", +How can I upgrade to the latest version of Agenta using Docker?, +What database does Agenta now use starting from version 0.19?, +How can I access the backend Docker container for Agenta?, +What is Beanie and how does it relate to MongoDB ODM for Python?, +What are the steps involved in performing a database migration using Beanie with Agenta backend system?, +Where can I find answers to common questions about Agenta?, +How can I get community support for Agenta? 
Is there a Slack community?, +What is the recommended channel for reporting bugs in Agenta?, +"As a Pro plan cloud user, how can I access direct team support?", +"If I'm interested in demos or have sales inquiries, how can I schedule a call with the founders?", +How can I add new users to my workspace?, +What are the different user roles available and what are their respective rights?, +"As a Workspace Admin, what are the limitations in terms of managing the workspace?", +What is the role of a Deployment Manager and what tasks do they handle within the workspace?, +How can I switch between different workspaces if I am a member of multiple workspaces?, +How do I format backend and CLI code using Black in specific directories?, +"What are the main steps to contribute to Agenta, especially when picking an issue?", +Can you guide me through running backend tests locally before making a pull request?, +How can I update the .env.local file in the frontend directory to include my OpenAI API Key?, +What are the rules regarding issue assignment and PR activity to prevent zombie issues in the Agenta project?, +How can I use my own version of Agenta CLI or SDK instead of the installed one?, +What steps can I take if I'm unable to run Agenta in my terminal even after following the provided instructions?, +How can I quickly work and test a new type of parameter like IntParam in the SDK?, +Are there any specific steps needed for working on the backend code in Agenta?, +How can I efficiently debug the backend code in Agenta using Docker or Visual Studio Code?, +How can I report a bug using the bug report template?, +Where can I find the network logs in the browser and why are they important when reporting an issue?, +What command should I run to get information about Docker containers and how can I provide this information when filing an issue?, +"When reporting an issue related to a specific llm app, what are the steps to provide container information using Docker Desktop or the terminal?", +Why is providing network logs and Docker container information crucial when filing an issue?, +How can I start monitoring and understanding the behavior of my LLM application with Agenta?, +What are the advantages of using OpenTelemetry (OTel) with Agenta's observability features?, +What is the relationship between traces and spans in the context of Agenta?, +Where can I find a Quick Start guide to get started with observability in Agenta?, +How can I learn to instrument specific applications like OpenAI or LangChain with Agenta?, +How can I enable tracing for OpenAI calls using Agenta SDK?, +What packages do I need to install to set up observability for an OpenAI application running locally?, +How can I configure environment variables for Agenta OSS running locally?, +Where can I find the captured traces of my application's requests in the Agenta UI?, +Is tracing enabled by default if I create an application through the Agenta UI?, +How can I install the Agenta SDK?, +What is the purpose of the @ag.instrument() decorator in Agenta SDK?, +How can I add additional metadata to a span in Agenta?, +"Can you explain how to link spans to applications, variants, and environments in Agenta?", +What options are available for redacting sensitive data in Agenta SDK?, +How do I install LiteLLM and Agenta SDK?, +What are the different environments I can configure while setting up LiteLLM with Agenta?, +How can I initialize Agenta SDK in my LiteLLM application?, +What does the @ag.instrument() decorator do in the code example 
provided?, +Why is it important to set up the callback handler for LiteLLM while using Agenta?, +How can I install the required packages for using LangChain with Agenta?, +What environment variables do I need to configure when using Agenta Cloud or Enterprise?, +What steps are involved in the code example provided for a LangChain application?, +How do I initialize Agenta in my LangChain application?, +Why is it important to call 'LangchainInstrumentor().instrument()' before running my LangChain application?, +How can I instrument OpenAI API calls with Agenta using `opentelemetry-instrumentation-openai` package?, +What packages do I need to install to instrument OpenAI API calls with Agenta using `opentelemetry-instrumentation-openai` package?, +How do I configure environment variables for Agenta Cloud or Enterprise when instrumenting OpenAI API calls?, +What decorator can I use to monitor multiple calls in a function or workflow as a single trace?, +How can I associate traces with specific parts of an Agenta project when instrumenting functions?, +How do I install the required packages for using Instructor with Agenta?, +What environment variables do I need to configure for Agenta Cloud or Enterprise setup?, +What must I ensure the order of when instrumenting OpenAI and creating the Instructor client?, +Why is it mentioned to instrument OpenAI before creating the Instructor client?, +"What does the @ag.instrument(spankind=""WORKFLOW"") decorator do in the code example?", +How can I experiment and compare prompts using Agenta?, +What types of workflows does Agenta support for prompt management and evaluation?, +How can I collaborate with product teams using Agenta for prompt engineering and evaluation?, +What are the different ways to deploy an application with Agenta?, +"Is Agenta compatible with various LLM app architectures and model providers? 
If so, how?", +How can I create a new LLM app using an existing template in Agenta?, +Where can I find the API endpoint after deploying my application in Agenta?, +What steps do I need to follow to test the application in the playground?, +How do I add my OpenAI API keys when self-hosting Agenta?, +What are some next steps after creating my first LLM application in Agenta?, +How can I create custom templates for my LLM-powered applications in Agenta?, +What are the predefined environments in Agenta and their purposes?, +"Explain the relationship between variants, versions, and commits in Agenta.", +Can you describe the difference between Completion Application Template and Chat Application Template in Agenta?, +What can I do if the deployment of a variant to an environment does not update automatically in Agenta?, +How does Agenta enable rapid experimentation and evaluation for LLM applications?, +What is the significance of Agenta treating each application as a microservice?, +How does Agenta handle the separation of application logic and configuration?, +What role does Agenta's backend play in managing applications and configurations?, +Can you explain the purpose and functionality of Agenta's SDK for a Python library?, +How can I collaborate with subject matter experts using custom workflows in Agenta?, +What is the main problem with traditional prompt playgrounds?, +Can I trust the outputs of traditional prompt playgrounds?, +How does Agenta's Custom Workflows simplify debugging for AI engineers?, +What frameworks and models are compatible with Agenta?, +How can I create a custom workflow with two prompts using Agenta?, +What are the steps to serve an application in Agenta using the CLI?, +How do I initialize the Agenta SDK in my Python code?, +What is the purpose of the `CoPConfig` class in the provided code snippet?, +Can you explain the concept of entry points in Agenta and how they are used in the code example?, +How can I configure evaluators for my LLM application?, +What tools are available for creating test sets for evaluation?, +Can I run evaluations directly from the web UI in Agenta?, +What types of evaluators are available for classification/entity extraction in LLM applications?, +How can I evaluate the faithfulness of outputs in RAG workflows?, +How can I create a test set for evaluation in agenta SDK?, +What is the purpose of creating evaluators in the agenta SDK?, +How can I run an evaluation job using the agenta SDK?, +What is the function of the rate limit configuration in evaluation jobs?, +How can I retrieve the detailed results of an evaluation using the agenta SDK?, +How can I initiate a single model evaluation in the Human Evaluation feature?, +What steps are involved in starting a new evaluation with the single model test?, +How do I compare the performance of two different variants manually using A/B Test in the Human Evaluation feature?, +Can collaborators be invited to collaborate on an A/B Test evaluation in the Human Evaluation feature?, +Is there a way to switch between card and table view in the evaluation process? 
How can this be done?, +How do I create a test set in Agenta using a CSV or JSON file?, +What is the default column name for the reference answer in a CSV test set?, +What is the structure of a valid JSON file for a test set in Agenta?, +How can I add data to a test set from the playground in Agenta?, +"In Agenta, how can I upload a test set using the API?", +How do I start a new evaluation in the UI?, +What parameters can be specified when setting up a new evaluation?, +What are the advanced configuration options available for adjusting batching and retry parameters?, +How can I analyze the results of an evaluation in more detail?, +"Can I compare multiple evaluations from the same test set in the UI? If yes, how?", +How can I configure evaluators for my LLM application?, +What are the inputs that evaluators typically take?, +Which button should I click to create a new evaluator?, +Can I create custom evaluators for my LLM application?, +What is the purpose of mappings evaluator's inputs to the LLM data?, +How can I create a webhook evaluator for a specific use case in Agenta?, +What are the limitations in terms of security measures when using webhook evaluators?, +What input parameters are required for a webhook evaluator in Agenta?, +What is the expected format of the webhook request body for a webhook evaluator?, +What should the webhook response body contain in order to be considered properly-formatted?, +How does the Exact Match evaluator determine if the model's output is correct?, +What does the Contains JSON evaluator check for in the model's output?, +How does the JSON Field Match evaluator compare specific fields within JSON data?, +Can you explain the process of JSON Diff Match evaluation?, +What configuration options are available for the JSON Diff Match evaluator and how do they affect the comparison process?, +How can I assess the performance of LLMs by identifying specific patterns within the output generated by the model?, +What is the purpose of the 'Regex Test' evaluator in Agenta and how does it work?, +Could you provide an example of using the 'Starts With' evaluator in Agenta and its significance?, +How does the 'Contains Any' evaluator in Agenta differ from the 'Contains All' evaluator?, +In what situations would I use the 'Ends With' evaluator in Agenta and how does it handle case sensitivity?, +How can I create a custom evaluator in Agenta?, +What is the function signature for the 'evaluate' function in a custom evaluator?, +What is the purpose of the 'evaluate' function in a custom evaluator?, +Which language is used for writing custom evaluators in Agenta?, +What ranges of scores can the 'evaluate' function return?, +How can I configure and use RAG evaluators in my custom-built application?, +What version of the Python SDK is required to access internal variables and intermediate outputs for RAG Evaluators?, +Where can I find the source code for a simple RAG Application that fetches movies and generates summaries?, +How can I troubleshoot if the RAG Evaluator and `view trace` are not accessible in Agenta OSS?, +Which utility can I use in Agenta to add internal variables (internals) to stages of a workflow?, +How does the Similarity Match evaluator determine a match between the generated output and the correct answer?, +What configuration parameter is needed for the Semantic Similarity Match evaluator?, +What is the cost associated with using the Semantic Similarity Match evaluator that uses OpenAI embeddings?, +In what use cases is the Levenshtein Distance 
evaluator particularly useful?, +Can you explain how the Levenshtein Distance evaluator works for comparing text outputs?, +How can I configure the prompt for LLM-as-a-Judge evaluation?, +"What variables can be used to reference inputs, outputs, and reference answers in the prompt?", +What is the default prompt used for the country expert demo application?, +"Which models are supported by LLM-as-a-Judge, and how can I select a specific model?", +Where do I need to set my OpenAI or Anthropic API key to use LLM-as-a-Judge?, +How can I create a test set using the SDK?, +What is the process for creating and configuring an evaluator programmatically?, +How can I check the status of an evaluation run?, +Where can I find the list of evaluator keys and configurations?, +What should I do if I encounter rate limits during the evaluation process?, +How can I create a new prompt using the Agenta SDK?, +How do I deploy changes to the production environment using the SDK?, +"What is the structure used by Agenta for prompt versioning, and how does it differ from traditional versioning systems?", +"In case of an issue with deploying changes using the SDK, what steps can I take to troubleshoot?", +Can you explain the concept of variants in Agenta and how they are similar to branches in Git?, +How can I set up tracing for a RAG application in LangChain using Agenta?, +What is the purpose of tracing in LLM applications?, +"How do I install the necessary dependencies for LangChain, Agenta, and instrumentation?", +Can you provide an example of a Q&A RAG application in LangChain and how it works?, +How can I retrieve and generate answers using relevant snippets of documentation in Agenta?, +How can I fetch the diff for a GitHub pull request using Python?, +What role does LiteLLM play in the AI assistant workflow described in the tutorial?, +How can I add observability to my LLM application using Agenta?, +What is the purpose of creating an LLM playground for the application?, +How can I deploy my AI assistant to production using Agenta?, +How can I version prompts in Agenta?, +What capabilities does Agenta provide for prompt management?, +Why do I need a prompt management system?, +What is the purpose of configuration management in Agenta?, +How can I publish a prompt to an endpoint from the web UI?, +How can I create a prompt using the web UI?, +Why do I need to publish a variant to a deployment?, +How can I integrate the prompt configuration with my Python code using the Agenta SDK?, +"Can I revert to a previous deployment version, and if so, how?", +What is the purpose of using Pydantic for schema validation in the SDK integration process?, +How can I create reusable prompts using prompt templates in the agenta playground?, +Where can I find the option to add new inputs to the LLM app in the playground?, +"What is the process for creating a new variant of an application in agenta, and where can I provide a new name for the variant?", +"How can I test a variant dynamically in the playground, and is there a way to run all inputs in a test set?", +"In the agenta playground, how do I compare variants side by side? Can I interact with different variants simultaneously in a chat application?", +How can I create a new prompt using the SDK?, +What is the structure followed by Agenta for prompt versioning?, +How can I delete a variant in Agenta? Is this action reversible?, +Can I list all variants of an application using the SDK? 
How can I do that?, +What is the default behavior for fetching configurations in Agenta?, +How can I use the Agenta SDK to invoke the deployed version of my prompt?, +Where can I find the call to invoke the deployed version of my prompt within the Agenta UI?, +What parameters do I need to provide when invoking a deployed prompt through the REST API?, +What does the 'inputs' dictionary in the parameters contain and how should it be structured?, +How can I control which environment version of my prompt is being called when using the REST API?, +How do I fetch the deployed version of my prompt in my code using the Agenta SDK?, +Which Python package do I need to install to use the Agenta SDK?, +What environment variables need to be set up for fetching prompts?, +"In the sample output, what is the value of 'model' fetched from the staging configuration?", +Which slugs should I provide when fetching prompts using the Agenta SDK?, +How can Agenta help me manage prompts in my application?, +What are the advantages of using Agenta as a prompt management system?, +What considerations should I keep in mind when integrating observability manually with Agenta?, +When should I consider using Agenta as a middleware/model proxy?, +Does using Agenta as a proxy add any latency to the response?, +How can I install the agenta CLI tool?, +What is the command to install agenta using pip?, +Where can I find a quick usage guide for the agenta CLI?, +How do I get an overview of the main commands and capabilities of the agenta CLI?, +Is there a tutorial available for deploying an LLM app from code using the agenta CLI?, +How can I initialize a new Agenta project using the CLI?, +"What options can be provided with the 'agenta init' command, and how do they affect the project initialization process?", +"For the 'agenta variant list' command, where should it be executed, and what information does it provide?", +What are the steps involved in removing a variant from an application using the 'agenta variant remove' command?, +"How can I deploy an application variant to the Agenta platform using the 'agenta variant serve' command, and what additional options are available for deployment?", +How can I create a new variant with initial configuration parameters?, +What method should I use to commit changes to an existing variant?, +Which method is used to deploy a variant to a specific environment?, +How can I retrieve the version history of a variant?, +What are the available predefined environments for deployment?, +How can I store additional data within the current span in a trace?, +What utility function can I use to validate if a string is a valid attribute key?, +How can I link spans to specific Agenta resources using the SDK?, +What method should I use to set the status of a span in the CustomSpan class?, +When should I apply the @ag.instrument() decorator in my Python code?, +How can I expose specific stages of LLM workflows as API endpoints using the Custom Workflows SDK?, +What types of fields are accepted in the configuration schema when defining a configuration for a function?, +How can I define fields with constraints and defaults in the configuration schema when using `@ag.route` or `@ag.entrypoint` decorators?, +"How are different field types represented in the Agenta playground UI, and what are the input methods for each type?", +How can I retrieve configuration information from the route context using `ag.ConfigManager.get_from_route()` and how is this configuration used in a function?, +How do I experiment with AI
applications using the agenta SDK?, +What is the purpose of the deprecated SDK v2 mentioned on the page?, +How can I initialize my variant using the agenta SDK?, +What function should I use to set the default configuration in the agenta SDK?, +Can the agenta SDK be used with any Foundational Model?, +How can I display a text area widget in the playground using the SDK v2?, +What is the difference between IntParam and FloatParam in terms of the widgets they display in the playground?, +Can you provide an example of how to use BinaryParam in the SDK v2 configuration?, +What is the purpose of the GroupedMultipleChoiceParam in the configuration settings?, +What is the important note mentioned regarding the initialization of BinaryParam in the SDK v2?, +How do I access the parameters in the configuration in my code?, +What is the purpose of the 'config' object in the SDK v2?, +Can I experiment with different parameters using the 'config' object?, +What is the significance of calling 'agenta.init()' before using the 'config' object?, +What are some examples of parameters that can be saved in the 'config' object for an LLM variant?, +Why is it recommended to call `agenta.init()` only once at the entry point of the code?, +What happens if `agenta.init()` is called multiple times in the code?, +How does calling `agenta.init()` help in initializing the variant?, +Is there a different method to initialize the variant in the deprecated SDK v2?, +What could be the potential impact of using the deprecated SDK v2 for building an LLM application?, +How can I set the default configuration for my variant using the register_default method?, +What happens if I set the default prompt value to 'Hello World' in my configuration?, +When should I use the register_default method in my LLM application development?, +What parameters can I set using the register_default method for my LLM application?, +How can I access the prompt1 parameter value from the configuration in the backend?, +How do I push a configuration using agenta.config.push()?, +What happens if I set overwrite to True when pushing a configuration?, +How can I avoid overwriting an existing configuration when using agenta.config.push()?, +Where does the pushed configuration get stored for a specific code base?, +Can I push configurations for multiple variants using agenta.config.push()?, +How can I pull a configuration with a specific name?, +What happens when I pull a configuration using the 'production' environment?, +How can I access the parameters after pulling the configuration?, +Can I pull configurations for different variants using this function?, +Why does the documentation mention that this page is for the deprecated SDK v2?, \ No newline at end of file diff --git a/examples/custom_workflows/rag-docs-qa/test_set_small.csv b/examples/custom_workflows/rag-docs-qa/test_set_small.csv new file mode 100644 index 0000000000..70f40372f4 --- /dev/null +++ b/examples/custom_workflows/rag-docs-qa/test_set_small.csv @@ -0,0 +1,18 @@ +query,correct_answer +How can I deploy Agenta locally in production mode?, +What command should I run to update Agenta to the latest version?, +How do I start Agenta in development mode?, +What steps should I take to troubleshoot if I encounter a port conflict issue?, +Where can I seek help or report issues related to Agenta?, +How can I deploy Agenta on Kubernetes for my AI model hosting use case?, +"I'm having trouble accessing the Kubernetes deployment feature in Agenta Enterprise, how can I troubleshoot this?", +Can Agenta
Enterprise on Kubernetes support scaling my AI models automatically based on demand?, +What tools are provided by Agenta Enterprise to manage multiple users and teams?, +Can you provide more details about Agenta Enterprise's early access stage for select partners?, +How can I deploy Agenta on AWS EC2 using Terraform?, +What are the open ports created by the Terraform module for Agenta on AWS EC2?, +How do I serve a variant to an instance after hosting Agenta?, +What steps are involved in SSH-ing into the instance where Agenta is hosted?, +How can I delete all the resources created by Terraform for Agenta on AWS?, +How can I deploy Agenta on Google Cloud Engine using Terraform?, +What are the prerequisites for deploying Agenta on Google Cloud Engine?, \ No newline at end of file