diff --git a/.all-contributorsrc b/.all-contributorsrc index b8616e3997..5a1ad7781f 100644 --- a/.all-contributorsrc +++ b/.all-contributorsrc @@ -437,6 +437,16 @@ "bug", "code" ] + }, + { + "login": "jp-agenta", + "name": "jp-agenta", + "avatar_url": "https://avatars.githubusercontent.com/u/174311389?v=4", + "profile": "https://github.com/jp-agenta", + "contributions": [ + "code", + "bug" + ] } ], "contributorsPerLine": 7, diff --git a/.github/workflows/run-frontend-tests.yml b/.github/workflows/run-frontend-tests.yml index e87e35fa39..684ba3552c 100644 --- a/.github/workflows/run-frontend-tests.yml +++ b/.github/workflows/run-frontend-tests.yml @@ -33,7 +33,7 @@ jobs: NEXT_PUBLIC_OPENAI_API_KEY: ${{ secrets.NEXT_PUBLIC_OPENAI_API_KEY }} run: | sudo apt install curl -y - docker-compose -f "docker-compose.test.yml" up -d --build + OPENAI_API_KEY=${{ secrets.NEXT_PUBLIC_OPENAI_API_KEY }} ENVIRONMENT=github docker-compose -f "docker-compose.test.yml" up -d --build - name: Restart Backend Service To Fetch Template(s) run: docker container restart agenta-backend-test diff --git a/README.md b/README.md index 4c71c00d76..39749fe7bf 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,18 @@ +
+ + Important: We are migrating from MongoDB to PostgreSQL in v0.19! Follow this guide to migrate your data. + +
+
+
+
We are hiring! Join our team! - +
+
@@ -42,7 +51,6 @@

-

@@ -55,7 +63,6 @@

-
@@ -64,7 +71,6 @@ -


@@ -96,54 +102,58 @@ # ⭐️ Why Agenta? -Agenta is an end-to-end LLM developer platform. It provides the tools for **prompt engineering and management**, ⚖️ **evaluation**, **human annotation**, and :rocket: **deployment**. All without imposing any restrictions on your choice of framework, library, or model. +Agenta is an end-to-end LLM developer platform. It provides the tools for **prompt engineering and management**, ⚖️ **evaluation**, **human annotation**, and :rocket: **deployment**. All without imposing any restrictions on your choice of framework, library, or model. -Agenta allows developers and product teams to collaborate in building production-grade LLM-powered applications in less time. +Agenta allows developers and product teams to collaborate in building production-grade LLM-powered applications in less time. ### With Agenta, you can: -- [🧪 **Experiment** and **compare** prompts](https://docs.agenta.ai/basic_guides/prompt_engineering) on [any LLM workflow](https://docs.agenta.ai/advanced_guides/custom_applications) (chain-of-prompts, Retrieval Augmented Generation (RAG), LLM agents...) -- ✍️ Collect and [**annotate golden test sets**](https://docs.agenta.ai/basic_guides/test_sets) for evaluation -- 📈 [**Evaluate** your application](https://docs.agenta.ai/basic_guides/automatic_evaluation) with pre-existing or [**custom evaluators**](https://docs.agenta.ai/advanced_guides/using_custom_evaluators) -- [🔍 **Annotate** and **A/B test**](https://docs.agenta.aibasic_guides/human_evaluation) your applications with **human feedback** -- [🤝 **Collaborate with product teams**](https://docs.agenta.ai/basic_guides/team_management) for prompt engineering and evaluation -- [🚀 **Deploy your application**](https://docs.agenta.ai/basic_guides/deployment) in one-click in the UI, through CLI, or through github workflows. 
+- [🧪 **Experiment** and **compare** prompts](https://docs.agenta.ai/prompt_management/prompt_engineering) on [any LLM workflow](https://docs.agenta.ai/prompt_management/custom_applications) (chain-of-prompts, Retrieval Augmented Generation (RAG), LLM agents...) +- ✍️ Collect and [**annotate golden test sets**](https://docs.agenta.ai/evaluation/test_sets) for evaluation +- 📈 [**Evaluate** your application](https://docs.agenta.ai/evaluation/automatic_evaluation) with pre-existing or [**custom evaluators**](https://docs.agenta.ai/evaluation/custom_evaluator) +- [🔍 **Annotate** and **A/B test**](https://docs.agenta.ai/evaluation/human_evaluation) your applications with **human feedback** +- [🤝 **Collaborate with product teams**](https://docs.agenta.ai/misc/team_management) for prompt engineering and evaluation +- [🚀 **Deploy your application**](https://docs.agenta.ai/prompt_management/deployment) in one-click in the UI, through CLI, or through github workflows. ### Works with any LLM app workflow Agenta enables prompt engineering and evaluation on any LLM app architecture: + - Chain of prompts - RAG - Agents -- ... -It works with any framework such as [Langchain](https://langchain.com), [LlamaIndex](https://www.llamaindex.ai/) and any LLM provider (openAI, Cohere, Mistral). - -[Jump here to see how to use your own custom application with agenta](/advanced_guides/custom_applications) +It works with any framework such as [Langchain](https://langchain.com), [LlamaIndex](https://www.llamaindex.ai/) and any LLM provider (openAI, Cohere, Mistral). 
# Quick Start ### [Get started for free](https://cloud.agenta.ai?utm_source=github&utm_medium=readme&utm_campaign=github) -### [Explore the Docs](https://docs.agenta.ai) -### [Create your first application in one-minute](https://docs.agenta.ai/quickstart/getting-started-ui) -### [Create an application using Langchain](https://docs.agenta.ai/tutorials/first-app-with-langchain) + +### [Explore the Docs](https://docs.agenta.ai/getting_started/introduction) + +### [Create your first application in one-minute](https://docs.agenta.ai/getting_started/quick-start) + +### [Create an application using Langchain](https://docs.agenta.ai/guides/tutorials/first-app-with-langchain) + ### [Self-host agenta](https://docs.agenta.ai/self-host/host-locally) -### [Check the Cookbook](https://docs.agenta.ai/cookbook) -# Features +### [Check the Cookbook](https://docs.agenta.ai/guides/evaluation_from_sdk) +# Features -| Playground | Evaluation | -| ------- | ------- | -| Compare and version prompts for any LLM app, from single prompt to agents.
), + width: 300, dataIndex: "inputs", render: (_: any, record: ABTestingEvaluationTableRow, rowIndex: number) => { return ( diff --git a/agenta-web/src/components/EvaluationTable/SingleModelEvaluationTable.tsx b/agenta-web/src/components/EvaluationTable/SingleModelEvaluationTable.tsx index e6af1b5162..2adee420ae 100644 --- a/agenta-web/src/components/EvaluationTable/SingleModelEvaluationTable.tsx +++ b/agenta-web/src/components/EvaluationTable/SingleModelEvaluationTable.tsx @@ -193,7 +193,7 @@ const SingleModelEvaluationTable: React.FC = ({ debounce((data: Partial, scenarioId) => { updateEvaluationScenarioData(scenarioId, data) }, 800), - [evaluationScenarios], + [rows], ) useEffect(() => { @@ -278,7 +278,7 @@ const SingleModelEvaluationTable: React.FC = ({ .then(() => { Object.keys(data).forEach((key) => { setRowValue( - evaluationScenarios.findIndex((item) => item.id === id), + rows.findIndex((item) => item.id === id), key, data[key as keyof EvaluationScenario], ) @@ -417,6 +417,7 @@ const SingleModelEvaluationTable: React.FC = ({ ), + width: 300, dataIndex: "inputs", render: (_: any, record: SingleModelEvaluationRow, rowIndex: number) => { return ( diff --git a/agenta-web/src/components/Evaluations/AutomaticEvaluationResult.tsx b/agenta-web/src/components/Evaluations/AutomaticEvaluationResult.tsx index 251fbc6319..c7a00dac7c 100644 --- a/agenta-web/src/components/Evaluations/AutomaticEvaluationResult.tsx +++ b/agenta-web/src/components/Evaluations/AutomaticEvaluationResult.tsx @@ -7,7 +7,7 @@ import {Button, Spin, Statistic, Table, Typography} from "antd" import {useRouter} from "next/router" import {useEffect, useState} from "react" import {ColumnsType} from "antd/es/table" -import {Evaluation, GenericObject} from "@/lib/Types" +import {Evaluation, GenericObject, StyleProps} from "@/lib/Types" import {DeleteOutlined} from "@ant-design/icons" import {EvaluationFlow, EvaluationType} from "@/lib/enums" import {createUseStyles} from "react-jss" @@ -41,10 +41,6 @@ 
interface EvaluationListTableDataType { variant_revision_ids: string[] } -type StyleProps = { - themeMode: "dark" | "light" -} - const useStyles = createUseStyles({ container: { marginBottom: 20, diff --git a/agenta-web/src/components/Evaluations/EvaluationCardView/EvaluationVariantCard.tsx b/agenta-web/src/components/Evaluations/EvaluationCardView/EvaluationVariantCard.tsx index 8cdb13ced6..c778a7417a 100644 --- a/agenta-web/src/components/Evaluations/EvaluationCardView/EvaluationVariantCard.tsx +++ b/agenta-web/src/components/Evaluations/EvaluationCardView/EvaluationVariantCard.tsx @@ -1,14 +1,10 @@ import {useAppTheme} from "@/components/Layout/ThemeContextProvider" -import {Evaluation, Variant} from "@/lib/Types" +import {Evaluation, Variant, StyleProps} from "@/lib/Types" import {Typography} from "antd" import React from "react" import {createUseStyles} from "react-jss" import {VARIANT_COLORS} from "." -type StyleProps = { - themeMode: "dark" | "light" -} - const useStyles = createUseStyles({ root: ({themeMode}: StyleProps) => ({ flex: 1, diff --git a/agenta-web/src/components/Evaluations/EvaluationCardView/index.tsx b/agenta-web/src/components/Evaluations/EvaluationCardView/index.tsx index f0097bc08e..b46b868e7e 100644 --- a/agenta-web/src/components/Evaluations/EvaluationCardView/index.tsx +++ b/agenta-web/src/components/Evaluations/EvaluationCardView/index.tsx @@ -4,8 +4,6 @@ import { LeftOutlined, LoadingOutlined, PlayCircleOutlined, - PushpinFilled, - PushpinOutlined, QuestionCircleOutlined, RightOutlined, } from "@ant-design/icons" @@ -18,7 +16,6 @@ import {ABTestingEvaluationTableRow} from "@/components/EvaluationTable/ABTestin import AlertPopup from "@/components/AlertPopup/AlertPopup" import {useLocalStorage} from "usehooks-ts" import {testsetRowToChatMessages} from "@/lib/helpers/testset" -import {safeParse} from "@/lib/helpers/utils" import {debounce} from "lodash" import {EvaluationType} from "@/lib/enums" import ParamsForm from 
"@/components/Playground/ParamsForm/ParamsForm" diff --git a/agenta-web/src/components/Evaluations/HumanEvaluationResult.tsx b/agenta-web/src/components/Evaluations/HumanEvaluationResult.tsx index 5eee97ec44..811adf13c0 100644 --- a/agenta-web/src/components/Evaluations/HumanEvaluationResult.tsx +++ b/agenta-web/src/components/Evaluations/HumanEvaluationResult.tsx @@ -4,7 +4,7 @@ import {Button, Spin, Statistic, Table, Typography} from "antd" import {useRouter} from "next/router" import {useEffect, useState} from "react" import {ColumnsType} from "antd/es/table" -import {EvaluationResponseType} from "@/lib/Types" +import {EvaluationResponseType, StyleProps} from "@/lib/Types" import {DeleteOutlined} from "@ant-design/icons" import {EvaluationFlow, EvaluationType} from "@/lib/enums" import {createUseStyles} from "react-jss" @@ -47,10 +47,6 @@ export interface HumanEvaluationListTableDataType { variantNames: string[] } -type StyleProps = { - themeMode: "dark" | "light" -} - const useStyles = createUseStyles({ container: { marginBottom: 20, diff --git a/agenta-web/src/components/HumanEvaluationModal/HumanEvaluationModal.tsx b/agenta-web/src/components/HumanEvaluationModal/HumanEvaluationModal.tsx index 983324b13a..74396dab4a 100644 --- a/agenta-web/src/components/HumanEvaluationModal/HumanEvaluationModal.tsx +++ b/agenta-web/src/components/HumanEvaluationModal/HumanEvaluationModal.tsx @@ -1,5 +1,5 @@ import React, {useEffect, useState} from "react" -import {GenericObject, JSSTheme, Parameter, Variant} from "@/lib/Types" +import {GenericObject, JSSTheme, Parameter, Variant, StyleProps} from "@/lib/Types" import {fetchVariants} from "@/services/api" import {createNewEvaluation} from "@/services/human-evaluations/api" import {isDemo} from "@/lib/helpers/utils" @@ -16,10 +16,6 @@ import EvaluationErrorModal from "../Evaluations/EvaluationErrorModal" import {dynamicComponent} from "@/lib/helpers/dynamic" import {useLoadTestsetsList} from "@/services/testsets/api" -type 
StyleProps = { - themeMode: "dark" | "light" -} - const useStyles = createUseStyles((theme: JSSTheme) => ({ evaluationContainer: { border: "1px solid lightgrey", diff --git a/agenta-web/src/components/Layout/Layout.tsx b/agenta-web/src/components/Layout/Layout.tsx index 38662ec4a6..16a07a58a8 100644 --- a/agenta-web/src/components/Layout/Layout.tsx +++ b/agenta-web/src/components/Layout/Layout.tsx @@ -28,11 +28,11 @@ import moonIcon from "@/media/night.png" import sunIcon from "@/media/sun.png" import {useProfileData} from "@/contexts/profile.context" import {ThemeProvider} from "react-jss" +import {StyleProps as MainStyleProps} from "@/lib/Types" const {Content, Footer} = Layout -type StyleProps = { - themeMode: "dark" | "light" +interface StyleProps extends MainStyleProps { footerHeight: number } diff --git a/agenta-web/src/components/Playground/AddToTestSetDrawer/AddToTestSetDrawer.tsx b/agenta-web/src/components/Playground/AddToTestSetDrawer/AddToTestSetDrawer.tsx index ba7d4f2a8c..1d835e28dd 100644 --- a/agenta-web/src/components/Playground/AddToTestSetDrawer/AddToTestSetDrawer.tsx +++ b/agenta-web/src/components/Playground/AddToTestSetDrawer/AddToTestSetDrawer.tsx @@ -1,6 +1,6 @@ import AlertPopup from "@/components/AlertPopup/AlertPopup" import {useAppTheme} from "../../Layout/ThemeContextProvider" -import {ChatMessage, ChatRole, GenericObject, testset} from "@/lib/Types" +import {ChatMessage, ChatRole, GenericObject, testset, StyleProps} from "@/lib/Types" import {removeKeys, renameVariables} from "@/lib/helpers/utils" import { createNewTestset, @@ -29,10 +29,6 @@ import {useLocalStorage, useUpdateEffect} from "usehooks-ts" import ChatInputs from "@/components/ChatInputs/ChatInputs" import _ from "lodash" -type StyleProps = { - themeMode: "dark" | "light" -} - const useStyles = createUseStyles({ footer: { display: "flex", diff --git a/agenta-web/src/components/Playground/NewVariantModal.tsx b/agenta-web/src/components/Playground/NewVariantModal.tsx index 
34b5c0542c..627e47c3c1 100644 --- a/agenta-web/src/components/Playground/NewVariantModal.tsx +++ b/agenta-web/src/components/Playground/NewVariantModal.tsx @@ -61,6 +61,7 @@ const NewVariantModal: React.FC = ({ onCancel={() => setIsModalOpen(false)} centered okButtonProps={{disabled: !isInputValid}} // Disable OK button if input is not valid + destroyOnClose >
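Note on the `StyleProps` hunks above: several components drop their local `type StyleProps = { themeMode: "dark" | "light" }` and import `StyleProps` from `@/lib/Types` instead, while `Layout.tsx` extends the shared type (aliased as `MainStyleProps`) with `footerHeight`. The following is a minimal sketch of that pattern, not the project's code. It assumes the shared type has the same shape as the removed local definitions (the definition in `@/lib/Types` is not shown in this part of the diff), and `LayoutStyleProps`, `rootBackground`, `footerStyle`, and the colour values are hypothetical names used only for illustration.

```typescript
// Assumption: the shared type mirrors the local definitions removed above.
// In the diff it is imported from "@/lib/Types"; it is inlined here so the
// sketch is self-contained.
type StyleProps = {
    themeMode: "dark" | "light"
}

// Layout.tsx needs one extra field, so it extends the shared type rather
// than redeclaring themeMode. (Hypothetical name; the diff keeps the local
// name StyleProps and aliases the import as MainStyleProps.)
interface LayoutStyleProps extends StyleProps {
    footerHeight: number
}

// Hypothetical helpers standing in for react-jss rule functions.
const rootBackground = ({themeMode}: StyleProps): string =>
    themeMode === "dark" ? "#1f1f1f" : "#ffffff"

const footerStyle = ({themeMode, footerHeight}: LayoutStyleProps) => ({
    height: footerHeight,
    background: rootBackground({themeMode}),
})
```

Because `LayoutStyleProps` is structurally a superset of `StyleProps`, any helper typed against the shared type also accepts the extended props, which is what lets the duplicated declarations be removed without touching the style rules themselves.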
diff --git a/agenta-web/src/components/Playground/ParamsForm/ParamsForm.tsx b/agenta-web/src/components/Playground/ParamsForm/ParamsForm.tsx index b213497e93..53ad106781 100644 --- a/agenta-web/src/components/Playground/ParamsForm/ParamsForm.tsx +++ b/agenta-web/src/components/Playground/ParamsForm/ParamsForm.tsx @@ -28,12 +28,11 @@ const useStyles = createUseStyles((theme: JSSTheme) => ({ borderRadius: 6, }, paramValueContainer: { - border: `1px solid ${theme.colorBorder}`, - width: "100%", - borderRadius: theme.borderRadius, - padding: theme.paddingSM, - maxHeight: 300, - overflowY: "scroll", + "&:disabled": { + color: "inherit", + backgroundColor: "inherit", + cursor: "text", + }, }, })) @@ -109,20 +108,19 @@ const ParamsForm: React.FC = ({ alt={param.name} /> )} - {isPlaygroundComponent ? ( - - onParamChange?.(param.name, e.target.value) - } - autoSize={{minRows: 2, maxRows: 8}} - /> - ) : ( -
{param.value}
- )} + + onParamChange?.(param.name, e.target.value)} + disabled={!isPlaygroundComponent} + autoSize={{minRows: 2, maxRows: 8}} + />
) diff --git a/agenta-web/src/components/Playground/Views/ParametersView.tsx b/agenta-web/src/components/Playground/Views/ParametersView.tsx index 08ad60b802..47d5278d2d 100644 --- a/agenta-web/src/components/Playground/Views/ParametersView.tsx +++ b/agenta-web/src/components/Playground/Views/ParametersView.tsx @@ -11,6 +11,7 @@ import {usePostHogAg} from "@/hooks/usePostHogAg" import {isDemo} from "@/lib/helpers/utils" import {useQueryParam} from "@/hooks/useQuery" import {dynamicComponent, dynamicService} from "@/lib/helpers/dynamic" +import {checkIfResourceValidForDeletion} from "@/lib/helpers/evaluate" const PromptVersioningDrawer: any = dynamicComponent( `PromptVersioningDrawer/PromptVersioningDrawer`, @@ -131,14 +132,24 @@ const ParametersView: React.FC = ({ }) } - const handleDelete = () => { - deleteVariant(() => { - if (variant.persistent) { - return deleteSingleVariant(variant.variantId).then(() => { - onStateChange(false) - }) - } - }) + const handleDelete = async () => { + try { + if ( + !(await checkIfResourceValidForDeletion({ + resourceType: "variant", + resourceIds: [variant.variantId], + })) + ) + return + + deleteVariant(() => { + if (variant.persistent) { + return deleteSingleVariant(variant.variantId).then(() => { + onStateChange(false) + }) + } + }) + } catch {} } useEffect(() => { diff --git a/agenta-web/src/components/Playground/Views/TestView.tsx b/agenta-web/src/components/Playground/Views/TestView.tsx index 4d9cb7ae61..cf39a95834 100644 --- a/agenta-web/src/components/Playground/Views/TestView.tsx +++ b/agenta-web/src/components/Playground/Views/TestView.tsx @@ -2,7 +2,15 @@ import React, {useContext, useEffect, useRef, useState} from "react" import {Button, Input, Card, Row, Col, Space, Form, Modal} from "antd" import {CaretRightOutlined, CloseCircleOutlined, PlusOutlined} from "@ant-design/icons" import {callVariant} from "@/services/api" -import {ChatMessage, ChatRole, GenericObject, JSSTheme, Parameter, Variant} from "@/lib/Types" 
+import { + ChatMessage, + ChatRole, + GenericObject, + JSSTheme, + Parameter, + Variant, + StyleProps, +} from "@/lib/Types" import {batchExecute, randString, removeKeys} from "@/lib/helpers/utils" import LoadTestsModal from "../LoadTestsModal" import AddToTestSetDrawer from "../AddToTestSetDrawer/AddToTestSetDrawer" @@ -30,10 +38,6 @@ const promptRevision: any = dynamicService("promptVersioning/api") dayjs.extend(relativeTime) dayjs.extend(duration) -type StyleProps = { - themeMode: "dark" | "light" -} - const {TextArea} = Input const LOADING_TEXT = "Loading..." diff --git a/agenta-web/src/components/SecondaryButton/SecondaryButton.tsx b/agenta-web/src/components/SecondaryButton/SecondaryButton.tsx index b67380055f..0a6b78e5a5 100644 --- a/agenta-web/src/components/SecondaryButton/SecondaryButton.tsx +++ b/agenta-web/src/components/SecondaryButton/SecondaryButton.tsx @@ -10,10 +10,6 @@ type SecondaryBtnProps = { onClick: () => void } -type StyleProps = { - themeMode: "dark" | "light" -} - const SecondaryButton: React.FC = ({children, ...props}) => { const {appTheme} = useAppTheme() diff --git a/agenta-web/src/components/TestSetTable/TestsetTable.tsx b/agenta-web/src/components/TestSetTable/TestsetTable.tsx index dad1316aca..936fc8189d 100644 --- a/agenta-web/src/components/TestSetTable/TestsetTable.tsx +++ b/agenta-web/src/components/TestSetTable/TestsetTable.tsx @@ -155,6 +155,8 @@ const TestsetTable: React.FC = ({mode}) => { const [columnDefs, setColumnDefs] = useState<{field: string; [key: string]: any}[]>([]) const [inputValues, setInputValues] = useStateCallback(columnDefs.map((col) => col.field)) const [focusedRowData, setFocusedRowData] = useState() + const [writeMode, setWriteMode] = useState(mode) + const [testsetId, setTestsetId] = useState(undefined) const gridRef = useRef(null) const [selectedRow, setSelectedRow] = useState([]) @@ -202,7 +204,7 @@ const TestsetTable: React.FC = ({mode}) => { ADD_BUTTON_COL, ] setColumnDefs(newColDefs) - if (mode === 
"create") { + if (writeMode === "create") { const initialRowData = Array(3).fill({}) const separateRowData = initialRowData.map(() => { return colData.reduce((acc, curr) => ({...acc, [curr.field]: ""}), {}) @@ -213,7 +215,7 @@ const TestsetTable: React.FC = ({mode}) => { setInputValues(newColDefs.filter((col) => !!col.field).map((col) => col.field)) } - if (mode === "edit" && testset_id) { + if (writeMode === "edit" && testset_id) { setLoading(true) fetchTestset(testset_id as string).then((data) => { setTestsetName(data.name) @@ -224,7 +226,7 @@ const TestsetTable: React.FC = ({mode}) => { })), ) }) - } else if (mode === "create" && appId) { + } else if (writeMode === "create" && appId) { setLoading(true) ;(async () => { const backendVariants = await fetchVariants(appId) @@ -238,7 +240,7 @@ const TestsetTable: React.FC = ({mode}) => { applyColData([]) }) } - }, [mode, testset_id, appId]) + }, [writeMode, testset_id, appId]) const updateTable = (inputValues: string[]) => { const dataColumns = columnDefs.filter((colDef) => colDef.field !== "") @@ -444,22 +446,28 @@ const TestsetTable: React.FC = ({mode}) => { mssgModal("success", "Changes saved successfully!") }) setIsLoading(false) + setWriteMode("edit") } } - if (mode === "create") { + if (writeMode === "create") { if (!testsetName) { setIsModalOpen(true) setIsLoading(false) } else { const response = await createNewTestset(appId, testsetName, rowData) afterSave(response) + setTestsetId(response.data.id) } - } else if (mode === "edit") { + } else if (writeMode === "edit") { if (!testsetName) { setIsModalOpen(true) } else { - const response = await updateTestset(testset_id as string, testsetName, rowData) + const response = await updateTestset( + (testsetId || testset_id) as string, + testsetName, + rowData, + ) afterSave(response) } } diff --git a/agenta-web/src/components/pages/evaluations/EvaluationErrorProps/EvaluationErrorModal.tsx 
b/agenta-web/src/components/pages/evaluations/EvaluationErrorProps/EvaluationErrorModal.tsx new file mode 100644 index 0000000000..4262353296 --- /dev/null +++ b/agenta-web/src/components/pages/evaluations/EvaluationErrorProps/EvaluationErrorModal.tsx @@ -0,0 +1,61 @@ +import {JSSTheme} from "@/lib/Types" +import {ExclamationCircleOutlined} from "@ant-design/icons" +import {Modal, Typography} from "antd" +import React from "react" +import {createUseStyles} from "react-jss" + +interface EvaluationErrorModalProps { + isErrorModalOpen: boolean + setIsErrorModalOpen: (value: React.SetStateAction) => void + modalErrorMsg: { + message: string + stackTrace: string + } +} + +const useStyles = createUseStyles((theme: JSSTheme) => ({ + errModalStackTrace: { + "& code": { + display: "block", + }, + }, +})) + +const EvaluationErrorModal = ({ + isErrorModalOpen, + setIsErrorModalOpen, + modalErrorMsg, +}: EvaluationErrorModalProps) => { + const classes = useStyles() + + return ( + + + Error + + } + onCancel={() => setIsErrorModalOpen(false)} + > + + Failed to invoke the LLM application with the following exception: + + {modalErrorMsg.message && ( + + {modalErrorMsg.message} + + )} + {modalErrorMsg.stackTrace && ( + + {modalErrorMsg.stackTrace} + + )} + + ) +} + +export default EvaluationErrorModal diff --git a/agenta-web/src/components/pages/evaluations/EvaluationErrorProps/EvaluationErrorText.tsx b/agenta-web/src/components/pages/evaluations/EvaluationErrorProps/EvaluationErrorText.tsx new file mode 100644 index 0000000000..d08a41261f --- /dev/null +++ b/agenta-web/src/components/pages/evaluations/EvaluationErrorProps/EvaluationErrorText.tsx @@ -0,0 +1,25 @@ +import {Button, Typography} from "antd" +import React from "react" + +interface EvaluationErrorTextProps { + text: string + setIsErrorModalOpen: (value: React.SetStateAction) => void +} + +const EvaluationErrorText = ({text, setIsErrorModalOpen}: EvaluationErrorTextProps) => { + return ( + + {text}{" "} + + + ) +} + 
+export default EvaluationErrorText diff --git a/agenta-web/src/components/pages/evaluations/cellRenderers/cellRenderers.tsx b/agenta-web/src/components/pages/evaluations/cellRenderers/cellRenderers.tsx index 61a5195dd5..1ce610bff2 100644 --- a/agenta-web/src/components/pages/evaluations/cellRenderers/cellRenderers.tsx +++ b/agenta-web/src/components/pages/evaluations/cellRenderers/cellRenderers.tsx @@ -21,6 +21,7 @@ import Link from "next/link" import React, {useCallback, useEffect, useState} from "react" import {createUseStyles} from "react-jss" import {getTypedValue} from "@/lib/helpers/evaluate" +import EvaluationErrorText from "../EvaluationErrorProps/EvaluationErrorText" dayjs.extend(relativeTime) dayjs.extend(duration) @@ -137,47 +138,75 @@ export function LongTextCellRenderer(params: ICellRendererParams, output?: any) } export const ResultRenderer = React.memo( - (params: ICellRendererParams<_EvaluationScenario> & {config: EvaluatorConfig}) => { + ( + params: ICellRendererParams<_EvaluationScenario> & { + config: EvaluatorConfig + setIsErrorModalOpen: React.Dispatch> + setModalErrorMsg: React.Dispatch< + React.SetStateAction<{ + message: string + stackTrace: string + }> + > + }, + ) => { + const {setIsErrorModalOpen, setModalErrorMsg} = params const result = params.data?.results.find( (item) => item.evaluator_config === params.config.id, )?.result - let errorMsg = "" - if (result?.type === "error") { - errorMsg = `${result?.error?.message}\n${result?.error?.stacktrace}` + if (result?.type === "error" && result.error) { + setModalErrorMsg({message: result.error.message, stackTrace: result.error.stacktrace}) } - return ( - - {errorMsg || getTypedValue(result)} - + return result?.type === "error" && result.error ? 
( + + ) : ( + {getTypedValue(result)} ) }, (prev, next) => prev.value === next.value, ) export const runningStatuses = [EvaluationStatus.INITIALIZED, EvaluationStatus.STARTED] -export const statusMapper = (token: GlobalToken) => ({ - [EvaluationStatus.INITIALIZED]: { - label: "Queued", - color: token.colorTextSecondary, - }, - [EvaluationStatus.STARTED]: { - label: "Running", - color: token.colorWarning, - }, - [EvaluationStatus.FINISHED]: { - label: "Completed", - color: token.colorSuccess, - }, - [EvaluationStatus.ERROR]: { - label: "Failed", - color: token.colorError, - }, - [EvaluationStatus.FINISHED_WITH_ERRORS]: { - label: "Completed with Errors", - color: token.colorWarning, - }, -}) +export const statusMapper = (token: GlobalToken) => (status: EvaluationStatus) => { + const statusMap = { + [EvaluationStatus.INITIALIZED]: { + label: "Queued", + color: token.colorTextSecondary, + }, + [EvaluationStatus.STARTED]: { + label: "Running", + color: token.colorWarning, + }, + [EvaluationStatus.FINISHED]: { + label: "Completed", + color: token.colorSuccess, + }, + [EvaluationStatus.ERROR]: { + label: "Failed", + color: token.colorError, + }, + [EvaluationStatus.FINISHED_WITH_ERRORS]: { + label: "Completed with Errors", + color: token.colorWarning, + }, + [EvaluationStatus.AGGREGATION_FAILED]: { + label: "Result Aggregation Failed", + color: token.colorWarning, + }, + } + + return ( + statusMap[status] || { + label: "Unknown", + color: "purple", + } + ) +} + export const StatusRenderer = React.memo( (params: ICellRendererParams<_Evaluation>) => { const classes = useStyles() @@ -186,8 +215,9 @@ export const StatusRenderer = React.memo( params.data?.duration || 0, runningStatuses.includes(params.value), ) - const {label, color} = statusMapper(token)[params.data?.status.value as EvaluationStatus] + const {label, color} = statusMapper(token)(params.data?.status.value as EvaluationStatus) const errorMsg = params.data?.status.error?.message + const errorStacktrace = 
params.data?.status.error?.stacktrace return ( @@ -195,7 +225,7 @@ export const StatusRenderer = React.memo( {label} {errorMsg && ( - + diff --git a/agenta-web/src/components/pages/evaluations/evaluationCompare/EvaluationCompare.tsx b/agenta-web/src/components/pages/evaluations/evaluationCompare/EvaluationCompare.tsx index 0069b9fe0b..02a9538fe0 100644 --- a/agenta-web/src/components/pages/evaluations/evaluationCompare/EvaluationCompare.tsx +++ b/agenta-web/src/components/pages/evaluations/evaluationCompare/EvaluationCompare.tsx @@ -8,8 +8,8 @@ import { _Evaluation, _EvaluationScenario, } from "@/lib/Types" +import {ColDef, ValueGetterParams} from "ag-grid-community" import {fetchAllComparisonResults} from "@/services/evaluations/api" -import {ColDef} from "ag-grid-community" import {AgGridReact} from "ag-grid-react" import {Button, DropdownProps, Space, Spin, Tag, Tooltip, Typography} from "antd" import React, {useEffect, useMemo, useRef, useState} from "react" @@ -30,6 +30,8 @@ import FilterColumns, {generateFilterItems} from "../FilterColumns/FilterColumns import _ from "lodash" import {variantNameWithRev} from "@/lib/helpers/variantHelper" import {escapeNewlines} from "@/lib/helpers/fileManipulations" +import EvaluationErrorModal from "../EvaluationErrorProps/EvaluationErrorModal" +import EvaluationErrorText from "../EvaluationErrorProps/EvaluationErrorText" const useStyles = createUseStyles((theme: JSSTheme) => ({ table: { @@ -88,6 +90,8 @@ const EvaluationCompareMode: React.FC = () => { const [isFilterColsDropdownOpen, setIsFilterColsDropdownOpen] = useState(false) const [isDiffDropdownOpen, setIsDiffDropdownOpen] = useState(false) const [selectedCorrectAnswer, setSelectedCorrectAnswer] = useState(["noDiffColumnIsSelected"]) + const [modalErrorMsg, setModalErrorMsg] = useState({message: "", stackTrace: ""}) + const [isErrorModalOpen, setIsErrorModalOpen] = useState(false) const handleOpenChangeDiff: DropdownProps["onOpenChange"] = (nextOpen, info) => { if 
(info.source === "trigger" || nextOpen) { @@ -184,38 +188,42 @@ const EvaluationCompareMode: React.FC = () => { ), headerName: "Output", - minWidth: 280, + minWidth: 300, flex: 1, field: `variants.${vi}.output` as any, ...getFilterParams("text"), hide: hiddenVariants.includes("Output"), cellRenderer: (params: any) => { + const result = params.data?.variants.find( + (item: any) => item.evaluationId === variant.evaluationId, + )?.output?.result + + if (result && result.error && result.type == "error") { + setModalErrorMsg({ + message: result.error.message, + stackTrace: result.error.stacktrace, + }) + return ( + + ) + } + return ( <> {selectedCorrectAnswer[0] !== "noDiffColumnIsSelected" ? LongTextCellRenderer( params, - item.evaluationId === variant.evaluationId, - )?.output?.result, - )} + variantOutput={getTypedValue(result)} expectedOutput={ params.data[selectedCorrectAnswer[0]] || "" } />, ) - : LongTextCellRenderer( - params, - getTypedValue( - params.data?.variants.find( - (item: any) => - item.evaluationId === variant.evaluationId, - )?.output?.result, - ), - )} + : LongTextCellRenderer(params, getTypedValue(result))} ) }, @@ -266,6 +274,29 @@ const EvaluationCompareMode: React.FC = () => { field: "variants.0.evaluatorConfigs.0.result" as any, ...getFilterParams("text"), hide: hiddenVariants.includes(config.name), + cellRenderer: (params: ValueGetterParams) => { + const result = params.data?.variants + .find((item) => item.evaluationId === variant.evaluationId) + ?.evaluatorConfigs.find( + (item) => item.evaluatorConfig.id === config.id, + )?.result + + if (result?.error && result.type === "error") { + setModalErrorMsg({ + message: result.error.message, + stackTrace: result.error.stacktrace, + }) + } + + return result?.type === "error" && result.error ? 
( + + ) : ( + {getTypedValue(result)} + ) + }, valueGetter: (params) => { return getTypedValue( params.data?.variants @@ -544,11 +575,17 @@ const EvaluationCompareMode: React.FC = () => { ref={gridRef as any} rowData={rows} columnDefs={colDefs} - getRowId={(params) => params.data.id} + getRowId={(params) => params.data.rowId} headerHeight={64} /> + + ) } diff --git a/agenta-web/src/components/pages/evaluations/evaluationResults/EvaluationResults.tsx b/agenta-web/src/components/pages/evaluations/evaluationResults/EvaluationResults.tsx index ee243c0eee..427c743e4b 100644 --- a/agenta-web/src/components/pages/evaluations/evaluationResults/EvaluationResults.tsx +++ b/agenta-web/src/components/pages/evaluations/evaluationResults/EvaluationResults.tsx @@ -1,9 +1,9 @@ import React, {useEffect, useMemo, useRef, useState} from "react" import {AgGridReact} from "ag-grid-react" import {useAppTheme} from "@/components/Layout/ThemeContextProvider" -import {ColDef} from "ag-grid-community" +import {ColDef, ValueGetterParams} from "ag-grid-community" import {createUseStyles} from "react-jss" -import {Button, DropdownProps, Space, Spin, Tag, Tooltip, theme} from "antd" +import {Button, DropdownProps, Space, Spin, Tag, Tooltip, Typography, theme} from "antd" import { DeleteOutlined, DownloadOutlined, @@ -274,6 +274,19 @@ const EvaluationResults: React.FC = () => { ), autoHeaderHeight: true, ...getFilterParams("number"), + cellRenderer: (params: ValueGetterParams<_Evaluation, any>) => { + const result = params.data?.aggregated_results.find( + (item) => item.evaluator_config.id === config.id, + )?.result + + return result?.error ? 
( + + Error + + ) : ( + {getTypedValue(result)} + ) + }, valueGetter: (params) => getTypedValue( params.data?.aggregated_results.find( @@ -295,10 +308,10 @@ const EvaluationResults: React.FC = () => { pinned: "right", ...getFilterParams("text"), filterValueGetter: (params) => - statusMapper(token)[params.data?.status.value as EvaluationStatus].label, + statusMapper(token)(params.data?.status.value as EvaluationStatus).label, cellRenderer: StatusRenderer, valueGetter: (params) => - statusMapper(token)[params.data?.status.value as EvaluationStatus].label, + statusMapper(token)(params.data?.status.value as EvaluationStatus).label, }, { flex: 1, @@ -393,7 +406,7 @@ const EvaluationResults: React.FC = () => { "Avg. Latency": getTypedValue(item.average_latency), "Total Cost": getTypedValue(item.average_cost), Created: formatDate24(item.created_at), - Status: statusMapper(token)[item.status.value as EvaluationStatus].label, + Status: statusMapper(token)(item.status.value as EvaluationStatus).label, })), colDefs.map((col) => col.headerName!), ) @@ -487,6 +500,8 @@ const EvaluationResults: React.FC = () => { return ;(EvaluationStatus.FINISHED === params.data?.status.value || EvaluationStatus.FINISHED_WITH_ERRORS === + params.data?.status.value || + EvaluationStatus.AGGREGATION_FAILED === params.data?.status.value) && router.push( `/apps/${appId}/evaluations/results/${params.data?.id}`, diff --git a/agenta-web/src/components/pages/evaluations/evaluationScenarios/EvaluationScenarios.tsx b/agenta-web/src/components/pages/evaluations/evaluationScenarios/EvaluationScenarios.tsx index 573491eb47..faed3b0168 100644 --- a/agenta-web/src/components/pages/evaluations/evaluationScenarios/EvaluationScenarios.tsx +++ b/agenta-web/src/components/pages/evaluations/evaluationScenarios/EvaluationScenarios.tsx @@ -23,6 +23,8 @@ import {useAtom} from "jotai" import {evaluatorsAtom} from "@/lib/atoms/evaluation" import CompareOutputDiff from "@/components/CompareOutputDiff/CompareOutputDiff" 
import {formatCurrency, formatLatency} from "@/lib/helpers/formatters" +import EvaluationErrorModal from "../EvaluationErrorProps/EvaluationErrorModal" +import EvaluationErrorText from "../EvaluationErrorProps/EvaluationErrorText" import _ from "lodash" import FilterColumns, {generateFilterItems} from "../FilterColumns/FilterColumns" import {variantNameWithRev} from "@/lib/helpers/variantHelper" @@ -80,6 +82,8 @@ const EvaluationScenarios: React.FC = () => { scenarios[0]?.correct_answers || [], "key", ) + const [modalErrorMsg, setModalErrorMsg] = useState({message: "", stackTrace: ""}) + const [isErrorModalOpen, setIsErrorModalOpen] = useState(false) const colDefs = useMemo(() => { const colDefs: ColDef<_EvaluationScenario>[] = [] @@ -143,12 +147,18 @@ const EvaluationScenarios: React.FC = () => { const correctAnswer = params?.data?.correct_answers?.find( (item: any) => item.key === selectedCorrectAnswer[0], ) - const result = params.data?.outputs[index].result - if (result && result.type == "error") { - return LongTextCellRenderer( - params, - `${result?.error?.message}\n${result?.error?.stacktrace}`, + + if (result && result.error && result.type == "error") { + setModalErrorMsg({ + message: result.error.message, + stackTrace: result.error.stacktrace, + }) + return ( + ) } return selectedCorrectAnswer[0] !== "noDiffColumnIsSelected" @@ -188,6 +198,8 @@ const EvaluationScenarios: React.FC = () => { cellRenderer: ResultRenderer, cellRendererParams: { config, + setIsErrorModalOpen, + setModalErrorMsg, }, valueGetter: (params) => { return params.data?.results[index].result.value @@ -292,7 +304,7 @@ const EvaluationScenarios: React.FC = () => { message: "Are you sure you want to delete this evaluation?", onOk: () => deleteEvaluations([evaluationId]) - .then(() => router.push(`/apps/${appId}/evaluations`)) + .then(() => router.push(`/apps/${appId}/evaluations/results`)) .catch(console.error), }) } @@ -390,6 +402,12 @@ const EvaluationScenarios: React.FC = () => { /> + + 
) } diff --git a/agenta-web/src/components/pages/settings/Secrets/Secrets.tsx b/agenta-web/src/components/pages/settings/Secrets/Secrets.tsx index ab90134f2d..b28f7dbf68 100644 --- a/agenta-web/src/components/pages/settings/Secrets/Secrets.tsx +++ b/agenta-web/src/components/pages/settings/Secrets/Secrets.tsx @@ -4,42 +4,20 @@ import { removeSingleLlmProviderKey, getAllProviderLlmKeys, LlmProvider, - getApikeys, } from "@/lib/helpers/llmProviders" import {Button, Input, Space, Typography, message} from "antd" import {useState} from "react" -import {createUseStyles} from "react-jss" const {Title, Text} = Typography -const useStyles = createUseStyles({ - title: { - marginTop: 0, - }, - container: { - marginLeft: 0, - }, - apiContainer: { - margin: "0px 0", - }, - input: { - display: "flex", - alignItems: "center", - width: 420, - marginBottom: 8, - marginLeft: 8, - }, -}) - export default function Secrets() { - const classes = useStyles() const [llmProviderKeys, setLlmProviderKeys] = useState(getAllProviderLlmKeys()) const [messageAPI, contextHolder] = message.useMessage() return (
{contextHolder} - + <Title level={3} className={"mt-0"}> LLM Keys @@ -48,50 +26,50 @@ export default function Secrets() { servers! -
+
Available Providers -
+
{llmProviderKeys.map(({title, key}: LlmProvider, i: number) => ( -
- - { - const newLlmProviderKeys = [...llmProviderKeys] - newLlmProviderKeys[i].key = e.target.value - setLlmProviderKeys(newLlmProviderKeys) - }} - addonBefore={`${title}`} - visibilityToggle={false} - className={classes.input} - /> - - + - -
+ messageAPI.warning("The secret is deleted") + }} + > + Delete + + ))}
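Reviewer sketch for the `getCustomComparator` hunk in `agenta-web/src/lib/helpers/evaluate.ts` in this patch: that change coerces both values with `String(...)` before the type-specific comparison, so a non-string cell value (for example a number) no longer breaks `localeCompare`. The stand-alone version below illustrates the same idea under assumptions: `makeComparator` is a hypothetical name, and `Date.parse` stands in for the `dayjs(...).diff(...)` call so the snippet has no dependencies.

```typescript
type CellDataType = "date" | "text" | "number"

// Hypothetical stand-alone comparator; not the project's actual helper.
const makeComparator =
    (type: CellDataType) =>
    (a: unknown, b: unknown): number => {
        // Parse a numeric string, falling back to 0 for empty or non-numeric input.
        const toNumber = (val: string): number => {
            const num = parseFloat(val || "0")
            return isNaN(num) ? 0 : num
        }

        // Coerce first, so the string methods below are always safe to call.
        const valueA = String(a)
        const valueB = String(b)

        switch (type) {
            case "date":
                // Assumption: Date.parse replaces dayjs in this sketch.
                return Date.parse(valueA) - Date.parse(valueB)
            case "text":
                return valueA.localeCompare(valueB)
            case "number":
                return toNumber(valueA) - toNumber(valueB)
            default:
                return 0
        }
    }
```

A negative result means the first value sorts before the second, which matches the comparator contract that grid libraries generally expect.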
diff --git a/agenta-web/src/lib/Types.ts b/agenta-web/src/lib/Types.ts index c742244029..322da212a2 100644 --- a/agenta-web/src/lib/Types.ts +++ b/agenta-web/src/lib/Types.ts @@ -377,6 +377,7 @@ export enum EvaluationStatus { FINISHED = "EVALUATION_FINISHED", FINISHED_WITH_ERRORS = "EVALUATION_FINISHED_WITH_ERRORS", ERROR = "EVALUATION_FAILED", + AGGREGATION_FAILED = "EVALUATION_AGGREGATION_FAILED", } export enum EvaluationStatusType { @@ -488,3 +489,7 @@ export type PaginationQuery = { page: number pageSize: number } + +export type StyleProps = { + themeMode: "dark" | "light" +} diff --git a/agenta-web/src/lib/helpers/evaluate.ts b/agenta-web/src/lib/helpers/evaluate.ts index 60b8975cc2..87b5b48f4b 100644 --- a/agenta-web/src/lib/helpers/evaluate.ts +++ b/agenta-web/src/lib/helpers/evaluate.ts @@ -333,10 +333,20 @@ const getCustomComparator = (type: CellDataType) => (valueA: string, valueB: str const num = parseFloat(val || "0") return isNaN(num) ? 0 : num } - if (type === "date") return dayjs(valueA).diff(dayjs(valueB)) - if (type === "text") return valueA.localeCompare(valueB) - if (type === "number") return getNumber(valueA) - getNumber(valueB) - return 0 + + valueA = String(valueA) + valueB = String(valueB) + + switch (type) { + case "date": + return dayjs(valueA).diff(dayjs(valueB)) + case "text": + return valueA.localeCompare(valueB) + case "number": + return getNumber(valueA) - getNumber(valueB) + default: + return 0 + } } export const removeCorrectAnswerPrefix = (str: string) => { diff --git a/agenta-web/src/pages/apps/[app_id]/testsets/[testset_id]/index.tsx b/agenta-web/src/pages/apps/[app_id]/testsets/[testset_id]/index.tsx index 7518ed7d0f..9e9a4a47bf 100644 --- a/agenta-web/src/pages/apps/[app_id]/testsets/[testset_id]/index.tsx +++ b/agenta-web/src/pages/apps/[app_id]/testsets/[testset_id]/index.tsx @@ -1,4 +1,4 @@ -import React, {useState, useEffect} from "react" +import React from "react" import TestsetTable from 
"@/components/TestSetTable/TestsetTable" const testsetDisplay = () => { diff --git a/agenta-web/src/pages/apps/[app_id]/testsets/new/api/index.tsx b/agenta-web/src/pages/apps/[app_id]/testsets/new/api/index.tsx index 43c03465b1..829788c3e1 100644 --- a/agenta-web/src/pages/apps/[app_id]/testsets/new/api/index.tsx +++ b/agenta-web/src/pages/apps/[app_id]/testsets/new/api/index.tsx @@ -1,12 +1,12 @@ import DynamicCodeBlock from "@/components/DynamicCodeBlock/DynamicCodeBlock" -import pythonCode from "../../../../../../code_snippets/testsets/create_with_json/python" -import cURLCode from "../../../../../../code_snippets/testsets/create_with_json/curl" -import tsCode from "../../../../../../code_snippets/testsets/create_with_json/typescript" +import pythonCode from "@/code_snippets/testsets/create_with_json/python" +import cURLCode from "@/code_snippets/testsets/create_with_json/curl" +import tsCode from "@/code_snippets/testsets/create_with_json/typescript" -import pythonCodeUpload from "../../../../../../code_snippets/testsets/create_with_upload/python" -import cURLCodeUpload from "../../../../../../code_snippets/testsets/create_with_upload/curl" -import tsCodeUpload from "../../../../../../code_snippets/testsets/create_with_upload/typescript" +import pythonCodeUpload from "@/code_snippets/testsets/create_with_upload/python" +import cURLCodeUpload from "@/code_snippets/testsets/create_with_upload/curl" +import tsCodeUpload from "@/code_snippets/testsets/create_with_upload/typescript" import {Typography} from "antd" import {useRouter} from "next/router" import {createUseStyles} from "react-jss" diff --git a/agenta-web/src/services/evaluations/api/index.ts b/agenta-web/src/services/evaluations/api/index.ts index 301bf6c994..98612464d8 100644 --- a/agenta-web/src/services/evaluations/api/index.ts +++ b/agenta-web/src/services/evaluations/api/index.ts @@ -14,6 +14,7 @@ import { } from "@/lib/Types" import {getTagColors} from "@/lib/helpers/colors" import 
{stringToNumberInRange} from "@/lib/helpers/utils" +import {v4 as uuidv4} from "uuid" import exactMatchImg from "@/media/target.png" import similarityImg from "@/media/transparency.png" import regexImg from "@/media/programming.png" @@ -239,6 +240,7 @@ export const fetchAllComparisonResults = async (evaluationIds: string[]) => { rows.push({ id: inputValuesStr, + rowId: uuidv4(), inputs: inputNames .map((name) => ({name, value: data[name]})) .filter((ip) => ip.value !== undefined), diff --git a/cookbook/evaluations_with_sdk.ipynb b/cookbook/evaluations_with_sdk.ipynb new file mode 100644 index 0000000000..dd32af728b --- /dev/null +++ b/cookbook/evaluations_with_sdk.ipynb @@ -0,0 +1,470 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Using evaluations with the SDK\n", + "In this cookbook we will show how to interact with evaluations in Agenta programmatically, using either the SDK or the raw API.\n", + "\n", + "We will do the following:\n", + "\n", + "- Create a test set\n", + "- Create and configure an evaluator\n", + "- Run an evaluation\n", + "- Retrieve the results of evaluations\n", + "\n", + "We assume that you have already created an LLM application and variants in Agenta. \n", + "\n", + "\n", + "### Architectural Overview:\n", + "In this scenario, evaluations are executed on the Agenta backend. Specifically, Agenta invokes the LLM application for each row in the test set and subsequently processes the output using the designated evaluator. \n", + "This operation is managed through Celery tasks. The interactions with the LLM application are asynchronous, batched, and include retry mechanisms. 
Additionally, the batching configuration can be adjusted to avoid exceeding the rate limits imposed by the LLM provider.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup " + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: agenta in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (0.17.5)\n", + "Requirement already satisfied: docker<8.0.0,>=6.1.1 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (6.1.2)\n", + "Requirement already satisfied: posthog<4.0.0,>=3.1.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (3.3.1)\n", + "Requirement already satisfied: click<9.0.0,>=8.1.3 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (8.1.3)\n", + "Requirement already satisfied: importlib-metadata<8.0,>=6.7 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (6.8.0)\n", + "Requirement already satisfied: pymongo<5.0.0,>=4.6.3 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (4.7.2)\n", + "Requirement already satisfied: questionary<3.0,>=1.10 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (1.10.0)\n", + "Requirement already satisfied: httpx<0.28,>=0.24 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (0.27.0)\n", + "Requirement already satisfied: python-dotenv<2.0.0,>=1.0.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (1.0.0)\n", + "Requirement already satisfied: pydantic>=2 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (2.7.4)\n", + "Requirement already satisfied: toml<0.11.0,>=0.10.2 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (0.10.2)\n", + "Requirement already 
satisfied: cachetools<6.0.0,>=5.3.3 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (5.3.3)\n", + "Requirement already satisfied: python-multipart<0.0.10,>=0.0.6 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (0.0.9)\n", + "Requirement already satisfied: fastapi>=0.100.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (0.111.0)\n", + "Requirement already satisfied: ipdb>=0.13 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from agenta) (0.13.13)\n", + "Requirement already satisfied: urllib3>=1.26.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from docker<8.0.0,>=6.1.1->agenta) (2.2.1)\n", + "Requirement already satisfied: requests>=2.26.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from docker<8.0.0,>=6.1.1->agenta) (2.31.0)\n", + "Requirement already satisfied: packaging>=14.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from docker<8.0.0,>=6.1.1->agenta) (23.2)\n", + "Requirement already satisfied: websocket-client>=0.32.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from docker<8.0.0,>=6.1.1->agenta) (1.5.2)\n", + "Requirement already satisfied: orjson>=3.2.1 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from fastapi>=0.100.0->agenta) (3.9.15)\n", + "Requirement already satisfied: typing-extensions>=4.8.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from fastapi>=0.100.0->agenta) (4.9.0)\n", + "Requirement already satisfied: ujson!=4.0.2,!=4.1.0,!=4.2.0,!=4.3.0,!=5.0.0,!=5.1.0,>=4.0.1 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from fastapi>=0.100.0->agenta) (5.10.0)\n", + "Requirement already satisfied: jinja2>=2.11.2 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from fastapi>=0.100.0->agenta) (3.1.2)\n", + "Requirement already satisfied: 
starlette<0.38.0,>=0.37.2 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from fastapi>=0.100.0->agenta) (0.37.2)\n", + "Requirement already satisfied: uvicorn[standard]>=0.12.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from fastapi>=0.100.0->agenta) (0.20.0)\n", + "Requirement already satisfied: email_validator>=2.0.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from fastapi>=0.100.0->agenta) (2.1.1)\n", + "Requirement already satisfied: fastapi-cli>=0.0.2 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from fastapi>=0.100.0->agenta) (0.0.4)\n", + "Requirement already satisfied: idna>=2.0.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from email_validator>=2.0.0->fastapi>=0.100.0->agenta) (3.2)\n", + "Requirement already satisfied: dnspython>=2.0.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from email_validator>=2.0.0->fastapi>=0.100.0->agenta) (2.2.1)\n", + "Requirement already satisfied: typer>=0.12.3 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from fastapi-cli>=0.0.2->fastapi>=0.100.0->agenta) (0.12.3)\n", + "Requirement already satisfied: httpcore==1.* in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from httpx<0.28,>=0.24->agenta) (1.0.4)\n", + "Requirement already satisfied: anyio in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from httpx<0.28,>=0.24->agenta) (3.6.2)\n", + "Requirement already satisfied: sniffio in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from httpx<0.28,>=0.24->agenta) (1.2.0)\n", + "Requirement already satisfied: certifi in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from httpx<0.28,>=0.24->agenta) (2023.11.17)\n", + "Requirement already satisfied: h11<0.15,>=0.13 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from httpcore==1.*->httpx<0.28,>=0.24->agenta) (0.14.0)\n", + 
"Requirement already satisfied: zipp>=0.5 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from importlib-metadata<8.0,>=6.7->agenta) (3.6.0)\n", + "Requirement already satisfied: tomli in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from ipdb>=0.13->agenta) (2.0.1)\n", + "Requirement already satisfied: ipython>=7.31.1 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from ipdb>=0.13->agenta) (8.13.2)\n", + "Requirement already satisfied: decorator in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from ipdb>=0.13->agenta) (5.1.0)\n", + "Requirement already satisfied: stack-data in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from ipython>=7.31.1->ipdb>=0.13->agenta) (0.6.2)\n", + "Requirement already satisfied: pexpect>4.3 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from ipython>=7.31.1->ipdb>=0.13->agenta) (4.8.0)\n", + "Requirement already satisfied: prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from ipython>=7.31.1->ipdb>=0.13->agenta) (3.0.38)\n", + "Requirement already satisfied: jedi>=0.16 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from ipython>=7.31.1->ipdb>=0.13->agenta) (0.18.0)\n", + "Requirement already satisfied: pickleshare in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from ipython>=7.31.1->ipdb>=0.13->agenta) (0.7.5)\n", + "Requirement already satisfied: matplotlib-inline in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from ipython>=7.31.1->ipdb>=0.13->agenta) (0.1.2)\n", + "Requirement already satisfied: backcall in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from ipython>=7.31.1->ipdb>=0.13->agenta) (0.2.0)\n", + "Requirement already satisfied: traitlets>=5 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from ipython>=7.31.1->ipdb>=0.13->agenta) (5.1.0)\n", + 
"Requirement already satisfied: appnope in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from ipython>=7.31.1->ipdb>=0.13->agenta) (0.1.2)\n", + "Requirement already satisfied: pygments>=2.4.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from ipython>=7.31.1->ipdb>=0.13->agenta) (2.10.0)\n", + "Requirement already satisfied: parso<0.9.0,>=0.8.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from jedi>=0.16->ipython>=7.31.1->ipdb>=0.13->agenta) (0.8.2)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from jinja2>=2.11.2->fastapi>=0.100.0->agenta) (2.1.1)\n", + "Requirement already satisfied: ptyprocess>=0.5 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from pexpect>4.3->ipython>=7.31.1->ipdb>=0.13->agenta) (0.7.0)\n", + "Requirement already satisfied: backoff>=1.10.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from posthog<4.0.0,>=3.1.0->agenta) (1.10.0)\n", + "Requirement already satisfied: python-dateutil>2.1 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from posthog<4.0.0,>=3.1.0->agenta) (2.8.2)\n", + "Requirement already satisfied: monotonic>=1.5 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from posthog<4.0.0,>=3.1.0->agenta) (1.6)\n", + "Requirement already satisfied: six>=1.5 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from posthog<4.0.0,>=3.1.0->agenta) (1.16.0)\n", + "Requirement already satisfied: wcwidth in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30->ipython>=7.31.1->ipdb>=0.13->agenta) (0.2.5)\n", + "Requirement already satisfied: annotated-types>=0.4.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from pydantic>=2->agenta) (0.5.0)\n", + "Requirement already satisfied: pydantic-core==2.18.4 in 
/Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from pydantic>=2->agenta) (2.18.4)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from requests>=2.26.0->docker<8.0.0,>=6.1.1->agenta) (2.0.4)\n", + "Requirement already satisfied: rich>=10.11.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from typer>=0.12.3->fastapi-cli>=0.0.2->fastapi>=0.100.0->agenta) (12.6.0)\n", + "Requirement already satisfied: shellingham>=1.3.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from typer>=0.12.3->fastapi-cli>=0.0.2->fastapi>=0.100.0->agenta) (1.5.4)\n", + "Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from rich>=10.11.0->typer>=0.12.3->fastapi-cli>=0.0.2->fastapi>=0.100.0->agenta) (0.9.1)\n", + "Requirement already satisfied: httptools>=0.5.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from uvicorn[standard]>=0.12.0->fastapi>=0.100.0->agenta) (0.6.1)\n", + "Requirement already satisfied: watchfiles>=0.13 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from uvicorn[standard]>=0.12.0->fastapi>=0.100.0->agenta) (0.22.0)\n", + "Requirement already satisfied: websockets>=10.4 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from uvicorn[standard]>=0.12.0->fastapi>=0.100.0->agenta) (10.4)\n", + "Requirement already satisfied: uvloop!=0.15.0,!=0.15.1,>=0.14.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from uvicorn[standard]>=0.12.0->fastapi>=0.100.0->agenta) (0.19.0)\n", + "Requirement already satisfied: pyyaml>=5.1 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from uvicorn[standard]>=0.12.0->fastapi>=0.100.0->agenta) (6.0.1)\n", + "Requirement already satisfied: asttokens>=2.1.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from 
stack-data->ipython>=7.31.1->ipdb>=0.13->agenta) (2.2.1)\n", + "Requirement already satisfied: pure-eval in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from stack-data->ipython>=7.31.1->ipdb>=0.13->agenta) (0.2.2)\n", + "Requirement already satisfied: executing>=1.2.0 in /Users/mahmoudmabrouk/opt/anaconda3/lib/python3.9/site-packages (from stack-data->ipython>=7.31.1->ipdb>=0.13->agenta) (1.2.0)\n" + ] + } + ], + "source": [ + "! pip install -U agenta" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configuration Setup\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[App(app_id='666dde95962bbaffdb0072b5', app_name='product-classification'),\n", + " App(app_id='666fde62962bbaffdb0073d9', app_name='product-title-generation'),\n", + " App(app_id='66704efa962bbaffdb007574', app_name='project-qa'),\n", + " App(app_id='6670570b962bbaffdb0075a7', app_name='project-qa-prompt-rewriting'),\n", + " App(app_id='667d8cfad1812781f7e375d9', app_name='find_capital')]" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Assuming an application has already been created through the user interface, you will need to obtain the application ID.\n", + "# In this example we will use the default template single_prompt which has the prompt \"Determine the capital of {country}\"\n", + "\n", + "# You can find the application ID in the URL. 
For example, in the URL https://cloud.agenta.ai/apps/666dde95962bbaffdb0072b5/playground?variant=app.default, the application ID is `666dde95962bbaffdb0072b5`.\n", + "from agenta.client.backend.client import AgentaApi\n", + "# Let's list the applications (run the next cell first; it creates `client`)\n", + "client.apps.list_apps()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "app_id = \"667d8cfad1812781f7e375d9\"\n", + "\n", + "# You can create the API key under the settings page. If you are using the OSS version, you should keep this as an empty string\n", + "api_key = \"EUqJGOUu.xxxx\"\n", + "\n", + "# Host. \n", + "host = \"https://cloud.agenta.ai\"\n", + "\n", + "# Initialize the client\n", + "\n", + "client = AgentaApi(base_url=host + \"/api\", api_key=api_key)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a test set" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'status': 'success',\n", + " 'message': 'testset updated successfully',\n", + " '_id': '667d8ecfd1812781f7e375eb'}" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from agenta.client.backend.types.new_testset import NewTestset\n", + "\n", + "csvdata = [\n", + " {\"country\": \"france\", \"capital\": \"Paris\"},\n", + " {\"country\": \"Germany\", \"capital\": \"paris\"}\n", + " ]\n", + "\n", + "response = client.testsets.create_testset(app_id=app_id, request=NewTestset(name=\"test set\", csvdata=csvdata))\n", + "test_set_id = response.id\n", + "\n", + "# let's now update it\n", + "\n", + "csvdata = [\n", + " {\"country\": \"france\", \"capital\": \"Paris\"},\n", + " {\"country\": \"Germany\", \"capital\": \"Berlin\"}\n", + " ]\n", + "\n", + "client.testsets.update_testset(testset_id=test_set_id, request=NewTestset(name=\"test set\", csvdata=csvdata))" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "# Create evaluators" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "# Create an evaluator that performs an exact match comparison on the 'capital' column\n", + "# You can find the list of evaluator keys and evaluators and their configurations in https://github.com/Agenta-AI/agenta/blob/main/agenta-backend/agenta_backend/resources/evaluators/evaluators.py\n", + "response = client.evaluators.create_new_evaluator_config(app_id=app_id, name=\"capital_evaluator\", evaluator_key=\"auto_exact_match\", settings_values={\"correct_answer_key\": \"capital\"})\n", + "exact_match_eval_id = response.id\n", + "\n", + "code_snippet = \"\"\"\n", + "from typing import Dict\n", + "\n", + "def evaluate(\n", + " app_params: Dict[str, str],\n", + " inputs: Dict[str, str],\n", + " output: str, # output of the llm app\n", + " datapoint: Dict[str, str] # contains the testset row \n", + ") -> float:\n", + " if output and output[0].isupper():\n", + " return 1.0\n", + " else:\n", + " return 0.0\n", + "\"\"\"\n", + "\n", + "response = client.evaluators.create_new_evaluator_config(app_id=app_id, name=\"capital_letter_evaluator\", evaluator_key=\"auto_custom_code_run\", settings_values={\"code\": code_snippet})\n", + "letter_match_eval_id = response.id" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[EvaluatorConfig(id='667d8cfbd1812781f7e375e2', name='Exact Match', evaluator_key='auto_exact_match', settings_values={'correct_answer_key': 'correct_answer'}, created_at=datetime.datetime(2024, 6, 26, 12, 22, 31, 775000), updated_at=datetime.datetime(2024, 6, 26, 12, 22, 31, 775000)),\n", + " EvaluatorConfig(id='667d8cfbd1812781f7e375e3', name='Contains Json', evaluator_key='auto_contains_json', settings_values={}, created_at=datetime.datetime(2024, 6, 26, 12, 22, 31, 775000), 
updated_at=datetime.datetime(2024, 6, 26, 12, 22, 31, 775000)),\n", + " EvaluatorConfig(id='667d8ed6d1812781f7e375ec', name='capital_evaluator', evaluator_key='auto_exact_match', settings_values={'correct_answer_key': 'capital'}, created_at=datetime.datetime(2024, 6, 26, 12, 22, 31, 775000), updated_at=datetime.datetime(2024, 6, 26, 12, 22, 31, 775000)),\n", + " EvaluatorConfig(id='667d8ed6d1812781f7e375ed', name='capital_letter_evaluator', evaluator_key='auto_custom_code_run', settings_values={'code': '\\nfrom typing import Dict\\n\\ndef evaluate(\\n app_params: Dict[str, str],\\n inputs: Dict[str, str],\\n output: str, # output of the llm app\\n datapoint: Dict[str, str] # contains the testset row \\n) -> float:\\n if output and output[0].isupper():\\n return 1.0\\n else:\\n return 0.0\\n'}, created_at=datetime.datetime(2024, 6, 26, 12, 22, 31, 775000), updated_at=datetime.datetime(2024, 6, 26, 12, 22, 31, 775000))]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# get list of all evaluators\n", + "client.evaluators.get_evaluator_configs(app_id=app_id)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Run an evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[AppVariantResponse(app_id='667d8cfad1812781f7e375d9', app_name='find_capital', variant_id='667d8cfbd1812781f7e375df', variant_name='app.default', parameters={'temperature': 1.0, 'model': 'gpt-3.5-turbo', 'max_tokens': -1, 'prompt_system': 'You are an expert in geography.', 'prompt_user': 'What is the capital of {country}?', 'top_p': 1.0, 'frequence_penalty': 0.0, 'presence_penalty': 0.0, 'force_json': 0}, previous_variant_name=None, user_id='666dde45962bbaffdb0072b2', base_name='app', base_id='667d8cfbd1812781f7e375de', config_name='default', 
uri='https://vmripsmtbzlysdbptjl4hzrbga0ckadr.lambda-url.eu-central-1.on.aws', revision=1, organization_id='666dde45962bbaffdb0072b3', workspace_id='666dde45962bbaffdb0072b4')]\n" + ] + } + ], + "source": [ + "response = client.apps.list_app_variants(app_id=app_id)\n", + "print(response)\n", + "myvariant_id = response[0].variant_id" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Evaluation(id='667d98fbd1812781f7e3761a', app_id='667d8cfad1812781f7e375d9', user_id='666dde45962bbaffdb0072b2', user_username='mahmoud+demo', variant_ids=['667d8cfbd1812781f7e375df'], variant_names=['app.default'], variant_revision_ids=['667d8d0dd1812781f7e375e7'], revisions=['1'], testset_id='667d8ecfd1812781f7e375eb', testset_name='test set', status=Result(type='status', value='EVALUATION_STARTED', error=None), aggregated_results=[], average_cost=None, total_cost=None, average_latency=None, created_at=datetime.datetime(2024, 6, 27, 16, 53, 15, 281313, tzinfo=datetime.timezone.utc), updated_at=datetime.datetime(2024, 6, 27, 16, 53, 15, 281328, tzinfo=datetime.timezone.utc))]\n" + ] + } + ], + "source": [ + "# Run an evaluation\n", + "from agenta.client.backend.types.llm_run_rate_limit import LlmRunRateLimit\n", + "response = client.evaluations.create_evaluation(app_id=app_id, variant_ids=[myvariant_id], testset_id=test_set_id, evaluators_configs=[exact_match_eval_id, letter_match_eval_id],\n", + " rate_limit=LlmRunRateLimit(\n", + " batch_size=10, # number of rows to call in parallel\n", + " max_retries=3, # max number of time to retry a failed llm call\n", + " retry_delay=2, # delay before retrying a failed llm call\n", + " delay_between_batches=5, # delay between batches\n", + " ),)\n", + "print(response)" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'status': {'type': 'status', 'value': 
'EVALUATION_FINISHED', 'error': None}}" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# check the status\n", + "client.evaluations.fetch_evaluation_status('667d98fbd1812781f7e3761a')" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[('capital_evaluator', {'type': 'number', 'value': 0.0, 'error': None}), ('capital_letter_evaluator', {'type': 'number', 'value': 1.0, 'error': None})]\n" + ] + } + ], + "source": [ + "# fetch the overall results\n", + "response = client.evaluations.fetch_evaluation_results('667d98fbd1812781f7e3761a')\n", + "\n", + "results = [(evaluator[\"evaluator_config\"][\"name\"], evaluator[\"result\"]) for evaluator in response[\"results\"]]\n", + "# End of Selection" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'inputs': [{'input_name': 'country', 'input_value': 'france'},\n", + " {'input_name': 'capital', 'input_value': 'Paris'},\n", + " {'input_name': 'country', 'input_value': 'Germany'},\n", + " {'input_name': 'capital', 'input_value': 'Berlin'}],\n", + " 'data': [{'input_name': 'country',\n", + " 'input_value': 'france',\n", + " 'scenarios': [{'id': '667d994d72010c439240463a',\n", + " 'evaluation_id': '667d98fbd1812781f7e3761a',\n", + " 'inputs': [{'name': 'country', 'type': 'text', 'value': 'france'}],\n", + " 'outputs': [{'result': {'type': 'text',\n", + " 'value': 'The capital of France is Paris.',\n", + " 'error': None},\n", + " 'cost': 5.1500000000000005e-05,\n", + " 'latency': 1.1813}],\n", + " 'evaluation': None,\n", + " 'correct_answers': [{'key': 'capital', 'value': 'Paris'},\n", + " {'key': '', 'value': ''}],\n", + " 'is_pinned': False,\n", + " 'note': '',\n", + " 'results': [{'evaluator_config': '667d8ed6d1812781f7e375ec',\n", + " 'result': {'type': 'bool', 'value': 
False, 'error': None}},\n", + " {'evaluator_config': '667d8ed6d1812781f7e375ed',\n", + " 'result': {'type': 'number', 'value': 1.0, 'error': None}}]}]},\n", + " {'input_name': 'capital', 'input_value': 'Paris', 'scenarios': []},\n", + " {'input_name': 'country',\n", + " 'input_value': 'Germany',\n", + " 'scenarios': [{'id': '667d994d72010c439240463b',\n", + " 'evaluation_id': '667d98fbd1812781f7e3761a',\n", + " 'inputs': [{'name': 'country', 'type': 'text', 'value': 'Germany'}],\n", + " 'outputs': [{'result': {'type': 'text',\n", + " 'value': 'The capital of Germany is Berlin.',\n", + " 'error': None},\n", + " 'cost': 5.1500000000000005e-05,\n", + " 'latency': 0.9169}],\n", + " 'evaluation': None,\n", + " 'correct_answers': [{'key': 'capital', 'value': 'Berlin'},\n", + " {'key': '', 'value': ''}],\n", + " 'is_pinned': False,\n", + " 'note': '',\n", + " 'results': [{'evaluator_config': '667d8ed6d1812781f7e375ec',\n", + " 'result': {'type': 'bool', 'value': False, 'error': None}},\n", + " {'evaluator_config': '667d8ed6d1812781f7e375ed',\n", + " 'result': {'type': 'number', 'value': 1.0, 'error': None}}]}]},\n", + " {'input_name': 'capital', 'input_value': 'Berlin', 'scenarios': []}]}" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# fetch the detailed results\n", + "client.evaluations.fetch_evaluation_scenarios(evaluations_ids='667d98fbd1812781f7e3761a')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "base", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git 
a/docker-assets/postgres/init-db.sql b/docker-assets/postgres/init-db.sql new file mode 100644 index 0000000000..9bd429417f --- /dev/null +++ b/docker-assets/postgres/init-db.sql @@ -0,0 +1 @@ +CREATE DATABASE agenta_oss; diff --git a/docker-compose.gh.yml b/docker-compose.gh.yml index 42cfa7433a..8682ca2fea 100644 --- a/docker-compose.gh.yml +++ b/docker-compose.gh.yml @@ -15,7 +15,7 @@ services: container_name: agenta-backend-1 image: ghcr.io/agenta-ai/agenta-backend environment: - - MONGODB_URI=mongodb://username:password@mongo:27017 + - POSTGRES_URI=postgresql+asyncpg://username:password@postgres:5432/agenta_oss - REDIS_URL=redis://redis:6379/0 - ENVIRONMENT=production - DATABASE_MODE=v2 @@ -41,7 +41,7 @@ services: "--log-level", "info", "--root-path", - "/api" + "/api", ] volumes: - /var/run/docker.sock:/var/run/docker.sock @@ -58,7 +58,7 @@ services: extra_hosts: - "host.docker.internal:host-gateway" depends_on: - mongo: + postgres: condition: service_healthy restart: always @@ -80,39 +80,6 @@ services: - NEXT_PUBLIC_TELEMETRY_TRACKING_ENABLED=true restart: always - mongo: - image: mongo:5.0 - environment: - MONGO_INITDB_ROOT_USERNAME: username - MONGO_INITDB_ROOT_PASSWORD: password - volumes: - - mongodb_data:/data/db - ports: - - "27017:27017" - networks: - - agenta-network - healthcheck: - test: [ "CMD", "mongo", "--eval", "db.adminCommand('ping')" ] - interval: 10s - timeout: 10s - retries: 20 - restart: always - - mongo_express: - image: mongo-express - environment: - ME_CONFIG_MONGODB_ADMINUSERNAME: username - ME_CONFIG_MONGODB_ADMINPASSWORD: password - ME_CONFIG_MONGODB_SERVER: mongo - ports: - - "8081:8081" - networks: - - agenta-network - depends_on: - mongo: - condition: service_healthy - restart: always - redis: image: redis:latest networks: @@ -140,16 +107,15 @@ services: command: > celery -A agenta_backend.main.celery_app worker --concurrency=1 --loglevel=INFO environment: - - MONGODB_URI=mongodb://username:password@mongo:27017 + - 
POSTGRES_URI=postgresql+asyncpg://username:password@postgres:5432/agenta_oss - REDIS_URL=redis://redis:6379/0 - CELERY_BROKER_URL=amqp://guest@rabbitmq// - CELERY_RESULT_BACKEND=redis://redis:6379/0 - FEATURE_FLAG=oss volumes: - - ./agenta-backend/agenta_backend:/app/agenta_backend - /var/run/docker.sock:/var/run/docker.sock depends_on: - - mongo + - postgres - rabbitmq - redis extra_hosts: @@ -157,10 +123,53 @@ services: networks: - agenta-network + postgres: + image: postgres:16.2 + container_name: postgres + restart: always + environment: + POSTGRES_USER: username + POSTGRES_PASSWORD: password + POSTGRES_DB: agenta_oss + ports: + - "5432:5432" + networks: + - agenta-network + volumes: + - postgresdb-data:/var/lib/postgresql/data/ + - ./docker-assets/postgres/init-db.sql:/docker-entrypoint-initdb.d/init-db.sql + healthcheck: + test: ["CMD-SHELL", "pg_isready -U postgres"] + interval: 10s + timeout: 5s + retries: 5 + + pgadmin: + image: dpage/pgadmin4 + restart: always + environment: + PGADMIN_DEFAULT_EMAIL: "admin@example.com" + PGADMIN_DEFAULT_PASSWORD: "password" + PGADMIN_SERVER_HOST: "postgres" + PGADMIN_SERVER_PORT: 5432 + PGADMIN_SERVER_USER: "username" + PGADMIN_SERVER_PASSWORD: "password" + PGADMIN_SERVER_DB: agenta_oss + ports: + - "5050:80" + networks: + - agenta-network + volumes: + - pgadmin-data:/var/lib/pgadmin + depends_on: + postgres: + condition: service_healthy + networks: agenta-network: name: agenta-network volumes: - mongodb_data: redis_data: + postgresdb-data: + pgadmin-data: diff --git a/docker-compose.prod.yml b/docker-compose.prod.yml index 3e7520335c..fd3997f5a7 100644 --- a/docker-compose.prod.yml +++ b/docker-compose.prod.yml @@ -1,164 +1,171 @@ services: - reverse-proxy: - image: traefik:v2.10 - command: --api.insecure=true --providers.docker --entrypoints.web.address=:80 - ports: - - "80:80" - - "8080:8080" - volumes: - - /var/run/docker.sock:/var/run/docker.sock - networks: - - agenta-network - restart: always + reverse-proxy: + 
image: traefik:v2.10 + command: --api.insecure=true --providers.docker --entrypoints.web.address=:80 + ports: + - "80:80" + - "8080:8080" + volumes: + - /var/run/docker.sock:/var/run/docker.sock + networks: + - agenta-network + restart: always - backend: - build: ./agenta-backend - environment: - - MONGODB_URI=mongodb://username:password@mongo:27017 - - REDIS_URL=redis://redis:6379/0 - - ALLOW_ORIGINS=${ALLOW_ORIGINS} - - ENVIRONMENT=production - - FEATURE_FLAG=oss - - AGENTA_TEMPLATE_REPO=agentaai/templates_v2 - - CELERY_BROKER_URL=amqp://guest@rabbitmq// - - CELERY_RESULT_BACKEND=redis://redis:6379/0 - - TEMPLATES_BASE_URL=https://llm-app-json.s3.eu-central-1.amazonaws.com - - REGISTRY_REPO_NAME=agentaai - - DOCKER_HUB_URL=https://hub.docker.com/v2/repositories - volumes: - - ./agenta-backend/agenta_backend:/app/agenta_backend - - ./agenta-backend/tests:/app/tests - - /var/run/docker.sock:/var/run/docker.sock - - ./agenta-backend/db:/db - labels: - - "traefik.http.routers.backend.rule=Host(`${BARE_DOMAIN_NAME}`) && PathPrefix(`/api/`)" - - "traefik.http.routers.backend.entrypoints=web" - - "traefik.http.middlewares.backend-strip.stripprefix.prefixes=/api" - - "traefik.http.middlewares.backend-strip.stripprefix.forceslash=true" - - "traefik.http.routers.backend.middlewares=backend-strip" - - "traefik.http.services.backend.loadbalancer.server.port=8000" - - "traefik.http.routers.backend.service=backend" - networks: - - agenta-network - extra_hosts: - - "host.docker.internal:host-gateway" - command: - [ - "uvicorn", - "agenta_backend.main:app", - "--host", - "0.0.0.0", - "--port", - "8000", - "--reload", - "--root-path", - "/api" - ] - env_file: - - .env - depends_on: - mongo: - condition: service_healthy - restart: always + backend: + build: ./agenta-backend + environment: + - REDIS_URL=redis://redis:6379/0 + - ALLOW_ORIGINS=${ALLOW_ORIGINS} + - ENVIRONMENT=production + - FEATURE_FLAG=oss + - AGENTA_TEMPLATE_REPO=agentaai/templates_v2 + - 
CELERY_BROKER_URL=amqp://guest@rabbitmq// + - CELERY_RESULT_BACKEND=redis://redis:6379/0 + - TEMPLATES_BASE_URL=https://llm-app-json.s3.eu-central-1.amazonaws.com + - REGISTRY_REPO_NAME=agentaai + - DOCKER_HUB_URL=https://hub.docker.com/v2/repositories + volumes: + - ./agenta-backend/agenta_backend:/app/agenta_backend + - ./agenta-backend/tests:/app/tests + - /var/run/docker.sock:/var/run/docker.sock + - ./agenta-backend/db:/db + labels: + - "traefik.http.routers.backend.rule=Host(`${BARE_DOMAIN_NAME}`) && PathPrefix(`/api/`)" + - "traefik.http.routers.backend.entrypoints=web" + - "traefik.http.middlewares.backend-strip.stripprefix.prefixes=/api" + - "traefik.http.middlewares.backend-strip.stripprefix.forceslash=true" + - "traefik.http.routers.backend.middlewares=backend-strip" + - "traefik.http.services.backend.loadbalancer.server.port=8000" + - "traefik.http.routers.backend.service=backend" + networks: + - agenta-network + extra_hosts: + - "host.docker.internal:host-gateway" + command: + [ + "uvicorn", + "agenta_backend.main:app", + "--host", + "0.0.0.0", + "--port", + "8000", + "--reload", + "--root-path", + "/api", + ] + env_file: + - .env + depends_on: + postgres: + condition: service_healthy + restart: always - agenta-web: - build: - context: ./agenta-web - dockerfile: prod.Dockerfile - volumes: - - ./agenta-web/src:/app/src - - ./agenta-web/public:/app/public - ports: - - "3000:3000" - networks: - - agenta-network - labels: - - "traefik.http.routers.agenta-web.rule=Host(`${BARE_DOMAIN_NAME}`) && PathPrefix(`/`)" - - "traefik.http.routers.agenta-web.entrypoints=web" - - "traefik.http.services.agenta-web.loadbalancer.server.port=3000" - environment: - - NEXT_PUBLIC_POSTHOG_API_KEY=phc_hmVSxIjTW1REBHXgj2aw4HW9X6CXb6FzerBgP9XenC7 - restart: always + agenta-web: + build: + context: ./agenta-web + dockerfile: prod.Dockerfile + volumes: + - ./agenta-web/src:/app/src + - ./agenta-web/public:/app/public + ports: + - "3000:3000" + networks: + - agenta-network + 
labels: + - "traefik.http.routers.agenta-web.rule=Host(`${BARE_DOMAIN_NAME}`) && PathPrefix(`/`)" + - "traefik.http.routers.agenta-web.entrypoints=web" + - "traefik.http.services.agenta-web.loadbalancer.server.port=3000" + environment: + - NEXT_PUBLIC_POSTHOG_API_KEY=phc_hmVSxIjTW1REBHXgj2aw4HW9X6CXb6FzerBgP9XenC7 + restart: always - mongo: - image: mongo:5.0 - environment: - MONGO_INITDB_ROOT_USERNAME: username - MONGO_INITDB_ROOT_PASSWORD: password - volumes: - - mongodb_data:/data/db - ports: - - "27017:27017" - networks: - - agenta-network - healthcheck: - test: [ "CMD", "mongo", "--eval", "db.adminCommand('ping')" ] - interval: 10s - timeout: 10s - retries: 20 - restart: always + redis: + image: redis:latest + networks: + - agenta-network + volumes: + - redis_data:/data + restart: always - mongo_express: - image: mongo-express - environment: - ME_CONFIG_MONGODB_ADMINUSERNAME: username - ME_CONFIG_MONGODB_ADMINPASSWORD: password - ME_CONFIG_MONGODB_SERVER: mongo - ports: - - "8081:8081" - networks: - - agenta-network - depends_on: - mongo: - condition: service_healthy - restart: always + rabbitmq: + image: rabbitmq:3-management + ports: + - "5672:5672" + - "15672:15672" + volumes: + - ./rabbitmq_data:/var/lib/rabbitmq + environment: + RABBITMQ_DEFAULT_USER: "guest" + RABBITMQ_DEFAULT_PASS: "guest" + networks: + - agenta-network - redis: - image: redis:latest - networks: - - agenta-network - volumes: - - redis_data:/data - restart: always + celery_worker: + build: ./agenta-backend + command: > + watchmedo auto-restart --directory=./agenta_backend --pattern=*.py --recursive -- celery -A agenta_backend.main.celery_app worker --concurrency=1 --loglevel=INFO + environment: + - REDIS_URL=redis://redis:6379/0 + - CELERY_BROKER_URL=amqp://guest@rabbitmq// + - CELERY_RESULT_BACKEND=redis://redis:6379/0 + - FEATURE_FLAG=oss + volumes: + - ./agenta-backend/agenta_backend:/app/agenta_backend + - /var/run/docker.sock:/var/run/docker.sock + depends_on: + - rabbitmq + - redis 
+ extra_hosts: + - "host.docker.internal:host-gateway" + networks: + - agenta-network - rabbitmq: - image: rabbitmq:3-management - ports: - - "5672:5672" - - "15672:15672" - volumes: - - ./rabbitmq_data:/var/lib/rabbitmq - environment: - RABBITMQ_DEFAULT_USER: "guest" - RABBITMQ_DEFAULT_PASS: "guest" - networks: - - agenta-network + postgres: + image: postgres:16.2 + container_name: postgres + restart: always + environment: + POSTGRES_USER: username + POSTGRES_PASSWORD: password + POSTGRES_DB: agenta_oss + ports: + - "5432:5432" + networks: + - agenta-network + volumes: + - postgresdb-data:/var/lib/postgresql/data/ + - ./docker-assets/postgres/init-db.sql:/docker-entrypoint-initdb.d/init-db.sql + healthcheck: + test: ["CMD-SHELL", "pg_isready -U postgres"] + interval: 10s + timeout: 5s + retries: 5 + + pgadmin: + image: dpage/pgadmin4 + restart: always + environment: + PGADMIN_DEFAULT_EMAIL: "admin@example.com" + PGADMIN_DEFAULT_PASSWORD: "password" + PGADMIN_SERVER_HOST: "postgres" + PGADMIN_SERVER_PORT: 5432 + PGADMIN_SERVER_USER: "username" + PGADMIN_SERVER_PASSWORD: "password" + PGADMIN_SERVER_DB: agenta_oss + ports: + - "5050:80" + networks: + - agenta-network + volumes: + - pgadmin-data:/var/lib/pgadmin + depends_on: + postgres: + condition: service_healthy - celery_worker: - build: ./agenta-backend - command: > - watchmedo auto-restart --directory=./agenta_backend --pattern=*.py --recursive -- celery -A agenta_backend.main.celery_app worker --concurrency=1 --loglevel=INFO - environment: - - MONGODB_URI=mongodb://username:password@mongo:27017 - - REDIS_URL=redis://redis:6379/0 - - CELERY_BROKER_URL=amqp://guest@rabbitmq// - - CELERY_RESULT_BACKEND=redis://redis:6379/0 - - FEATURE_FLAG=oss - volumes: - - ./agenta-backend/agenta_backend:/app/agenta_backend - - /var/run/docker.sock:/var/run/docker.sock - depends_on: - - mongo - - rabbitmq - - redis - extra_hosts: - - "host.docker.internal:host-gateway" - networks: - - agenta-network networks: - agenta-network: - 
name: agenta-network + agenta-network: + name: agenta-network volumes: - mongodb_data: - redis_data: + postgresdb-data: + redis_data: diff --git a/docker-compose.test.yml b/docker-compose.test.yml index 07cf5ce4e6..552d6103ef 100644 --- a/docker-compose.test.yml +++ b/docker-compose.test.yml @@ -1,7 +1,7 @@ -version: '3.8' services: reverse-proxy: image: traefik:v2.10 + container_name: agenta-reverse_proxy-test command: --api.dashboard=true --api.insecure=true --providers.docker --entrypoints.web.address=:80 ports: - "80:80" @@ -15,14 +15,13 @@ services: build: ./agenta-backend container_name: agenta-backend-test environment: - - MONGODB_URI=mongodb://username:password@mongo:27017/ + - POSTGRES_URI=postgresql+asyncpg://username:password@postgres:5432 - REDIS_URL=redis://redis:6379/0 - ENVIRONMENT=${ENVIRONMENT} - BARE_DOMAIN_NAME=localhost - DOMAIN_NAME=http://localhost - CELERY_BROKER_URL=amqp://guest@rabbitmq// - CELERY_RESULT_BACKEND=redis://redis:6379/0 - - DATABASE_MODE=test - FEATURE_FLAG=oss - OPENAI_API_KEY=${OPENAI_API_KEY} - AGENTA_TEMPLATE_REPO=agentaai/templates_v2 @@ -46,7 +45,7 @@ services: "--log-level", "info", "--root-path", - "/api" + "/api", ] labels: - "traefik.http.routers.backend.rule=PathPrefix(`/api/`)" @@ -57,7 +56,7 @@ services: - "traefik.http.services.backend.loadbalancer.server.port=8000" - "traefik.http.routers.backend.service=backend" depends_on: - mongo: + postgres: condition: service_healthy extra_hosts: - host.docker.internal:host-gateway @@ -84,37 +83,6 @@ services: - NEXT_PUBLIC_POSTHOG_API_KEY=phc_hmVSxIjTW1REBHXgj2aw4HW9X6CXb6FzerBgP9XenC7 restart: always - mongo_express: - image: mongo-express:0.54.0 - environment: - ME_CONFIG_MONGODB_ADMINUSERNAME: username - ME_CONFIG_MONGODB_ADMINPASSWORD: password - ME_CONFIG_MONGODB_SERVER: mongo - ports: - - "8081:8081" - networks: - - agenta-network - depends_on: - mongo: - condition: service_healthy - restart: always - - mongo: - image: mongo:5.0 - container_name: agenta-mongo-test - 
environment: - MONGO_INITDB_ROOT_USERNAME: username - MONGO_INITDB_ROOT_PASSWORD: password - ports: - - "27017:27017" - healthcheck: - test: [ "CMD", "mongo", "--eval", "db.adminCommand('ping')" ] - interval: 10s - timeout: 10s - retries: 20 - networks: - - agenta-network - redis: image: redis:latest container_name: agenta-redis-test @@ -143,29 +111,47 @@ services: command: > watchmedo auto-restart --directory=./agenta_backend --pattern=*.py --recursive -- celery -A agenta_backend.main.celery_app worker --concurrency=1 --loglevel=INFO environment: - - MONGODB_URI=mongodb://username:password@mongo:27017 + - POSTGRES_URI=postgresql+asyncpg://username:password@postgres:5432 - REDIS_URL=redis://redis:6379/0 - ENVIRONMENT=${ENVIRONMENT} - CELERY_BROKER_URL=amqp://guest@rabbitmq// - CELERY_RESULT_BACKEND=redis://redis:6379/0 - FEATURE_FLAG=oss - - DATABASE_MODE=test volumes: - ./agenta-backend/agenta_backend:/app/agenta_backend - /var/run/docker.sock:/var/run/docker.sock depends_on: - rabbitmq - redis - - mongo + - postgres extra_hosts: - host.docker.internal:host-gateway networks: - agenta-network + postgres: + image: postgres:16.2 + container_name: agenta-postgresdb-test + restart: always + environment: + POSTGRES_USER: username + POSTGRES_PASSWORD: password + ports: + - "5432:5432" + networks: + - agenta-network + volumes: + - postgresdb-data:/var/lib/postgresql/data/ + healthcheck: + test: ["CMD-SHELL", "pg_isready -U postgres"] + interval: 10s + timeout: 5s + retries: 5 + networks: agenta-network: name: agenta-network volumes: - mongodb_data: + postgresdb-data: redis_data: diff --git a/docker-compose.yml b/docker-compose.yml index 2832585b27..5569d90cd0 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -14,7 +14,7 @@ services: backend: build: ./agenta-backend environment: - - MONGODB_URI=mongodb://username:password@mongo:27017 + - POSTGRES_URI=postgresql+asyncpg://username:password@postgres:5432/agenta_oss - REDIS_URL=redis://redis:6379/0 - 
ENVIRONMENT=development - DATABASE_MODE=v2 @@ -60,7 +60,7 @@ services: "/api", ] depends_on: - mongo: + postgres: condition: service_healthy restart: always @@ -84,39 +84,6 @@ services: - NEXT_PUBLIC_POSTHOG_API_KEY=phc_hmVSxIjTW1REBHXgj2aw4HW9X6CXb6FzerBgP9XenC7 restart: always - mongo: - image: mongo:5.0 - environment: - MONGO_INITDB_ROOT_USERNAME: username - MONGO_INITDB_ROOT_PASSWORD: password - volumes: - - mongodb_data:/data/db - ports: - - "27017:27017" - networks: - - agenta-network - healthcheck: - test: ["CMD", "mongo", "--eval", "db.adminCommand('ping')"] - interval: 10s - timeout: 10s - retries: 20 - restart: always - - mongo_express: - image: mongo-express:0.54.0 - environment: - ME_CONFIG_MONGODB_ADMINUSERNAME: username - ME_CONFIG_MONGODB_ADMINPASSWORD: password - ME_CONFIG_MONGODB_SERVER: mongo - ports: - - "8081:8081" - networks: - - agenta-network - depends_on: - mongo: - condition: service_healthy - restart: always - redis: image: redis:latest networks: @@ -143,7 +110,7 @@ services: command: > watchmedo auto-restart --directory=./agenta_backend --pattern=*.py --recursive -- celery -A agenta_backend.main.celery_app worker --concurrency=1 --loglevel=INFO environment: - - MONGODB_URI=mongodb://username:password@mongo:27017 + - POSTGRES_URI=postgresql+asyncpg://username:password@postgres:5432/agenta_oss - REDIS_URL=redis://redis:6379/0 - CELERY_BROKER_URL=amqp://guest@rabbitmq// - CELERY_RESULT_BACKEND=redis://redis:6379/0 @@ -152,7 +119,7 @@ services: - ./agenta-backend/agenta_backend:/app/agenta_backend - /var/run/docker.sock:/var/run/docker.sock depends_on: - - mongo + - postgres - rabbitmq - redis extra_hosts: @@ -160,11 +127,54 @@ services: networks: - agenta-network + postgres: + image: postgres:16.2 + container_name: postgres + restart: always + environment: + POSTGRES_USER: username + POSTGRES_PASSWORD: password + POSTGRES_DB: agenta_oss + ports: + - "5432:5432" + networks: + - agenta-network + volumes: + - 
postgresdb-data:/var/lib/postgresql/data/ + - ./docker-assets/postgres/init-db.sql:/docker-entrypoint-initdb.d/init-db.sql + healthcheck: + test: ["CMD-SHELL", "pg_isready -U postgres"] + interval: 10s + timeout: 5s + retries: 5 + + pgadmin: + image: dpage/pgadmin4 + restart: always + environment: + PGADMIN_DEFAULT_EMAIL: "admin@example.com" + PGADMIN_DEFAULT_PASSWORD: "password" + PGADMIN_SERVER_HOST: "postgres" + PGADMIN_SERVER_PORT: 5432 + PGADMIN_SERVER_USER: "username" + PGADMIN_SERVER_PASSWORD: "password" + PGADMIN_SERVER_DB: agenta_oss + ports: + - "5050:80" + networks: + - agenta-network + volumes: + - pgadmin-data:/var/lib/pgadmin + depends_on: + postgres: + condition: service_healthy + networks: agenta-network: name: agenta-network volumes: - mongodb_data: redis_data: - nextjs_cache: \ No newline at end of file + nextjs_cache: + postgresdb-data: + pgadmin-data: diff --git a/docs/guides/evaluation_from_sdk.mdx b/docs/guides/evaluation_from_sdk.mdx new file mode 100644 index 0000000000..3a2e3620bf --- /dev/null +++ b/docs/guides/evaluation_from_sdk.mdx @@ -0,0 +1,107 @@ +--- +title: "Running Evaluations with SDK" +--- + + + This guide is also available as a [Jupyter + Notebook](https://github.com/Agenta-AI/agenta/blob/main/cookbook/evaluations_with_sdk.ipynb). + + +## Introduction + +In this guide, we'll demonstrate how to interact programmatically with evaluations in the Agenta platform using the SDK (or the raw API). This will include: + +- Creating a test set +- Configuring an evaluator +- Running an evaluation +- Retrieving the results of evaluations + +This assumes that you have already created an LLM application and variants in Agenta. + +## Architectural Overview + +Evaluations are executed on the Agenta backend. Specifically, Agenta invokes the LLM application for each row in the test set and processes the output using the designated evaluator. Operations are managed through Celery tasks. 
The interactions with the LLM application are asynchronous, batched, and include retry mechanisms. The batching configuration can be adjusted to avoid exceeding rate limits imposed by the LLM provider. + +## Setup + +### Installation + +Ensure that the Agenta SDK is installed and up-to-date in your development environment: + +```bash +pip install -U agenta +``` + +### Configuration + +After setting up your environment, you need to configure the SDK: + +```python +from agenta.client.backend.client import AgentaApi + +# Set up your application ID and API key +app_id = "your_app_id" +api_key = "your_api_key" +host = "https://cloud.agenta.ai" + +# Initialize the client +client = AgentaApi(base_url=host + "/api", api_key=api_key) +``` + +## Create a Test Set + +You can create and update test sets as follows: + +```python +from agenta.client.backend.types.new_testset import NewTestset + +# Example data for the test set +csvdata = [ + {"country": "France", "capital": "Paris"}, + {"country": "Germany", "capital": "Berlin"} +] + +# Create a new test set +response = client.testsets.create_testset(app_id=app_id, request=NewTestset(name="Test Set", csvdata=csvdata)) +test_set_id = response.id +``` + +## Create Evaluators + +Set up evaluators that will assess the performance based on specific criteria: + +```python +# Create an exact match evaluator +response = client.evaluators.create_new_evaluator_config( + app_id=app_id, name="Capital Evaluator", evaluator_key="auto_exact_match", + settings_values={"correct_answer_key": "capital"} +) +exact_match_eval_id = response.id +``` + +## Run an Evaluation + +Execute an evaluation using the previously defined test set and evaluators: + +```python +from agenta.client.backend.types.llm_run_rate_limit import LlmRunRateLimit + +response = client.evaluations.create_evaluation( + app_id=app_id, variant_ids=["your_variant_id"], testset_id=test_set_id, + evaluators_configs=[exact_match_eval_id], + rate_limit=LlmRunRateLimit(batch_size=10, 
max_retries=3, retry_delay=2, delay_between_batches=5) +) +``` + +## Retrieve Results + +After running the evaluation, fetch the results to see how well the model performed against the test set: + +```python +results = client.evaluations.fetch_evaluation_results("your_evaluation_id") +print(results) +``` + +## Conclusion + +This guide covers the basic steps for using the SDK to manage evaluations within Agenta. diff --git a/docs/mint.json b/docs/mint.json index 1ae772ad3d..45433047a6 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -105,9 +105,10 @@ }, { "group": "Self-host agenta", + "pages": [ - "self-host/host-locally", - "self-host/migration", + "self-host/host-locally", + { "group": "Deploy Remotely", "pages": [ @@ -116,7 +117,14 @@ "self-host/host-on-gcp", "self-host/host-on-kubernetes" ] - } + }, + {"group": "Migration", + "pages":[ + "self-host/migration/migration-to-postgres", + "self-host/migration/migration-to-mongodb" + + ]} + ] }, { @@ -343,7 +351,14 @@ "guides/tutorials/deploy-mistral-model", "guides/extract_job_information" ] + }, + { + "group": "Cookbooks", + "pages": [ + "guides/evaluation_from_sdk" + ] } + ], "api": { "baseUrl": "http://localhost/api" diff --git a/docs/self-host/migration.mdx b/docs/self-host/migration.mdx deleted file mode 100644 index 75d272cfd9..0000000000 --- a/docs/self-host/migration.mdx +++ /dev/null @@ -1,64 +0,0 @@ ---- -title: Migration -description: 'This is a step-by-step guide for upgrading to the latest version of Agenta' ---- - -## Upgrading to the Latest Version - -To upgrade to the latest version of Agenta, execute the following command: - -``` -docker compose -f docker-compose.gh.yml up -d --pull always -``` - -This command instructs Docker to fetch and use the latest version of the Agenta image. - -# Database Migrations - -The steps below outlines the process for performing database migrations using Beanie with the Agenta backend system. - -Beanie is a MongoDB ODM (Object Document Mapper) for Python. 
More information about Beanie can be found [here](https://github.com/roman-right/beanie). - -## Steps for Migration - -### Accessing the Backend Docker Container - -To access the backend Docker container: - -1. **List Docker Containers**: List all running Docker containers with the command: - - ```bash - docker ps - ``` - -2. **Identify the `agenta-backend` Container ID**: Note down the container ID from the output. Example output: - - ``` - CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES - ae0c56933636 agenta-backend "uvicorn agenta_back…" 3 hours ago Up 3 hours 8000/tcp agenta-backend-1 - e35f6c8b7fcb agenta-agenta-web "docker-entrypoint.s…" 3 hours ago Up 3 hours 0.0.0.0:3000->3000/tcp agenta-agenta-web-1 - ``` - -3. **SSH into the Container**: Use the following command, replacing `CONTAINER_ID` with your container's ID: - - ```bash - docker exec -it CONTAINER_ID bash - ``` - -### Performing the Migration - -To perform the database migration: - -1. **Navigate to the Migration Directory**: Change the directory to the migration folder: - - ```sh - cd agenta_backend/migrations/{migration_name} - ``` - Replace `{migration_name}` with the actual migration name, e.g., `v0_10_0_to_v0_11_0`. - -2. **Run Beanie Migration**: Execute the migration command: - - ```sh - beanie migrate --no-use-transaction -uri 'mongodb://username:password@mongo' -db 'agenta_v2' -p . - ``` - Ensure to replace `username`, `password`, and other placeholders with actual values. 
\ No newline at end of file diff --git a/docs/self-host/migration/migration-to-mongodb.mdx b/docs/self-host/migration/migration-to-mongodb.mdx new file mode 100644 index 0000000000..ad3c271611 --- /dev/null +++ b/docs/self-host/migration/migration-to-mongodb.mdx @@ -0,0 +1,75 @@ +--- +title: Migration in MongoDB (Deprecated) +description: "This is a step-by-step guide for upgrading to the latest version of Agenta" +--- + + + This guide is deprecated as it relates to the migration for agenta versions up + to v0.18 which have used MongoDB as a database. Starting with v0.19 we now use + Postgres as a database. If you are using an old version of agenta, use this + guide to migrate all your data to v0.18 then follow the guide for migrating to + postgres [here](/self-host/migration/migration-to-postgres).{" "} + +. + +## Upgrading to the Latest Version + +To upgrade to the latest version of Agenta, execute the following command: + +``` +docker compose -f docker-compose.gh.yml up -d --pull always +``` + +This command instructs Docker to fetch and use the latest version of the Agenta image. + +# Database Migrations + +The steps below outline the process for performing database migrations using Beanie with the Agenta backend system. + +Beanie is a MongoDB ODM (Object Document Mapper) for Python. More information about Beanie can be found [here](https://github.com/roman-right/beanie). + +## Steps for Migration + +### Accessing the Backend Docker Container + +To access the backend Docker container: + +1. **List Docker Containers**: List all running Docker containers with the command: + + ```bash + docker ps + ``` + +2. **Identify the `agenta-backend` Container ID**: Note down the container ID from the output. 
Example output: + + ``` + CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES + ae0c56933636 agenta-backend "uvicorn agenta_back…" 3 hours ago Up 3 hours 8000/tcp agenta-backend-1 + e35f6c8b7fcb agenta-agenta-web "docker-entrypoint.s…" 3 hours ago Up 3 hours 0.0.0.0:3000->3000/tcp agenta-agenta-web-1 + ``` + +3. **SSH into the Container**: Use the following command, replacing `CONTAINER_ID` with your container's ID: + + ```bash + docker exec -it CONTAINER_ID bash + ``` + +### Performing the Migration + +To perform the database migration: + +1. **Navigate to the Migration Directory**: Change the directory to the migration folder: + + ```sh + cd agenta_backend/migrations/{migration_name} + ``` + + Replace `{migration_name}` with the actual migration name, e.g., `v0_10_0_to_v0_11_0`. + +2. **Run Beanie Migration**: Execute the migration command: + + ```sh + beanie migrate --no-use-transaction -uri 'mongodb://username:password@mongo' -db 'agenta_v2' -p . + ``` + + Ensure to replace `username`, `password`, and other placeholders with actual values. diff --git a/docs/self-host/migration/migration-to-postgres.mdx b/docs/self-host/migration/migration-to-postgres.mdx new file mode 100644 index 0000000000..db3eaab3c4 --- /dev/null +++ b/docs/self-host/migration/migration-to-postgres.mdx @@ -0,0 +1,50 @@ +--- +title: Migration to PostgreSQL +description: "Guide for migrating data from MongoDB (`agenta =v0.19`)." +--- + + + As of version `0.19.0`, Agenta is transitioning from MongoDB to PostgreSQL. + Users need to migrate their MongoDB databases to this latest version, as this + will be the only version receiving feature updates and patches. + + +This guide provides step-by-step instructions for migrating your agenta instance from MongoDB to PostgreSQL version. + +### Prepare for Migration + +Before starting the migration, ensure that you have backed up your production data. 
+ +While the migration will not modify any data in your MongoDB instance, we strongly recommend creating a [backup](https://www.mongodb.com/docs/manual/tutorial/backup-and-restore-tools/) of your MongoDB database before running the migration script. This ensures you have a recovery point in case of any issues. + +### Start the Migration + +1. Start the local instance of Agenta using the dedicated migration Docker Compose file, and ensure that both the MongoDB and PostgreSQL instances are active. +``` +cd agenta-backend/agenta_backend/migrations/mongo_to_postgres +docker compose -f docker-compose.migration.yml up +``` +2. Use the following commands to initiate the migration: + +```bash +docker ps +``` + +This command lists your running Docker containers. Copy the backend container ID and open a bash shell inside that container: + +```bash +docker exec -it {backend-container-id} bash +``` + +Next, navigate to the `mongo_to_postgres` folder to execute the migration script. + +```bash +cd /app/agenta_backend/migrations/mongo_to_postgres +python3 migration.py +``` + +### Post Migration + +After completing the migration, check the data integrity in PostgreSQL by opening Agenta in the web UI and verifying that your data is intact and everything works as expected. + +If you encounter issues and need to revert the migration, your data in the MongoDB instance is still intact. To revert, check out the last commit you were on before the PostgreSQL migration, then create a GitHub [issue](https://github.com/Agenta-AI/agenta/issues/new?assignees=&labels=postgres,bug,Backend&projects=&template=bug_report.md&title=[Bug]+) describing the problem you encountered.
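
The manual check in the web UI could be complemented by comparing record counts between the two databases. The following is only a minimal sketch, assuming you have collected the counts yourself (for example with `mongosh` on the MongoDB side and `psql` or pgAdmin on the PostgreSQL side). The helper `find_count_mismatches`, the entity names, and the numbers are all hypothetical and do not reflect Agenta's actual schema or migration script.

```python
from typing import Dict, List, Tuple


def find_count_mismatches(
    source_counts: Dict[str, int],
    target_counts: Dict[str, int],
) -> List[Tuple[str, int, int]]:
    """Return (name, source_count, target_count) for every entity whose counts differ.

    An entity missing on one side is treated as having a count of 0 there.
    """
    mismatches = []
    for name in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(name, 0)
        dst = target_counts.get(name, 0)
        if src != dst:
            mismatches.append((name, src, dst))
    return mismatches


# Hypothetical counts gathered by hand; the names are placeholders, not real table names.
mongo_counts = {"apps": 3, "testsets": 5, "evaluations": 12}
postgres_counts = {"apps": 3, "testsets": 5, "evaluations": 11}

for name, src, dst in find_count_mismatches(mongo_counts, postgres_counts):
    print(f"{name}: MongoDB has {src}, PostgreSQL has {dst}")
```

With the placeholder numbers above, only `evaluations` is reported. Matching counts do not prove that every field was migrated correctly, so this is a quick sanity check alongside the web UI verification, not a replacement for it.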