Latency cost in eval #1468

Merged: 31 commits merged into main from latency-cost-in-eval on Mar 31, 2024

Conversation

@aakrem (Collaborator) commented Mar 28, 2024

No description provided.

@mmabrouk (Member) left a comment:

Thanks for the PR @aakrem, great work! Please see my minor comments.

I think the code changes are minimal, and we can revert them without any issue if we later decide to track cost/latency in the observability logic rather than from the app's output. So we should move forward with this.

Can you please also add the same logic to the comparison view? Thanks!

agenta-backend/agenta_backend/services/llm_apps_service.py (review comments resolved, outdated)
agenta-backend/agenta_backend/tasks/evaluations.py (review comments resolved, outdated)
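
As a rough illustration of the approach discussed in the review above (reading cost from the app's own output while measuring latency around the call), here is a minimal Python sketch; `invoke_app` and the field names are hypothetical placeholders for illustration, not the actual agenta code.

```python
import time
from typing import Callable


def invoke_app_with_metrics(invoke_app: Callable[[dict], dict], inputs: dict) -> dict:
    """Call an LLM app and attach cost/latency to the result.

    `invoke_app` is a hypothetical callable standing in for the real service
    call; its output is assumed to carry a `cost` field reported by the app.
    """
    start = time.perf_counter()
    output = invoke_app(inputs)  # e.g. {"message": "...", "cost": 0.0021}
    latency = time.perf_counter() - start

    return {
        "output": output.get("message"),
        "cost": output.get("cost"),  # cost as reported in the app's output
        "latency": latency,          # wall-clock latency measured around the call
    }
```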
@mmabrouk (Member) left a comment:

@bekossy @aakrem FE tests are still failing. It seems to be a bug in the code:

  1) Evaluation Comparison Test
       Executing Evaluation Comparison Workflow
         Should verify that there are completed evaluations in the list:
     TypeError: The following error originated from your application code, not from Cypress.

  > Cannot read properties of null (reading 'getColId')

@aakrem I think we should show the total (sum) cost in the main view rather than the average. I think it's more interesting, since it allows the user to understand the cost of their experiments. For latency, it makes sense to show the average; however, I would label it (avg latency).
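
To make the suggested aggregation concrete, here is a small Python sketch, assuming each evaluation row carries `cost` and `latency` fields (the names are illustrative, not the actual agenta schema):

```python
def aggregate_metrics(results: list[dict]) -> dict:
    """Aggregate per-invocation metrics for the main evaluation view.

    Cost is summed (total spend of the experiment); latency is averaged
    and would be labeled "avg latency" in the UI.
    """
    costs = [r["cost"] for r in results if r.get("cost") is not None]
    latencies = [r["latency"] for r in results if r.get("latency") is not None]

    return {
        "total_cost": sum(costs),
        "avg_latency": sum(latencies) / len(latencies) if latencies else None,
    }


# Example:
# aggregate_metrics([{"cost": 0.002, "latency": 1.4}, {"cost": 0.003, "latency": 0.9}])
# -> {"total_cost": 0.005, "avg_latency": 1.15}
```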

@aakrem (Collaborator, Author) commented Mar 31, 2024

@mmabrouk btw we don't display "average/avg" for the evaluators' average results either. Displaying "average" here might cause confusion about what kind of number the other columns show.

@aakrem merged commit 6cd4a0a into main on Mar 31, 2024
8 checks passed
@aakrem deleted the latency-cost-in-eval branch on March 31, 2024 at 14:06
4 participants