
Evaluations in backend #1137

Merged
489 commits merged into main on Jan 17, 2024

Conversation

mmabrouk
Member

@mmabrouk mmabrouk commented Jan 2, 2024

No description provided.

MohammedMaaz and others added 25 commits January 2, 2024 16:25
[Enhancement]: Integration Tests for Evaluation
Member Author

@mmabrouk mmabrouk left a comment

@MohammedMaaz @aakrem @aybruhm

Comparison view:

  • Currently, each input/row is displayed twice. We need to group the evaluation scenarios from the different runs onto a single row per input. This should ideally be done in the backend. Here is a simple Python algorithm for how this could be implemented (a sketch of the follow-up row-building step appears after this list):
from collections import defaultdict

# Example scenarios from three evaluation runs: (input, output) pairs.
list1 = [('a', 1), ('b', 2)]
list2 = [('a', 10), ('b', 2)]
list3 = [('a', 3), ('b', 2), ('c', 5)]

# Group outputs by input so each input appears once, with one value per run.
merged_dict = defaultdict(list)
for lst in [list1, list2, list3]:
    for k, v in lst:
        merged_dict[k].append(v)
# merged_dict == {'a': [1, 10, 3], 'b': [2, 2, 2], 'c': [5]}
  • Input columns are displayed twice. They should be shown only once.
  • The order of the columns should be: Input/Expected output/Output var1/Output var2/Results var1/Results var2.
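
A possible follow-up step, sketched here only as an illustration: turn the merged data into one comparison row per input, padding with None where a run has no scenario for that input. The row layout and variable names are assumptions, not part of this PR.

runs = [dict(list1), dict(list2), dict(list3)]

# One comparison row per input; a run without a scenario for that
# input contributes None, so every row has exactly one cell per run.
all_inputs = {k for run in runs for k in run}
rows = [
    {"input": key, "outputs": [run.get(key) for run in runs]}
    for key in sorted(all_inputs)
]
# rows == [
#     {'input': 'a', 'outputs': [1, 10, 3]},
#     {'input': 'b', 'outputs': [2, 2, 2]},
#     {'input': 'c', 'outputs': [None, None, 5]},
# ]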

Evaluation view:

  • Remove the id column.
  • There's no option to shut down failed evaluations; we need a way to stop them (see the status/cancellation sketch after this list).
  • There's no way to determine if an evaluation has failed or if an evaluation scenario has failed.
  • There's no option to rerun an evaluation.
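
A minimal sketch, using assumed names, of how evaluation and scenario status plus cancellation could be tracked in the backend; none of these classes, fields, or status strings are taken from the PR.

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional

class EvaluationStatus(str, Enum):
    STARTED = "EVALUATION_STARTED"
    FINISHED = "EVALUATION_FINISHED"
    FAILED = "EVALUATION_FAILED"
    CANCELLED = "EVALUATION_CANCELLED"

@dataclass
class Scenario:
    inputs: dict
    status: str = "PENDING"
    result: Optional[str] = None
    error: Optional[str] = None

@dataclass
class Evaluation:
    id: str
    status: EvaluationStatus = EvaluationStatus.STARTED
    scenarios: List[Scenario] = field(default_factory=list)

def run_evaluation(
    evaluation: Evaluation,
    run_scenario: Callable[[Scenario], str],
    is_cancelled: Callable[[str], bool],
) -> None:
    # run_scenario executes a single scenario; is_cancelled checks a flag
    # that a hypothetical "stop evaluation" endpoint would set.
    for scenario in evaluation.scenarios:
        if is_cancelled(evaluation.id):
            evaluation.status = EvaluationStatus.CANCELLED
            return
        try:
            scenario.result = run_scenario(scenario)
            scenario.status = "SUCCESS"
        except Exception as exc:
            # Record the failure on the scenario so the UI can show which
            # scenario failed instead of hiding it behind the whole run.
            scenario.status = "FAILED"
            scenario.error = str(exc)
    evaluation.status = (
        EvaluationStatus.FAILED
        if any(s.status == "FAILED" for s in evaluation.scenarios)
        else EvaluationStatus.FINISHED
    )

With per-scenario status plus an evaluation-level status, the UI can distinguish a failed evaluation from a failed scenario, and rerunning could amount to re-invoking run_evaluation on the failed scenarios only.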

Evaluators view:

  • Exact match should be available from the start; this should probably be handled in the backend. The same applies to all evaluators that don't need configuration (see the seeding sketch below).
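
One way this could be handled in the backend, sketched with hypothetical names: seed a default configuration for every evaluator that needs no settings when an app is created, so exact match (and any other zero-config evaluator) is usable immediately.

# Illustrative list; extend with any evaluator that needs no settings.
NO_CONFIG_EVALUATORS = ["exact_match"]

def seed_default_evaluator_configs(app_id: str, existing_configs: list) -> list:
    # Create a default config for each zero-config evaluator the app does
    # not already have, leaving existing configurations untouched.
    existing_keys = {cfg["evaluator_key"] for cfg in existing_configs}
    created = []
    for key in NO_CONFIG_EVALUATORS:
        if key not in existing_keys:
            created.append(
                {
                    "app_id": app_id,
                    "evaluator_key": key,
                    "name": key.replace("_", " ").title(),  # e.g. "Exact Match"
                    "settings": {},  # nothing to configure
                }
            )
    return existing_configs + created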

mmabrouk and others added 28 commits January 16, 2024 18:18
@aakrem aakrem merged commit 0ba7342 into main Jan 17, 2024
5 of 8 checks passed