Evaluation in backend #1036
Changes from all commits: b4b64fe, e77afaa, 7796082, ef277bc, 873fc44, 543c3a0, 0da9bd3, 78813fe, d50be89, 4e7145a, 6b9ab8a
```diff
@@ -56,3 +56,4 @@ agenta-web/cypress/screenshots/
 agenta-web/cypress/videos/
 .nextjs_cache/
+rabbitmq_data/
```
@@ -0,0 +1,23 @@ (new file)

```python
import os
from kombu import Exchange, Queue

# Use environment variables with default values as fallback
BROKER_URL = os.getenv('CELERY_BROKER_URL')
CELERY_RESULT_BACKEND = os.getenv('CELERY_RESULT_BACKEND')
CELERY_TASK_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'UTC'

# TODO: Can we improve this to be more dynamic?
CELERY_QUEUES = (
    Queue('agenta_backend.tasks.evaluations.auto_exact_match',
          Exchange('agenta_backend.tasks.evaluations.auto_exact_match'),
          routing_key='agenta_backend.tasks.evaluations.auto_exact_match'),
    Queue('agenta_backend.tasks.evaluations.auto_similarity_match',
          Exchange('agenta_backend.tasks.evaluations.auto_similarity_match'),
          routing_key='agenta_backend.tasks.evaluations.auto_similarity_match'),
    Queue('agenta_backend.tasks.evaluations.auto_regex_test',
          Exchange('agenta_backend.tasks.evaluations.auto_regex_test'),
          routing_key='agenta_backend.tasks.evaluations.auto_regex_test'),
)
```

Review comment (on `CELERY_QUEUES`): I don't see the advantage of having multiple tasks, one per eval. See comment in the Google Doc. I think it would make us rerun the variant many times, with no advantage.
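The TODO above asks whether the queue definitions can be made more dynamic. One low-tech option is to derive the fully-qualified task names from a single list of evaluator names instead of repeating each path three times per `Queue`. The helper below is a hypothetical sketch, not agenta code; only the evaluator names are taken from the diff.

```python
# Evaluator names as they appear in the hand-written CELERY_QUEUES above.
EVALUATORS = ["auto_exact_match", "auto_similarity_match", "auto_regex_test"]
TASK_PREFIX = "agenta_backend.tasks.evaluations"

def task_names(evaluators, prefix=TASK_PREFIX):
    """Return the fully-qualified task name for each evaluator."""
    return [f"{prefix}.{name}" for name in evaluators]

# Each resulting name would then feed Queue(name, Exchange(name),
# routing_key=name), exactly as in the explicit tuple above.
print(task_names(EVALUATORS))
```

With this, adding a new evaluator means appending one string to `EVALUATORS` rather than hand-writing another `Queue`/`Exchange`/`routing_key` triple.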
@@ -212,6 +212,23 @@ class EvaluationScenarioOutput(EmbeddedModel):

```python
    variant_id: str
    variant_output: str


# TODO: This should be removed and replaced with EvaluationDB
# Keeping it for now for backwards compatibility
class BulkEvaluationDB(Model):
    app: AppDB = Reference(key_name="app")
    organization: OrganizationDB = Reference(key_name="organization")
    user: UserDB = Reference(key_name="user")
    status: str
    evaluation_type: List[str]
    evaluation_type_settings: EvaluationTypeSettings
    variants: List[ObjectId]
    testset: TestSetDB = Reference(key_name="testsets")
    created_at: Optional[datetime] = Field(default=datetime.utcnow())
    updated_at: Optional[datetime] = Field(default=datetime.utcnow())

    class Config:
        collection = "bulk_evaluations"


class EvaluationDB(Model):
    app: AppDB = Reference(key_name="app")
```

Review comment (on `variants`): Why are we saving the evaluation of multiple variants in the same object? We want to let the user run the eval for multiple variants from the same command, but there is no need to save these results in the same evaluation document. (Disregard this comment if the goal was just to use it for human A/B testing.)

Review comment (on `testset`): Since you're setting the `key_name` of the `testset` field to `"testsets"`, does this mean the field is going to store multiple testsets? If yes, then I think it would be advisable to refactor the field into a list; otherwise, it is fine to call the `Reference` object with no default value.

Reply: I think one eval should link to one test set.

Reply: We might allow running an eval in one batch on multiple testsets, but there is no reason to save all of these in one document.
@@ -246,6 +263,25 @@ class EvaluationScenarioDB(Model):

```python
    class Config:
        collection = "evaluation_scenarios"


# TODO: This should be removed and replaced with EvaluationScenarioDB
# Keeping it for now for backwards compatibility
class EvaluationScenarioDBForBulkEvaluationDB(Model):
    user: UserDB = Reference(key_name="user")
    organization: OrganizationDB = Reference(key_name="organization")
    evaluation: BulkEvaluationDB = Reference(key_name="bulk_evaluations")
    inputs: List[EvaluationScenarioInput]
    outputs: List[EvaluationScenarioOutput]
    vote: Optional[str]
    score: Optional[Union[str, int]]
    correct_answer: Optional[str]
    created_at: Optional[datetime] = Field(default=datetime.utcnow())
    updated_at: Optional[datetime] = Field(default=datetime.utcnow())
    is_pinned: Optional[bool]
    note: Optional[str]

    class Config:
        collection = "single_evaluation_scenarios"


class CustomEvaluationDB(Model):
    evaluation_name: str
```
Reply (on the `CELERY_QUEUES` TODO): Yes, we can. To automatically discover tasks, you can use the `autodiscover_tasks` function from Celery and specify the path to the modules containing the tasks. This function discovers tasks by inspecting the installed apps and modules, so you won't have to manually define the queues and exchanges for each task.