Improve eval #297

Merged
merged 4 commits into from
Oct 14, 2024

@jlewi (Owner) commented Oct 14, 2024

Fix a number of issues with eval.

  1. Measure the time generation takes

    • Latency has a huge impact on UX
    • Latency depends on a number of things (e.g. the amount of context), so we want to measure it to see their impact
  2. Flush the logs in app.Shutdown (fixes #295, "Evaluator final error messages aren't sent to GCP - problem flushing logs?") because the experiment seems to be failing to push its final errors to GCP

  3. Don't terminate if we fail to wait for a blocklog

    • Keep track of this inside the EvalResult proto
    • Aborting when waitForBlockLog fails just forces us to run an experiment multiple times before it completes successfully
  4. Improve logging about progress of experiment

    • Log example index and number of examples whenever we process an example
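
As a rough illustration of items 1, 3, and 4 (not the code in this PR), the loop below times each generation call, records a failed block-log wait on the result instead of aborting, and logs the example index and total count. Everything here is hypothetical: `EvalResult` is a plain struct standing in for the proto, its field names are invented, and `fakeGenerate` and `fakeWaitForBlockLog` are stubs for the real generation call and `waitForBlockLog`. Aborting on a generation error is also just an assumption of this sketch.

```go
package main

import (
	"errors"
	"log"
	"time"
)

// EvalResult is a hypothetical stand-in for the EvalResult proto.
// The field names are assumptions made for this sketch.
type EvalResult struct {
	ExampleID     string
	GenerateTime  time.Duration // how long generation took for this example
	BlockLogError string        // non-empty if waiting for the block log failed
}

// Generator and BlockLogWaiter are hypothetical hooks for the two slow steps.
type Generator func(exampleID string) error
type BlockLogWaiter func(exampleID string) error

// runExamples processes every example. It times generation, logs progress,
// and records block-log wait failures on the result instead of terminating.
func runExamples(ids []string, gen Generator, wait BlockLogWaiter) ([]EvalResult, error) {
	results := make([]EvalResult, 0, len(ids))
	for i, id := range ids {
		// Item 4: log the example index and the number of examples.
		log.Printf("processing example %d of %d (id=%s)", i+1, len(ids), id)

		// Item 1: measure how long generation takes.
		start := time.Now()
		if err := gen(id); err != nil {
			// Assumption: a generation failure still aborts the run.
			return results, err
		}
		r := EvalResult{ExampleID: id, GenerateTime: time.Since(start)}

		// Item 3: keep going if the block log never shows up, but record it.
		if err := wait(id); err != nil {
			log.Printf("failed waiting for block log (id=%s): %v", id, err)
			r.BlockLogError = err.Error()
		}
		results = append(results, r)
	}
	return results, nil
}

// fakeGenerate pretends to call the model.
func fakeGenerate(exampleID string) error {
	time.Sleep(time.Millisecond)
	return nil
}

// fakeWaitForBlockLog fails for one example to exercise the non-fatal path.
func fakeWaitForBlockLog(exampleID string) error {
	if exampleID == "b" {
		return errors.New("timed out waiting for block log")
	}
	return nil
}

func main() {
	results, err := runExamples([]string{"a", "b", "c"}, fakeGenerate, fakeWaitForBlockLog)
	if err != nil {
		log.Fatal(err)
	}
	for _, r := range results {
		log.Printf("%+v", r)
	}
}
```

Keeping the failure on the result means a single run can finish, and the affected examples can be filtered out or retried afterwards.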

Flush logs in the evaluator.
Log the cellId in the event we timeout waiting for the blocklog during evaluation.
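
A minimal sketch of why flushing in Shutdown matters, assuming a buffered logger: the last entries written before exit sit in memory and never reach the destination unless something flushes them. This uses only the Go standard library; `App`, `NewApp`, and `memorySink` are hypothetical names, and the real app's logger and GCP sink are not shown here.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
)

// memorySink is a hypothetical in-memory destination standing in for the
// remote log sink.
type memorySink struct {
	data []byte
}

// Write appends the bytes it receives, as a remote sink would on delivery.
func (s *memorySink) Write(p []byte) (int, error) {
	s.data = append(s.data, p...)
	return len(p), nil
}

// App is a hypothetical stand-in for the application object that owns the logger.
type App struct {
	buf    *bufio.Writer
	Logger *log.Logger
}

// NewApp wires a logger that buffers its output in front of sink.
func NewApp(sink io.Writer) *App {
	buf := bufio.NewWriter(sink)
	return &App{buf: buf, Logger: log.New(buf, "", 0)}
}

// Shutdown flushes any buffered log entries so the final messages are delivered.
func (a *App) Shutdown() error {
	return a.buf.Flush()
}

func main() {
	sink := &memorySink{}
	app := NewApp(sink)

	app.Logger.Println("experiment failed: final error")
	fmt.Printf("before Shutdown: %d bytes delivered\n", len(sink.data))

	if err := app.Shutdown(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("after Shutdown: %d bytes delivered\n", len(sink.data))
}
```

In a real binary, Shutdown would typically be deferred right after the app is created so that it also runs on error paths.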

netlify bot commented Oct 14, 2024

Deploy Preview for foyle canceled.

Name Link
🔨 Latest commit 056cde1
🔍 Latest deploy log https://app.netlify.com/sites/foyle/deploys/670d68f3b5ed5700082921d2

@jlewi jlewi merged commit 6e0df99 into main Oct 14, 2024
5 checks passed
@jlewi jlewi deleted the jlewi/improveeval branch October 14, 2024 21:02
Successfully merging this pull request may close these issues.

Evaluator final error messages aren't sent to GCP - problem flushing logs?