
Prompt should support generating markup cells #285

Merged
merged 37 commits into from
Oct 23, 2024

Conversation

jlewi
Owner

@jlewi jlewi commented Oct 8, 2024

Suggest Markup Cells

  • Now that the frontend can render markup as ghost cells we want the agent to start generating them

  • This will allow the AI to

    1. Reason about the outputs of commands - e.g. interpret whether the output supports or refutes a hypothesis
    2. Suggest markup cells containing plans
  • Related to Let the AI Suggest Markup Cells #284

  • Don't restrict the response to a single code block.

    • Now that we can render markup cells as ghost cells, we should allow multi-block responses that include markup cells.
  • Remove the hack that only generated completions if the current cell was a markup cell. We should generate completions even if the current cell is a code or output cell.

    • This was a cost saving measure. However, switching to gpt4o-mini should have sufficiently reduced costs that we can afford to generate completions on all cells.

Change PostProcessing of responses

  • We no longer limit the response to a single code block.
  • We allow at most two blocks: one markup cell and one code cell.
  • We do this because, from a UX perspective, generating many cells is confusing.
  • If there are multiple markup blocks in sequence we merge them into one block.
    • This is less confusing for users.
    • I believe the multiple blocks are an artifact of how Runme parses markup into blocks.
  • Drop any cells after the first code block in the response.
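The post-processing rules above can be sketched roughly as follows. This is an illustrative Python sketch, not Foyle's actual implementation; the `Block` type and `postprocess` name are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical block representation; kinds mirror markup vs. code cells.
MARKUP, CODE = "markup", "code"

@dataclass
class Block:
    kind: str
    text: str

def postprocess(blocks):
    """Merge consecutive markup blocks, then keep at most one
    markup block followed by at most one code block."""
    merged = []
    for b in blocks:
        if merged and b.kind == MARKUP and merged[-1].kind == MARKUP:
            # Runme tends to split markup into many blocks; merge them.
            merged[-1] = Block(MARKUP, merged[-1].text + "\n" + b.text)
        else:
            merged.append(b)
    out = []
    for b in merged:
        out.append(b)
        if b.kind == CODE:
            # Drop everything after the first code block.
            break
    return out[:2]  # at most one markup and one code cell
```

Because consecutive markup blocks are merged first, everything before the first code block collapses to a single markup cell, so the two-block cap falls out naturally.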

Proto Changes

  • Add an enum proto to be used on the front end to report the trigger for the completion. This will help us troubleshoot and detect issues in the frontend logic for triggering.
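Such an enum might look roughly like the following sketch. The enum and value names here are hypothetical; the real definition lives in Foyle's protos.

```proto
// Hypothetical sketch; the actual enum name and values may differ.
enum CompletionTrigger {
  COMPLETION_TRIGGER_UNSPECIFIED = 0;
  // The user edited the text of a cell.
  COMPLETION_TRIGGER_CELL_TEXT_CHANGED = 1;
  // The focused cell changed.
  COMPLETION_TRIGGER_CELL_FOCUS_CHANGED = 2;
}
```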


netlify bot commented Oct 8, 2024

Deploy Preview for foyle canceled.

Name Link
🔨 Latest commit bdfe06e
🔍 Latest deploy log https://app.netlify.com/sites/foyle/deploys/67194ce792f78400086507d2

jlewi added a commit that referenced this pull request Oct 15, 2024
# Experiment Report

After running an evaluation experiment, we compute a report that
contains the key metrics we want to track. To start with this is

* Number of cell match results
* Number of errors and examples
* Generate latency measured as percentiles
* Level1 assertion stats

# Level 1 Assertion stats

* Add a level 1 assertion to test whether the document sent to the model is empty.
* I believe this started happening when we included a fix for outputs not being
included (#286) in #285.
* I think the problem is that cell outputs can be very long and can end up
consuming all of the available context buffer.
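A level 1 (deterministic) assertion of this kind can be sketched as below. The function name and result shape are hypothetical; Foyle's actual assertion interface differs.

```python
def assert_document_non_empty(document: str) -> dict:
    """Level 1 assertion: the document sent to the model must be
    non-empty. Returns a hypothetical assertion-result record."""
    passed = len(document.strip()) > 0
    return {"name": "document_non_empty", "passed": passed}
```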

# Reintegrate Level 1 Assertions Into Evaluation
* Fix #261 
* We start computing level 1 assertions at RunTime so that they are
available in production and evaluation
* Level1 assertions are computed and then logged
* Our Analyzer pipeline reads the assertions from the logs and adds them
to the trace
* Our evaluation report accumulates assertion statistics and reports
them
* During evaluation we are seeing occasional timeouts on the server due to
  the HTTP read/write timeout occurring. It looks like this might happen
  because ChatGPT occasionally takes a very long time to respond.

* Update StreamGenerate and GenerateCells to return DeadlineExceeded to
  indicate a server timeout

* Implement a unary interceptor to automatically retry requests
  based on the status code.

* The retry is pretty simplistic; it's a fixed backoff.
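The retry behavior can be sketched as below. This is a minimal illustration, not Foyle's gRPC interceptor; the status codes in `RETRYABLE` and the function names are assumptions.

```python
import time

# Assumed retryable status codes; DeadlineExceeded is what the server
# now returns on timeout.
RETRYABLE = {"DEADLINE_EXCEEDED", "UNAVAILABLE"}

def call_with_retry(rpc, max_attempts=3, backoff_seconds=1.0):
    """Retry `rpc` (a zero-arg callable that raises RuntimeError(code)
    on failure) with a fixed backoff between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc()
        except RuntimeError as err:
            if str(err) not in RETRYABLE or attempt == max_attempts:
                raise
            time.sleep(backoff_seconds)  # fixed, not exponential, backoff
```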
* In 14 of our 424 examples in evaluation the input document sent to the model
  ends up being the empty string

* This is the result of how our doc tailer works. Our doc tailer imposes a length
  cap on the tail of the document. There was a bug in the tailer where
  if the last cell in the document exceeded the cap (currently 1110 characters) then
  an empty string would be returned.

* This PR fixes that. If the last cell exceeds the length then we take the tail of that cell.

* This PR also checks in the completer that the tail of the document is non-empty; if it is empty
  we fail the completion rather than continuing to generate one.
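The fixed tailing logic might look roughly like this sketch. The helper name is hypothetical; the cap value comes from the PR description.

```python
MAX_TAIL_CHARS = 1110  # length cap mentioned in the PR

def tail_document(cells):
    """Return up to MAX_TAIL_CHARS characters from the end of the
    document. Previously, a final cell longer than the cap caused an
    empty string to be returned; now we take the tail of that cell."""
    tail = ""
    for cell in reversed(cells):
        candidate = cell + "\n" + tail if tail else cell
        if len(candidate) > MAX_TAIL_CHARS:
            if not tail:
                # The fix: the last cell alone exceeds the cap, so keep
                # its trailing MAX_TAIL_CHARS characters instead of "".
                return cell[-MAX_TAIL_CHARS:]
            return tail
        tail = candidate
    return tail
```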

* Fix #305
@jlewi jlewi changed the title Start generating markup cells Prompt should support generating markup cells Oct 23, 2024
@jlewi jlewi marked this pull request as ready for review October 23, 2024 19:11
Contributor

@standard-input standard-input bot left a comment


No issues flagged.

@jlewi jlewi enabled auto-merge (squash) October 23, 2024 19:16
@jlewi jlewi merged commit b8ee786 into main Oct 23, 2024
5 checks passed
@jlewi jlewi deleted the jlewi/ghostmarkup branch October 23, 2024 19:27