
Prompt should support generating markup cells #285

Merged
merged 37 commits into from
Oct 23, 2024

Conversation

jlewi
Owner

@jlewi jlewi commented Oct 8, 2024

Suggest Markup Cells

  • Now that the frontend can render markup as ghost cells we want the agent to start generating them

  • This will allow the AI to

    1. Reason about the outputs of commands - e.g. interpret whether the output supports or refutes a hypothesis
    2. Suggest markup cells containing plans
  • Related to Let the AI Suggest Markup Cells #284

  • Don't restrict the response to a single code block.

    • Now that we can render markup cells as ghost cells, we should allow multi-block responses that include markup cells.
  • Remove the hack that only generated completions if the current cell was a markup cell. We should generate completions even if the current cell is a code or output cell.

    • This was a cost saving measure. However, switching to gpt4o-mini should have sufficiently reduced costs that we can afford to generate completions on all cells.

Change PostProcessing of responses

  • We no longer limit the response to a single code block.
  • We allow at most two blocks: one markup cell and one code cell.
  • We do this because, from a UX perspective, generating many cells is confusing.
  • If there are multiple markup blocks in sequence we merge them into one block.
    • This is less confusing for users.
    • I believe the multiple blocks are an artifact of how Runme parses markup into blocks.
  • Drop any cells after the first code block in the response.
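The post-processing rules above can be sketched roughly as follows. This is an illustrative Python sketch, not Foyle's actual implementation; the `Block` type and `postprocess` name are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical block representation; kinds mirror markup vs. code cells.
MARKUP, CODE = "markup", "code"

@dataclass
class Block:
    kind: str
    text: str

def postprocess(blocks):
    """Merge consecutive markup blocks, then keep at most one
    markup block followed by at most one code block."""
    merged = []
    for b in blocks:
        if merged and b.kind == MARKUP and merged[-1].kind == MARKUP:
            # Runme tends to split markup into many blocks; merge them.
            merged[-1] = Block(MARKUP, merged[-1].text + "\n" + b.text)
        else:
            merged.append(b)
    out = []
    for b in merged:
        out.append(b)
        if b.kind == CODE:
            # Drop everything after the first code block.
            break
    return out[:2]  # at most one markup and one code cell
```

Because consecutive markup blocks are merged first, everything before the first code block collapses to a single markup cell, so the two-block cap falls out naturally.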

Proto Changes

  • Add an enum proto to be used on the front end to report the trigger for the completion. This will help us troubleshoot and detect issues in the frontend logic for triggering.
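Such an enum might look roughly like the following sketch. The enum and value names here are hypothetical; the real definition lives in Foyle's protos.

```proto
// Hypothetical sketch; the actual enum name and values may differ.
enum CompletionTrigger {
  COMPLETION_TRIGGER_UNSPECIFIED = 0;
  // The user edited the text of a cell.
  COMPLETION_TRIGGER_CELL_TEXT_CHANGED = 1;
  // The focused cell changed.
  COMPLETION_TRIGGER_CELL_FOCUS_CHANGED = 2;
}
```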


netlify bot commented Oct 8, 2024

Deploy Preview for foyle canceled.

Name Link
🔨 Latest commit bdfe06e
🔍 Latest deploy log https://app.netlify.com/sites/foyle/deploys/67194ce792f78400086507d2

jlewi added a commit that referenced this pull request Oct 15, 2024
# Experiment Report

After running an evaluation experiment, we compute a report that
contains the key metrics we want to track. To start with this is

* Number of cell match results
* Number of errors and examples
* Generate latency measured as percentiles
* Level1 assertion stats

# Level 1 Assertion stats

* Add a level 1 assertion to test whether the document sent to the model is empty.
* I believe this started happening when we included a fix for outputs not being
included (#286) in #285.
* I think the problem is that cell outputs can be very long and can end up
consuming all of the available context buffer.
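A level 1 (deterministic) assertion of this kind can be sketched as below. The function name and result shape are hypothetical; Foyle's actual assertion interface differs.

```python
def assert_document_non_empty(document: str) -> dict:
    """Level 1 assertion: the document sent to the model must be
    non-empty. Returns a hypothetical assertion-result record."""
    passed = len(document.strip()) > 0
    return {"name": "document_non_empty", "passed": passed}
```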

# Reintegrate Level 1 Assertions Into Evaluation
* Fix #261 
* We start computing level 1 assertions at RunTime so that they are
available in production and evaluation
* Level1 assertions are computed and then logged
* Our Analyzer pipeline reads the assertions from the logs and adds them
to the trace
* Our evaluation report accumulates assertion statistics and reports
them
* During evaluation we are seeing occasional timeouts on the server due to
  the HTTP read/write timeout occurring. It looks like this might happen
  because ChatGPT occasionally takes a very long time to respond.

* Update StreamGenerate and GenerateCells to return DeadlineExceeded to
  indicate a server timeout

* Implement a unary interceptor to automatically retry requests
  based on the status code.

* The retry is pretty simplistic; it's a fixed backoff.
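The retry behavior can be sketched as below. This is a minimal illustration, not Foyle's gRPC interceptor; the status codes in `RETRYABLE` and the function names are assumptions.

```python
import time

# Assumed retryable status codes; DeadlineExceeded is what the server
# now returns on timeout.
RETRYABLE = {"DEADLINE_EXCEEDED", "UNAVAILABLE"}

def call_with_retry(rpc, max_attempts=3, backoff_seconds=1.0):
    """Retry `rpc` (a zero-arg callable that raises RuntimeError(code)
    on failure) with a fixed backoff between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc()
        except RuntimeError as err:
            if str(err) not in RETRYABLE or attempt == max_attempts:
                raise
            time.sleep(backoff_seconds)  # fixed, not exponential, backoff
```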
* In 14 of our 424 examples in evaluation the input document sent to the model
  ends up being the empty string

* This is the result of how our doc tailer works. Our doc tailer imposes a length
  cap on the tail of the document. There was a bug in the tailer where
  if the last cell in the document exceeded the cap (currently 1110 characters) then
  an empty string would be returned.

* This PR fixes that. If the last cell exceeds the length then we take the tail of that cell.

* This PR also checks in the completer that the tail of the document is non-empty; if it is empty
  we fail the completion rather than continuing to generate one.
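The fixed tailing logic might look roughly like this sketch. The helper name is hypothetical; the cap value comes from the PR description.

```python
MAX_TAIL_CHARS = 1110  # length cap mentioned in the PR

def tail_document(cells):
    """Return up to MAX_TAIL_CHARS characters from the end of the
    document. Previously, a final cell longer than the cap caused an
    empty string to be returned; now we take the tail of that cell."""
    tail = ""
    for cell in reversed(cells):
        candidate = cell + "\n" + tail if tail else cell
        if len(candidate) > MAX_TAIL_CHARS:
            if not tail:
                # The fix: the last cell alone exceeds the cap, so keep
                # its trailing MAX_TAIL_CHARS characters instead of "".
                return cell[-MAX_TAIL_CHARS:]
            return tail
        tail = candidate
    return tail
```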

* Fix #305
@jlewi jlewi changed the title Start generating markup cells Prompt should support generating markup cells Oct 23, 2024
@jlewi jlewi marked this pull request as ready for review October 23, 2024 19:11
Contributor

@standard-input standard-input bot left a comment


No issues flagged.

@jlewi jlewi enabled auto-merge (squash) October 23, 2024 19:16
@jlewi jlewi merged commit b8ee786 into main Oct 23, 2024
5 checks passed
@jlewi jlewi deleted the jlewi/ghostmarkup branch October 23, 2024 19:27