-
-
Notifications
You must be signed in to change notification settings - Fork 340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update /generate
to not split classes & functions across cells
#1158
base: main
Are you sure you want to change the base?
Conversation
/generate
to ensure no splitting of class or function code across cells in generated notebooks/generate
to not split classes & functions across cells
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srdas Awesome work! Love the creativity applied here. Few minor comments below.
@@ -212,6 +212,15 @@ def create_notebook(outline): | |||
nb["cells"].append(nbf.new_markdown_cell("## " + section["title"])) | |||
for code_block in section["code"].split("\n\n"): | |||
nb["cells"].append(nbf.new_code_cell(code_block)) | |||
|
|||
# Post process notebook for hanging cells: merge hanging cell with the previous cell | |||
nb_cells = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be worth renaming this variable to avoid confusion with nb["cells"]
.
nb_cells = [] | |
merged_cells = [] |
# Post process notebook for hanging cells: merge hanging cell with the previous cell | ||
nb_cells = [] | ||
for cell in nb["cells"]: | ||
if (cell["cell_type"] == "code") and (cell["source"][0] == " "): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The True
branch requires nb_cells[-1]
to exist, otherwise it will throw an exception. The condition cell["source"][0] == " "
also requires cell["source"]
to have at least 1 char, which is not guaranteed.
The condition can be updated to avoid these edge cases:
if (cell["cell_type"] == "code") and (cell["source"][0] == " "): | |
follows_code_cell = nb_cells and nb_cells[-1]["cell_type"] == "code" | |
is_incomplete = cell["cell_type"] == "code" and cell["source"].startswith(" ") | |
if follows_code_cell and is_incomplete: |
Note that nb_cells
should be renamed to merged_cells
if the previous suggestion is accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made all suggested changes! Thanks for the help.
@srdas One other question: with this change, are you able to run every notebook generated via |
for more information, see https://pre-commit.ci
Fixes #1111
/generate
will sometimes split a class or a function across multiple code cells. After tracing the code ingenerate.py
the error arises from unanticipated LLM behavior in the curation of the resultant Jupyter notebook in the creation of theoutline
object. This is best handled by post-processing the notebook to merge "hanging" code cells with the preceding code cell.The error looks like this:
After the fix, the error is no longer present. See this example:
Here is another example of the rectified notebook generation:
if necessary to check the log, add print statements in lines 236, 240, 245 to the code as shown below.
The before and after versions of the logged notebook can be compared for lines with
***** CELL 1 *****
as shown here: