Freezing notebooks #293
Comments
@davidbrochart Is it correct that the whole YDoc is recreated (all cells are deleted and created again) when the source is set? In which scenarios does this happen? Ideally, only the diff would be applied, since doing this repeatedly on large YDocs will degrade performance over time.
It depends on what you mean by "when the source is set". The whole YDoc is created the first time, when loading from disk, yes.
@cccs-nik Can you provide a complete example, ideally a short Python script that generates such massive output? FYI, we fixed some performance issues in:
We were working against the reproducer in: and that worked well, but if you have a reproducer that is still very slow, maybe we can find more optimizations?
@krassowski That notebook I got from a user, which needed 43 minutes to load, had the output of help(openai) twice, in two different cells, which generated a little over 8000 lines of text output. The notebook looks like this:
I just gave the #294 fix a try and it indeed fixes both the performance issue and the text rendering issue. Thank you very much @davidbrochart 🎉
Description
Hey, I investigated this issue because, since upgrading to JupyterLab 4.3.0 and adding jupyter-collaboration, a lot of our users have been complaining that it kills their servers on JupyterHub. What I found is that notebooks with text outputs get dramatically slower to open as the outputs grow. As an example, I have a pretty small (~400 KB) notebook that took 43 minutes to load.
I traced the slowness to this line of code in jupyter_server_ydoc: https://github.com/jupyterlab/jupyter-collaboration/blob/ed544ba982f55d1fd4a28d5124fbddc3043bbf89/projects/jupyter-server-ydoc/jupyter_server_ydoc/rooms.py#L163
If you dig deeper, you will find that it's self._ycells.extend() from jupyter_ydoc that is really slow:
jupyter_ydoc/jupyter_ydoc/ynotebook.py, line 248 at 5de54c8
Indeed, this code is really slow when you pass it the massive lists of characters that we now see for text outputs: https://github.com/jupyter-server/pycrdt/blob/f7fb3aebb76c55aff3c82f5008fd135b87de488f/python/pycrdt/_array.py#L100
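To see why feeding a shared array one item per character hurts, here is a toy model in plain Python (not pycrdt itself, whose real costs live in its Rust backend): each single-item insert pays a cost proportional to what is already in the sequence, standing in for the per-item integration work a CRDT does, so inserting a big text character by character is quadratic while inserting it as one item is a single cheap operation. All names here are mine, for illustration only.

```python
import time

class NaiveSequence:
    """Toy sequence: each insert splices into a backing list, paying O(n)
    per call, a stand-in for per-item CRDT integration cost."""

    def __init__(self):
        self.items = []

    def insert(self, index, item):
        self.items[index:index] = [item]  # O(n) shift on every call

text = "x" * 20_000

# One item per character: 20k front inserts, each shifting everything
# already inserted, so the total work is quadratic in the text length.
per_char = NaiveSequence()
t0 = time.perf_counter()
for ch in reversed(text):
    per_char.insert(0, ch)
per_char_time = time.perf_counter() - t0

# The whole text as a single item: one insert, essentially free.
bulk = NaiveSequence()
t0 = time.perf_counter()
bulk.insert(0, text)
bulk_time = time.perf_counter() - t0

print(f"per-char: {per_char_time:.3f}s, bulk: {bulk_time:.6f}s")
```

Both variants end up holding the same text; only the item granularity differs, which is exactly the difference between the exploded and non-exploded stream outputs described above.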
An easy workaround is to comment out this optimization in create_ycell():
jupyter_ydoc/jupyter_ydoc/ynotebook.py, lines 168 to 169 at 5de54c8
My problematic notebooks now load in a reasonable amount of time, but this is just an untested workaround; I'm sure there is a better solution for the problem.
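For readers without the source open, here is a simplified stand-in for the idea behind that workaround. The function and field names below are mine, not the actual jupyter_ydoc code: with the optimization enabled, a stream output's text is exploded into one item per character (so later streaming appends can target individual items); the workaround keeps the text whole so loading inserts one item instead of thousands.

```python
def prepare_stream_text(output, per_character=True):
    """Simplified stand-in (not jupyter_ydoc's real code) for the stream
    output handling in create_ycell().

    per_character=True mimics the optimization: the text becomes a list of
    single characters. per_character=False mimics the workaround: the text
    is left as-is.
    """
    output = dict(output)  # don't mutate the caller's dict
    if per_character and output.get("output_type") == "stream":
        output["text"] = list("".join(output["text"]))
    return output

stream = {"output_type": "stream", "name": "stdout", "text": ["ab\n", "cd\n"]}

optimized = prepare_stream_text(stream)                        # 6 one-char items
workaround = prepare_stream_text(stream, per_character=False)  # text untouched
```

With 8000+ lines of help() output, the per-character form turns into hundreds of thousands of items, which is the list that the slow pycrdt code path above then has to process.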
Reproduce
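Based on the description above (the user's notebook had help(openai) output of a little over 8000 lines), a minimal sketch of a reproducer is to generate a notebook with one code cell carrying a large stream output, then open it with jupyter-collaboration enabled. The cell source, line count, and file name below are assumptions modeled on that report, not the user's actual notebook.

```python
import json

# Roughly 8000 lines of text, standing in for a help(openai) dump.
big_text = "\n".join(f"line {i} of a very long help() dump" for i in range(8000))

# A minimal nbformat-4 notebook with a single stream output.
nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "help(openai)",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": big_text}
            ],
        }
    ],
}

with open("big_output.ipynb", "w") as f:
    json.dump(nb, f)
```

Opening the resulting big_output.ipynb in JupyterLab with jupyter-collaboration installed should exercise the slow self._ycells.extend() path described above.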
Expected behavior
For notebooks to load in a reasonable amount of time even with big text outputs.
Context
@davidbrochart I think this issue might be the most relevant to you as you worked on the new stream outputs.