
Freezing notebooks #293

Closed
cccs-nik opened this issue Nov 29, 2024 · 5 comments · Fixed by #294
Labels: bug (Something isn't working)

Comments

@cccs-nik

Description

Hey, I investigated this issue because, since upgrading to JupyterLab 4.3.0 and adding jupyter-collaboration, a lot of our users have been complaining about having to kill their servers on JupyterHub. What I found is that notebooks with large text outputs get exponentially slower to open. As an example, I have a pretty small (~400 KB) notebook that I let run for 43 minutes before it loaded.

I traced it to this line of code in jupyter_server_ydoc, which is extremely slow: https://github.com/jupyterlab/jupyter-collaboration/blob/ed544ba982f55d1fd4a28d5124fbddc3043bbf89/projects/jupyter-server-ydoc/jupyter_server_ydoc/rooms.py#L163

If you dig deeper you will find that it's self._ycells.extend() from jupyter_ydoc that is really slow:

self._ycells.extend([self.create_ycell(cell) for cell in cells])

Indeed, this is really slow when you pass it the massive lists of characters that we now see for text outputs: https://github.com/jupyter-server/pycrdt/blob/f7fb3aebb76c55aff3c82f5008fd135b87de488f/python/pycrdt/_array.py#L100
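
As a rough illustration, here is a hypothetical micro-benchmark written against pycrdt directly (the sizes and names are only indicative, not taken from the actual notebook), showing that extending an Array with one item per character is far slower than extending it with one item per line:

    # Hypothetical micro-benchmark: stream output text stored in a pycrdt Array
    # as one item per character versus one item per line.
    import time
    from pycrdt import Array, Doc

    text = "".join(f"line {i} of some stream output\n" for i in range(8000))

    doc = Doc()

    doc["per_char"] = per_char = Array()
    t0 = time.perf_counter()
    per_char.extend(list(text))                       # hundreds of thousands of tiny items
    print("per-character extend:", time.perf_counter() - t0)

    doc["per_line"] = per_line = Array()
    t0 = time.perf_counter()
    per_line.extend(text.splitlines(keepends=True))   # ~8000 larger items
    print("per-line extend:", time.perf_counter() - t0)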

An easy workaround is commenting out this optimization in create_ycell():

if output.get("output_type") == "stream":
    output["text"] = Array(output.get("text", []))

My problematic notebooks now load in a reasonable amount of time, but this is just an untested workaround; I'm sure there is a good solution to the problem.

Reproduce

  1. Create a notebook with massive text output(s) (an illustrative cell follows this list).
  2. Restart JupyterLab.
  3. Open the notebook.
  4. JupyterLab will be frozen.
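
For illustration, a cell along these lines (hypothetical, not the exact code from the affected notebook) produces the kind of large stream output described in step 1:

    # Illustrative only: print a large amount of text as one stream output,
    # then save the notebook, restart JupyterLab, and reopen it.
    for i in range(8000):
        print(f"line {i}: some reasonably long stream output to bulk up the notebook")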

Expected behavior

For notebooks to load in a reasonable amount of time even with big text outputs.

Context

@davidbrochart I think this issue might be the most relevant to you as you worked on the new stream outputs.

@cccs-nik added the bug (Something isn't working) label on Nov 29, 2024
@dmonad

dmonad commented Dec 1, 2024

@davidbrochart Is it correct that the whole YDoc is recreated (all cells are deleted and created again) when the source is set? In which scenarios does this happen? Ideally, only the diff is applied, as doing this several times on large YDocs will result in decreased performance over time.

@davidbrochart
Collaborator

davidbrochart commented Dec 1, 2024

It depends on what you mean by "when the source is set". The whole YDoc is created the first time, when loading from disk, yes.
But what @cccs-nik is mentioning is that we recently changed the way cell stream outputs are handled. These outputs consist of text representing, for example, the stdout stream produced by a print() call in Python. Previously, the text in the YCell was not a YText, it was just plain text that we replaced with the new text. This was very inefficient because a stream is mostly a concatenation of text, so we were throwing away the whole text and replacing it with a new one even when the diff was minimal. We changed that, and we now have a YText to which we can append each new chunk.
I guess what @cccs-nik is reporting is that inserting a big chunk of text into a YText is very slow.
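
To make the contrast concrete, here is a minimal sketch using pycrdt's Text type directly (not jupyter_ydoc's actual code; the strings are made up):

    from pycrdt import Doc, Text

    doc = Doc()
    doc["stream"] = stream = Text("partial stdout so far\n")

    # Old behaviour, roughly: on each kernel message the whole value was
    # replaced, even though only a small suffix had changed.
    updated = str(stream) + "more output\n"
    del stream[:]          # throw away the existing text
    stream += updated      # re-insert everything

    # New behaviour, roughly: only the chunk that just arrived is appended.
    stream += "even more output\n"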

@krassowski
Copy link
Collaborator

Create a notebook with massive text output(s).

@cccs-nik can you provide a complete example, ideally a short Python snippet that generates such a massive output? FYI, we fixed some performance issues in:

We were working against the reproducer in:

and that worked well, but if you have a reproducer that is still very slow, maybe we can find more optimizations?

@davidbrochart
Collaborator

I opened #294, which should fix this issue. Thanks for reporting, @cccs-nik.

@cccs-nik
Author

cccs-nik commented Dec 2, 2024

@krassowski The one notebook I got from a user that needed 43 minutes to load had the output of help(openai) twice, in two different cells, which generated a little over 8000 lines of text output. The notebook looks like this:

   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on package openai:\n",
      "\n",
      "NAME\n",
      "    openai\n",
      "\n",
      "DESCRIPTION\n",
      "    # OpenAI Python bindings.\n",
      "    #\n",
      "    # Originally forked from the MIT-licensed Stripe Python bindings.\n",
      "\n",
      "PACKAGE CONTENTS\n",
      "    _openai_scripts\n",
      "    api_requestor\n",
      "    api_resources (package)\n",
      "    cli\n",
      "    datalib (package)\n",
      "    embeddings_utils\n",
      "    error\n",
      "    object_classes\n",
      "    openai_object\n",
      "    openai_response\n",
      "    tests (package)\n",
      "    upload_progress\n",
      "    util\n",
      "    validators\n",
      "    version\n",
      "    wandb_logger\n",
      "\n",
      "CLASSES\n",
      "    builtins.Exception(builtins.BaseException)\n",
      "        openai.error.OpenAIError\n",
      "            openai.error.APIError\n",
      "            openai.error.InvalidRequestError\n",
      "    openai.api_resources.abstract.api_resource.APIResource(openai.openai_object.OpenAIObject)\n",
      "        openai.api_resources.audio.Audio\n",
      "        openai.api_resources.image.Image\n",
      "    openai.api_resources.abstract.createable_api_resource.CreateableAPIResource(openai.api_resources.abstract.api_resource.APIResource)\n",

I just gave the #294 fix a try and it indeed fixes both the performance issue and the text rendering issue. Thank you very much @davidbrochart 🎉
