Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bundling of .so files #34

Merged
merged 8 commits into from
Oct 8, 2023
Merged

Fix bundling of .so files #34

merged 8 commits into from
Oct 8, 2023

Conversation

rth
Copy link
Member

@rth rth commented Oct 8, 2023

This fixes the bundling of .so files that got broken once pyodide switched to using loadDynlib, previously we were hooking into emscripten's FS.findObject but this no longer works with recent pyodide versions.

The logic added here is,

  1. store the list of all locally accessed symbols (LDSO) when running the application in the sandbox (thanks to Hood's suggestion in Record a list of accessed symbols #32)
  2. monekypatch loadDynamicLibrary to know the order in which .so files were loaded (so we can preserve it).
    • Note that monkeypatching pyodide's loadDynlib to record the arguments with which it was called doesn't work. This means that I currently don't record searchDirs passed to loadDynlib. It's a known issue that should be fixed later. At least this PR fixes the scipy example, so it's already an improvement.
  3. when deciding whether to include an .so, include those that,
  • have symbols that were accessed
  • or are globally loaded. We could do a more fine-grained inclusion for globally loaded dynamic libraries later.
  1. Finally, load the .so with loadDynlib in the same order.
    • currently I'm nor caching the FS.readFile as we are doing in Pyodide as I'm not sure how much it matters, but maybe I should.

With the scipy example,

$ pyodide pack examples/scikit-learn/app.py --write-debug-map

Detected 7 dependencies with a total size of 22.47 MB  (uncompressed: 79.21 MB)
Total initial size (stdlib + dependencies): 24.72 MB


                                       Packing..                                        
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ No ┃ Package                        ┃ All files ┃ .so libs ┃    Size (MB) ┃ Reduction ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│  0 │ stdlib                         │ 547 → 180 │          │  2.25 → 0.98 │    56.4 % │
│  1 │ distutils-1.0.0.zip            │    53 → 0 │    0 → 0 │  0.18 → 0.00 │   100.0 % │
│  2 │ joblib-1.3.2-py3-none-any.whl  │   52 → 17 │    0 → 0 │  0.17 → 0.08 │    53.9 % │
│  3 │ numpy-1.25.2-cp311-cp311-emsc… │ 430 → 111 │  19 → 14 │  3.06 → 2.36 │    22.8 % │
│  4 │ openblas-0.3.23.zip            │     1 → 1 │    1 → 1 │  1.84 → 1.84 │     0.0 % │
│  5 │ scikit_learn-1.3.0-cp311-cp31… │ 383 → 167 │  68 → 42 │  5.43 → 3.02 │    44.3 % │
│  6 │ scipy-1.11.1-cp311-cp311-emsc… │ 708 → 430 │ 118 → 89 │ 11.77 → 7.82 │    33.5 % │
│  7 │ threadpoolctl-3.2.0-py3-none-… │     5 → 1 │    0 → 0 │  0.01 → 0.01 │    31.2 % │
└────┴────────────────────────────────┴───────────┴──────────┴──────────────┴───────────┘
[...]

Total output size (stdlib + packages): 16.35 MB (33.9% reduction)

It's not too bad but also not exceptional. The next step would be to strip .so of unused symbols (though that requires good coverage in the sample code). In particular, this example is a random forest example and I think it doesn't need any of the BLAS functions (and likely very little from scipy).

cc @ryanking13

@ryanking13
Copy link
Member

currently I'm nor caching the FS.readFile as we are doing in Pyodide as I'm not sure how much it matters, but maybe I should.

This is for runtime performance so I guess it doesn't matter for pyodide-pack IIUC?

@rth
Copy link
Member Author

rth commented Oct 8, 2023

This is for runtime performance so I guess it doesn't matter for pyodide-pack IIUC?

Thanks for the feedback. Well, here it's used in the function that loads all the .so at runtime, once the user created a bundle and wants to use it. So runtime performance would still matter. I'll add it in a follow up PR.

@rth rth merged commit 85630db into pyodide:main Oct 8, 2023
4 checks passed
@rth rth deleted the fix-dynamic-linking branch October 8, 2023 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants