
Design CLI API #4

Open
rth opened this issue Jun 23, 2023 · 6 comments

rth commented Jun 23, 2023

We need to design the CLI API for this package.

In pyodide/pyodide#3573 (comment) @bollwyvl proposed:

$> pyodide-index path/to/wheels/folder
Wrote 200 packages to path/to/wheels/folder/repodata.json

and I agree this is the right direction. Though given the current name of this package, it would be more logical to call it pyodide lock, IMO.
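A minimal sketch of what such a subcommand could look like, using Python's argparse. The flags, default output path, and behavior here are illustrative assumptions for discussion, not an implemented API:

```python
import argparse
from pathlib import Path

# Hypothetical sketch of the proposed CLI entry point; the flags and
# defaults are illustrative, not the actual pyodide-lock interface.
def main(argv=None):
    parser = argparse.ArgumentParser(prog="pyodide lock")
    parser.add_argument("wheel_dir", type=Path,
                        help="folder containing the wheels to index")
    parser.add_argument("--output", type=Path, default=None,
                        help="where to write pyodide-lock.json "
                             "(defaults to the wheel folder)")
    args = parser.parse_args(argv)
    out = args.output or args.wheel_dir / "pyodide-lock.json"
    wheels = sorted(args.wheel_dir.glob("*.whl"))
    # A real implementation would read each wheel's metadata and emit
    # lockfile entries; this stub only reports what it would do.
    print(f"Would write {len(wheels)} packages to {out}")
    return 0
```

Invoked on a wheel folder, this just counts wheels and reports the target path; the actual lockfile generation is what this issue is about.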

Also, we need to keep in mind that the resulting lockfile would need to include information about the unvendored stdlib modules (and the Python version). So it needs access to the original pyodide-lock.json (either via the Pyodide version, looking it up on the CDN, or via a path/URL to it). The difference with respect to conda index producing repodata.json is that there,

  • they can have multiple Python versions
  • as far as I understand, they don't care that the included files may have some conflicts. The dependency resolution really happens only at runtime, when running conda install and combining indexes from different channels. In our case, by design, once pyodide-lock.json is generated the dependency resolution is already done, so any package is guaranteed to be installable with trivial dependency resolution (there is a single version of each package, and we can just ignore versions).

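The "trivial resolution" invariant in the last point can be stated as a short check over the packages section of a lockfile. A sketch, with the lockfile schema simplified to just the "name" field:

```python
from collections import Counter

# Illustrative check of the invariant described above: in a generated
# pyodide-lock.json every package name appears exactly once, so
# "resolution" is a plain name lookup and versions can be ignored.
def has_trivial_resolution(packages):
    """`packages` maps entry keys to dicts with at least a "name" field,
    loosely mirroring the "packages" section of pyodide-lock.json."""
    names = Counter(entry["name"] for entry in packages.values())
    return all(count == 1 for count in names.values())
```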
There are two use cases,

  1. Adding/updating packages, with the actual files stored on some remote CDN. In this case, extra entries in pyodide-lock.json don't matter, since they would only load if explicitly imported, and we don't necessarily need to download all the included files locally IMO. Here I was thinking of taking something like a requirements.in as input (as in pip-tools), then computing a consistent dependency graph that merges the original pyodide-lock.json with the requirements in requirements.in (not easy).
  2. Including only a subset of packages for a given application, and shipping them alongside pyodide-lock.json for reproducibility. This is closer to the use case of https://github.com/pyodide/pyodide-pack (BTW, I'm shifting the focus of that package away from experimental module stripping via runtime detection toward general package/wheel minification). So the wheel files would be modified by that tool, but the final pyodide-lock.json would still be generated by this project.
     The challenge with this use case is that even given a list of wheels in some folder, we still need to verify that there are no missing requirements and that the dependency graph is consistent. So we need a resolver that understands the wasm platform in order to find compatible wheels.
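As a sketch of the consistency check in point 2: walk a folder of wheels, read each wheel's METADATA, and report required distributions that neither the folder nor a baseline set of package names (e.g. from an existing pyodide-lock.json) provides. Version specifiers, environment markers, and wasm platform tags are deliberately ignored here, which is exactly the part a real resolver would have to add:

```python
import re
import zipfile
from email.parser import Parser
from pathlib import Path

def missing_requirements(wheel_dir, baseline=()):
    """Report dependencies satisfied neither by the wheels in
    `wheel_dir` nor by `baseline` (an iterable of package names,
    e.g. taken from an existing pyodide-lock.json)."""
    norm = lambda n: n.lower().replace("_", "-")
    provided = {norm(n) for n in baseline}
    requires = {}
    for whl in Path(wheel_dir).glob("*.whl"):
        with zipfile.ZipFile(whl) as zf:
            meta_path = next(n for n in zf.namelist()
                             if n.endswith(".dist-info/METADATA"))
            meta = Parser().parsestr(zf.read(meta_path).decode())
        name = norm(meta["Name"])
        provided.add(name)
        deps = []
        for req in meta.get_all("Requires-Dist") or []:
            if ";" in req:  # skip marker-gated (often extras-only) deps
                continue
            # keep only the distribution name, dropping specifiers
            deps.append(norm(re.match(r"[A-Za-z0-9._-]+", req.strip()).group(0)))
        requires[name] = deps
    return {pkg: sorted(set(deps) - provided)
            for pkg, deps in requires.items() if set(deps) - provided}
```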

Anyway, it's still early days and this needs more discussion. My current idea is to iterate toward an implementation that works well in practice for these use cases, while only pushing alpha releases to PyPI. Any API in this package is considered unstable and can be changed completely.

Please let me know if you have any other ideas about how this should work.

@hoodmane @ryanking13

@ryanking13

> Also we need to keep in mind that the resulting lockfile would need to include information about the unvendored stdlib modules (and Python version).

This is something that continues to bug me: some modules are required to be included, and because of that, creating a lockfile externally depends on the original lockfile or on Pyodide itself.

I'd prefer the second option (creating a lockfile separate from the original pyodide-lock file), but I don't have any concrete ideas, and I suspect this would cause version conflicts between duplicate packages.

@bollwyvl

The way the in-flight jupyterlite PR works is by:

  • generating partial lockfiles
    • these wouldn't be sufficient to run anything
  • layering them, by name, on top of the as-loaded runtime with whatever index is hosted with it
    • this could be hoisted/normalized in pyodide's initializer with e.g. extraLockUrls: []
      • this could be non-destructive, which might be better, and potentially support "punch-out" by setting a name to null (ick!)
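The layering described above might look roughly like this. The extraLockUrls option name and the null "punch-out" behavior are proposals from this thread, not an implemented API:

```python
# Illustrative sketch of layering partial lockfiles by package name:
# later layers override earlier entries, and a null (None) entry
# "punches out" the package entirely.
def layer_lockfiles(base, *layers):
    packages = dict(base.get("packages", {}))
    for layer in layers:
        for name, entry in layer.get("packages", {}).items():
            if entry is None:          # "punch-out": drop the package
                packages.pop(name, None)
            else:                      # override (or add) by name
                packages[name] = entry
    # non-destructive: the base dict is left untouched
    return {**base, "packages": packages}
```

For example, a partial lockfile could shim out jedi while adding an application package, without touching the rest of the as-loaded runtime index.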

The concrete things this solves there:

  • auto-install-on-import
  • replacing packages in the pyodide stdlib
    • IPython has an optional dependency on jedi/parso (but still downloads them)
      • we haven't been able to make jedi work well enough in jupyterlite to use, so would rather replace it with dummy shims to save a couple megabytes on the wire
    • per-PR docs builds of packages that are in the pyodide stdlib and really just want one different

> no missing requirements and that the dependency graph is consistent,

Nobody likes building another package manager, of course. This is a place where a JSON schema can't do the job, but the more declarative options are... heavy. In the above PR, I opted for "is there a missing named dependency?", but fully validating the whole smorgasbord of "semver" operators would all but certainly entail another dependency, e.g. dparse.
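For contrast, a sketch of the heavier specifier-aware check, using the third-party packaging library (itself the kind of extra dependency the comment above is wary of). The function name and the shape of `available` are hypothetical:

```python
from packaging.requirements import Requirement
from packaging.version import Version

def check_requirement(req_string, available):
    """Validate one Requires-Dist string against `available`, a mapping
    of normalized package names to version strings. Returns None if
    satisfied, otherwise a human-readable problem description."""
    req = Requirement(req_string)
    name = req.name.lower().replace("_", "-")
    if name not in available:
        return f"missing dependency: {name}"
    version = Version(available[name])
    # An empty specifier set accepts any version.
    if not req.specifier.contains(version, prereleases=True):
        return f"{name} {version} does not satisfy '{req.specifier}'"
    return None
```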


rth commented Jul 4, 2023

Thanks for the feedback @bollwyvl !

I've opened more focused follow-up issues where each potential approach can be discussed in more detail, so we can choose which way to go.

@joemarshall

As an absolutely minimal first step, it would be good to just enable dependency fixup as micropip does now, so that if you have created a folder of all the wheels and dependencies you want, the tool can add them to the lockfile with correct dependencies.

That would fit with the existing pyodide-build support for building modules with their dependencies included, and gives an initial workflow for making pyodide-lock.json for arbitrary modules and deps with current tools.


rth commented Sep 21, 2023

Yes, I agree we could start with that.

@joemarshall

Take a look at this PR: #20

With that PR, if you build a bunch of wheels using pyodide build --with-dependencies, then make a lockfile with pyodide lockfile add-wheels dist/*.whl, you get a lockfile that should work in Pyodide, with dependencies resolving correctly.
