
413: Entity too large #252

Open · tomplex opened this issue Jul 14, 2020 · 8 comments

Comments

@tomplex commented Jul 14, 2020

Hello,

What's the best way to increase the max size of uploaded packages? Should it be a change to the uWSGI config (e.g. limit-post), or something else?

Thanks!

@stevearc (Owner) commented

uWSGI is the first place I would try. Out of curiosity, how large is the package you're trying to upload? I did some simple tests and wasn't able to trigger a 413.
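For reference, a minimal sketch of the relevant setting, assuming an ini-style uWSGI config (the 100 MB value is an arbitrary example):

```ini
[uwsgi]
; maximum allowed size of an HTTP request body, in bytes (~100 MB here)
limit-post = 104857600
```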

@leonoverweel commented

I've also run into this with a ~50 MB package. In my case, the problem looks like the 32 MB hard limit on Google Cloud Run requests.

@leonoverweel commented

We got in touch with GCP support about this; for our near-term needs we'll probably end up hosting our PyPI cloud container in our k8s cluster instead of Cloud Run.

However, this part of their reply may be interesting for PyPI Cloud maintainers:

Cloud Run + signed URLs:
Now as to the 32 MB limitation with Cloud Run: I believe you wouldn't hit this when downloading packages, as PyPI Cloud does not serve these packages directly, but generates signed URLs so that the download happens directly from GCS (unless the stream_files option is turned on [2]).

It would make sense to use the same pattern for file uploads as well - I assume this is where you're hitting the 32 MB limit. Signed URLs do support this [3] (and AWS has a similar concept with S3 pre-signed URLs). However, this is not currently supported and would require code modifications to the PyPI Cloud project.

[2] https://pypicloud.readthedocs.io/en/latest/topics/configuration.html#pypi-stream-files
[3] https://cloud.google.com/storage/docs/access-control/signed-urls

Is this signed URLs approach something that could come to PyPI cloud in the future? It'd probably somehow have to integrate with the twine upload util, so I'm not sure if it'd be practical to implement.
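For context, this is roughly what generating a V4 signed upload URL looks like with the google-cloud-storage client; the bucket and object names are placeholders, and this is not something pypicloud does today:

```python
from datetime import timedelta

from google.cloud import storage

# requires service-account credentials capable of signing
client = storage.Client()
bucket = client.bucket("my-package-bucket")  # placeholder bucket name
blob = bucket.blob("mypkg/mypkg-1.0.0-py3-none-any.whl")  # placeholder key

# V4 signed URL that allows a single PUT upload for 15 minutes
url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(minutes=15),
    method="PUT",
    content_type="application/octet-stream",
)
print(url)  # the client would then PUT the wheel bytes to this URL
```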

@lgeiger commented Nov 18, 2020

Is this signed URLs approach something that could come to PyPI cloud in the future?

It looks like pypicloud already uses signed URLs for downloads, which could in theory be used for uploads as well.

However, I'm not sure whether integrating this would require large changes to pypicloud itself. @stevearc what do you think?

@stevearc (Owner) commented

I'm going to assume that making changes to twine or setup.py upload is not on the table. I see two main options:

  1. Create a small command line utility that knows how to talk to pypicloud and upload the package directly to GCS or S3 or whatever your blob storage is (a rough sketch of this flow is below). This is relatively straightforward, but there is some complexity around data consistency. Pypicloud manages the cache, so it has to decide when to consider the package to "exist". It could optimistically add the package to the cache as soon as the CLI requests a signed URL, but then it'd have an invalid entry for a short time (and possibly forever, if the upload fails). It could require the CLI to report back when the upload is complete; that's probably the best option, though there's also the risk of that step failing. Or the signed URL request could set up some sort of short-lived polling job that updates the cache once it sees the object exists in the blob store.
  2. Make the file upload endpoint redirect to the blob storage URL. This would be much easier to use, but I'm not entirely sure it'll work (I've had trouble with S3 signed URL redirects in the past). The main problem with this (assuming it works) is the same as above, except with no option to have the client check back in and confirm after the upload succeeds.

Neither one of these would require rewriting much of pypicloud, but it could involve the addition of some potentially complex logic.
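To make option 1 concrete, here is a hypothetical sketch of the client side. The /api/signed-upload and /api/confirm-upload endpoints do not exist in pypicloud today; they are invented purely for illustration, as are the host and credentials:

```python
import requests

PYPICLOUD = "https://pypi.example.com"  # hypothetical pypicloud host
AUTH = ("user", "password")  # hypothetical credentials

def upload_package(path: str, name: str, version: str) -> None:
    # 1. Ask pypicloud for a signed upload URL (hypothetical endpoint).
    resp = requests.post(
        f"{PYPICLOUD}/api/signed-upload",
        json={"name": name, "version": version},
        auth=AUTH,
    )
    resp.raise_for_status()
    signed_url = resp.json()["url"]

    # 2. PUT the package bytes straight to blob storage, bypassing pypicloud.
    with open(path, "rb") as f:
        requests.put(signed_url, data=f).raise_for_status()

    # 3. Report back so pypicloud can add the package to its cache
    #    (the "CLI checks back in" consistency strategy described above).
    requests.post(
        f"{PYPICLOUD}/api/confirm-upload",
        json={"name": name, "version": version},
        auth=AUTH,
    ).raise_for_status()
```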

@leonoverweel commented

We ended up going with something close to option 1: we turned off storage.prepend_hash and now use gsutil to push our wheels to GCS manually, adding the required (name, version) metadata; then we refresh the cache using the button in the admin UI (btw, is there a REST call to do this?) and it all works. This'll cover our needs for now - it's not a module we update a lot.
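Roughly, that manual flow looks like the following; the bucket name is a placeholder, and the x-goog-meta-name / x-goog-meta-version headers assume pypicloud's GCS backend reads name and version from custom object metadata:

```sh
# copy the wheel into the bucket pypicloud is configured to use
gsutil cp dist/mypkg-1.0.0-py3-none-any.whl gs://my-package-bucket/

# attach the (name, version) metadata pypicloud expects
gsutil setmeta \
  -h "x-goog-meta-name:mypkg" \
  -h "x-goog-meta-version:1.0.0" \
  gs://my-package-bucket/mypkg-1.0.0-py3-none-any.whl
```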

@stevearc (Owner) commented

Oof, rebuilding the whole cache is a lot of unnecessary work for one package, but if you're not doing it often I guess that's okay. There is an endpoint for this: https://pypicloud.readthedocs.io/en/latest/topics/api.html#get-admin-rebuild
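Per those docs, the rebuild endpoint can be hit directly; a minimal sketch with requests, assuming an admin user and a hypothetical host:

```python
import requests

# GET /admin/rebuild re-crawls the storage backend and rebuilds the cache;
# the host and credentials here are placeholders
resp = requests.get(
    "https://pypi.example.com/admin/rebuild",
    auth=("admin", "password"),
)
resp.raise_for_status()
```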

@jonjesse commented

I ran into the same issue, but it turned out to be an ingress problem. If you use an ingress controller like nginx, make sure proxy-body-size is large enough; see
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-max-body-size

It's an annotation setting:
nginx.ingress.kubernetes.io/proxy-body-size: 100m

If you use another software load balancer, look for the proxy-body-size setting or something similar (a fuller example is sketched below). Hope this helps!
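For completeness, a minimal sketch of where that annotation lives in an ingress-nginx Ingress manifest (the host and service names are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pypicloud
  annotations:
    # allow request bodies (package uploads) up to 100 MB
    nginx.ingress.kubernetes.io/proxy-body-size: 100m
spec:
  rules:
    - host: pypi.example.com   # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: pypicloud   # placeholder service name
                port:
                  number: 80
```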
