Filename truncated during https tar stream #5450

Open
WH-2099 opened this issue Jul 25, 2024 · 17 comments
Labels
bug (Something isn't working), compatibility (Compatibility with a specification or another tool)

Comments

@WH-2099

WH-2099 commented Jul 25, 2024

When installing from one particular https index-url, a filename inside the unpacked source distribution is truncated.

Here is a reproduction that fails reliably for me.

uv 0.2.29 (39be71f40 2024-07-24)

uv cache clean
uv pip -v install --reinstall aliyun-python-sdk-core==2.15.1 -i https://pypi.tuna.tsinghua.edu.cn/simple/ 2>&1 | tee log

# `uv cache clean`, `tee log`, `-v`, and `--reinstall` only make the
# problem easier to reproduce; they are not directly related to it.
# Locate the problem file:
cache_dir=$(dirname "$(grep -oP 'build_wheel\("\K[^"]+' log)")
echo "$cache_dir"
ls -al "$cache_dir/aliyun-python-sdk-core-2.15.1.tar.gz/aliyunsdkcore/vendored/requests/packages/urllib3/contrib/_appengine_en"

The filename should be _appengine_environ.py, but it is truncated to _appengine_en.

The archive contents can be inspected by downloading the source distribution manually:
https://pypi.tuna.tsinghua.edu.cn/packages/3a/e6/f579e8a5e26ef1066f6fb11074cedc9f668cb5f722c85cf7adc0f7e2e23e/aliyun-python-sdk-core-2.15.1.tar.gz

If you switch to using the http version of this mirror source http://pypi.tuna.tsinghua.edu.cn/simple/, this problem does not occur.

If you switch to any other index-url, such as https://pypi.org/simple/ you won't have this problem.

@WH-2099
Author

WH-2099 commented Jul 25, 2024

I think this bug is concerning because it silently changes filenames and ultimately leaves installed packages with missing files.

Since long index URLs are not uncommon in private environments, this could affect production environments and be very difficult to track down. (I'm actually an example of this myself 🤣)

I'll keep following up on this, so feel free to contact me if there's anything I can do to help!

@charliermarsh
Member

Is your filesystem silently truncating filenames that exceed a certain length?

@WH-2099
Copy link
Author

WH-2099 commented Jul 25, 2024

Is your filesystem silently truncating filenames that exceed a certain length?

Pretty sure it's not.

@charliermarsh
Member

I'll take a look.

@charliermarsh
Member

This appears to be a problem during the unpacking of the tar file. We see that truncated filename as soon as we ask the tar crate for the entries (aliyun-python-sdk-core-2.15.1/aliyunsdkcore/vendored/requests/packages/urllib3/contrib/_appengine_en).

I think something must be going wrong in the index itself, honestly. If I download the file to disk and then install it, it works correctly. Similarly, if I install from PyPI, I get the right result (and I confirmed that the archives are identical between the two indexes). But if I stream and decompress from the Tsinghua mirror, I get the wrong result. It's really hard for me to find a root cause for that. My guess is that an incorrect header somewhere is causing the streamed decompression to fail?

charliermarsh added the compatibility label Jul 25, 2024
charliermarsh self-assigned this Jul 25, 2024

@WH-2099
Author

WH-2099 commented Jul 26, 2024

Thanks for following up, I will modify the problem description accordingly.

@WH-2099
Author

WH-2099 commented Jul 26, 2024

I also found that switching to the http version of this mirror source does not cause this problem.

This means the TLS traffic has to be decrypted to investigate further. I tried the usual SSLKEYLOGFILE environment variable, to no avail.

I'm still new to Rust. Can you point me in the right direction on how to get the TLS keys, so I can decrypt the session traffic and locate the problem?

WH-2099 changed the title from "Long path was unexpectedly truncated" to "Filename truncated during https tar stream" Jul 26, 2024
charliermarsh removed their assignment Aug 10, 2024

@failable

@WH-2099 Have you figured out a way to fix this issue? I am facing this issue when using the Aliyun OSS.

zanieb added the network (Network connectivity e.g. proxies, DNS, and SSL) label Aug 26, 2024

@RazerM

RazerM commented Nov 4, 2024

I have a case where uv build created a package where the .tar.gz file is all fine, but one of the filenames in the .whl file is truncated. A retry of the CI job did not reproduce the issue.

uv 0.4.29

It's not a package I can share but I can hopefully give some details:

  • There are longer filenames in the archive which weren't truncated
  • The file contents are correct

The output of uv build looks like this:

$ uv build
Building source distribution...
...
copying mypkg1234/migrations/versions/31320cf96165_lorem_ipsum_dolor_sit_amet_con.py -> mypkg1234-24.11.2.dev0+743a1c8/mypkg1234/migrations/versions
...
Creating tar archive
Building wheel from source distribution...
...
copying mypkg1234/migrations/versions/31320cf96165_lorem_ipsum_dolor_sit_amet -> build/lib/mypkg1234/migrations/versions
# ---------------------------------------------------------------- truncated^

(the obfuscated filenames are length accurate in case that's relevant)

I came across alexcrichton/tar-rs#369 which might be relevant? My archive contains 720 files with a path longer than 100 characters, only one of which was truncated.

@RazerM

RazerM commented Nov 4, 2024

uv build --wheel mysdist.tar.gz reproduces the problem consistently, and it's not the tar-rs issue I linked. I still need to diff the sdists, but the truncation happens when the PAX header holding the full path is not read (I can see it in the .tar file, but krata-tokio-tar doesn't load it).

(I'm using a clone of krata-tokio-tar and building uv with it to debug)
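
For readers unfamiliar with the mechanism: in the ustar layout the name field of a header block is 100 bytes wide, and a pax extended header record of the form `<length> path=<value>\n` can carry the full path for longer names. The truncated path reported earlier in this thread is exactly 100 bytes long, which is consistent with the name field being used on its own. The following is a minimal sketch, not the tokio-tar code. The function names are hypothetical, it assumes the writer stores the first 100 bytes of the path in the name field, and the full path is assembled from the directory and file name quoted earlier in the thread.

```rust
/// Width of the `name` field in a ustar header block.
const USTAR_NAME_LEN: usize = 100;

/// Full member path, assembled from the names quoted in this thread.
const FULL_PATH: &str = "aliyun-python-sdk-core-2.15.1/aliyunsdkcore/vendored/requests/packages/urllib3/contrib/_appengine_environ.py";
/// Truncated path as reported by the tar reader.
const TRUNCATED: &str = "aliyun-python-sdk-core-2.15.1/aliyunsdkcore/vendored/requests/packages/urllib3/contrib/_appengine_en";

/// Assumption: the writer puts the first 100 bytes of the path in the name field.
fn ustar_name_field(path: &str) -> String {
    let bytes = path.as_bytes();
    let end = bytes.len().min(USTAR_NAME_LEN);
    String::from_utf8_lossy(&bytes[..end]).into_owned()
}

/// Build one pax record. The length prefix counts the whole record,
/// including its own digits, so iterate until the value is stable.
fn pax_record(key: &str, value: &str) -> String {
    let rest = format!(" {}={}\n", key, value);
    let mut len = rest.len();
    while len != rest.len() + len.to_string().len() {
        len = rest.len() + len.to_string().len();
    }
    format!("{}{}", len, rest)
}

/// Return the value of the first `path` record in a pax extended header body.
fn pax_path(mut data: &[u8]) -> Option<String> {
    while !data.is_empty() {
        let space = data.iter().position(|&b| b == b' ')?;
        let len: usize = std::str::from_utf8(&data[..space]).ok()?.parse().ok()?;
        if len < space + 2 || len > data.len() {
            return None; // malformed record
        }
        // Strip the length prefix and the trailing newline.
        let body = &data[space + 1..len - 1];
        if let Some(value) = body.strip_prefix(b"path=") {
            return Some(String::from_utf8_lossy(value).into_owned());
        }
        data = &data[len..];
    }
    None
}

/// Prefer the pax path when an extension was read; otherwise use the name field.
fn effective_path(name_field: &str, pax: Option<&[u8]>) -> String {
    pax.and_then(pax_path).unwrap_or_else(|| name_field.to_string())
}

fn main() {
    let name_field = ustar_name_field(FULL_PATH);
    let record = pax_record("path", FULL_PATH);
    println!("with pax extension:    {}", effective_path(&name_field, Some(record.as_bytes())));
    println!("without pax extension: {}", effective_path(&name_field, None));
}
```

If the extension is never loaded, the reader can only fall back to the 100-byte name field, which gives the `_appengine_en` name seen above.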

@RazerM

RazerM commented Nov 5, 2024

OK, found the problem:

https://github.com/edera-dev/tokio-tar/blob/4ee357285b5053e6bfada7f117e530b4da94b74a/src/archive.rs#L317

            if is_recognized_header && entry.header().entry_type().is_pax_local_extensions() {
                if self.pax_extensions.is_some() {
                    return Poll::Ready(Some(Err(other(
                        "two pax extensions entries describing \
                         the same member",
                    ))));
                }
                let mut ef = EntryFields::from(entry);
                let val = ready_err!(Pin::new(&mut ef).poll_read_all(cx));
                self.pax_extensions = Some(val);
                continue;
            }

If Pin::new(&mut ef).poll_read_all(cx) returns Poll::Pending, ready_err! propagates it out of poll_next and the partially read entry is dropped, so the PAX extension is lost. The same applies to a pending poll that occurs while a longlink or longname is being prepared. When poll_next is called again, parsing resumes at the next entry header.

I assume this is much more likely to happen if streaming from the network to unpack a tar than in uv build, but it can happen. I assume I managed to get a pax extension header split across the default chunk boundary.
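
To illustrate the failure mode described above, here is a toy model. It is not the tokio-tar code: `Step`, `Entries`, and `collect_names` are made-up names, there is no real `Context` or waker, and the `resumable` flag only switches between "progress is forgotten when a poll returns Pending" and "progress is kept on `self`". The file names are shortened versions of the ones from this issue.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// What the toy byte source yields next.
enum Step {
    /// Header of a pax extension entry; its body carries the full path.
    PaxHeader,
    /// Body of the pax extension entry.
    PaxBody(String),
    /// Regular file header whose name field may be truncated.
    FileHeader(String),
    /// The underlying reader has no data yet (think: waiting on the network).
    NotReady,
}

struct Entries {
    source: VecDeque<Step>,
    pax_path: Option<String>,
    awaiting_pax_body: bool,
    /// true: progress is kept on `self` across polls; false: it is dropped.
    resumable: bool,
}

impl Entries {
    fn new(steps: Vec<Step>, resumable: bool) -> Self {
        Entries { source: steps.into(), pax_path: None, awaiting_pax_body: false, resumable }
    }

    fn poll_next(&mut self) -> Poll<Option<String>> {
        if !self.resumable {
            // Models a local variable: whatever the previous call had
            // started reading is forgotten once that call returned.
            self.awaiting_pax_body = false;
        }
        loop {
            match self.source.pop_front() {
                None => return Poll::Ready(None),
                Some(Step::NotReady) => return Poll::Pending,
                Some(Step::PaxHeader) => self.awaiting_pax_body = true,
                Some(Step::PaxBody(path)) => {
                    if self.awaiting_pax_body {
                        self.pax_path = Some(path);
                        self.awaiting_pax_body = false;
                    }
                    // Otherwise the body is skipped like unrelated data.
                }
                Some(Step::FileHeader(short)) => {
                    // Prefer the pax path; fall back to the header name.
                    return Poll::Ready(Some(self.pax_path.take().unwrap_or(short)));
                }
            }
        }
    }
}

/// Poll until the stream ends. A real executor would wait for a wakeup on Pending.
fn collect_names(mut entries: Entries) -> Vec<String> {
    let mut names = Vec::new();
    loop {
        match entries.poll_next() {
            Poll::Ready(Some(name)) => names.push(name),
            Poll::Ready(None) => return names,
            Poll::Pending => continue,
        }
    }
}

/// A pax entry whose body arrives only after one Pending poll.
fn steps() -> Vec<Step> {
    vec![
        Step::PaxHeader,
        Step::NotReady,
        Step::PaxBody("contrib/_appengine_environ.py".to_string()),
        Step::FileHeader("contrib/_appengine_en".to_string()),
    ]
}

fn main() {
    println!("state dropped on Pending: {:?}", collect_names(Entries::new(steps(), false)));
    println!("state kept on self:       {:?}", collect_names(Entries::new(steps(), true)));
}
```

In this model the first variant yields the truncated name and the second yields the full name. A fix along these lines would presumably need to keep the in-progress extension entry somewhere that survives a Pending return.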

@charliermarsh do you happen to know the maintainer of this fork? There's only an issue tracker on the original repo.

@RazerM

RazerM commented Nov 9, 2024

And here's a direct reproduction in uv:

git clone https://github.com/RazerM/uv-cannot-install-this.git
cd uv-cannot-install-this
uv build --sdist
uv build --wheel dist/uv_cannot_install_this-1.0.tar.gz
uvx --from=dist/uv_cannot_install_this-1.0-py3-none-any.whl --python 3.12 --refresh uv-cannot-install-this

(these steps are a wee bit more explicit than they need to be, the important part is that uv builds the wheel from the sdist, since it's the tar extraction that's broken).

Missing: data/PmwTDwPvyfFkMzuJjMYZGwVcaFXUvXXwGcRIgaLroLunLYwqTHqwsDyuxyrMwlClMPzIYvVWEqCDQlCLrGeVrmiNgsQzVsMohMbx
Found:   data/PmwTDwPvyfFkMzuJjMYZGwVcaFXUvXXwGcRIgaLro

Missing: data/BqMlnKrLJFRUBYGNkhkzSwTbVJCIUoiHEnzKiKLADIHROEqVxEWYAsCusUAJKcXYLBZmxYhKncZsXLEjjhctJVXbcvBuSYjkiEnu
Found:   data/BqMlnKrLJFRUBYGNkhkzSwTbVJCIUoiHEnzKiKLAD

Missing: data/WRDWdZSTLeOixzwhUBDuWyktTYmXLkDuJpJvPxxWxjiHuwqxovuwRwgQwxpxVBTgvUkvwnBDEuRTXJpaJsEWwLwlYTGaqRBbFbZN
Found:   data/WRDWdZSTLeOixzwhUBDuWyktTYmXLkDuJpJvPxxWx

Missing: data/cvpYHXtnEbfJmnflwYwRUneJnWoAswvXjJfxxnoQsRoWLbmjhGUDDxlXnPGTRLgXxXXlehxbDHmPzOoqKQxQAshPTTFtstwVaDOz
Found:   data/cvpYHXtnEbfJmnflwYwRUneJnWoAswvXjJfxxnoQs

Missing: data/jhDgXQPnsxYXugmltiBRzrwPualmsjTEjyCQPIXNoEYEOKBaCGtobkgisEyGkFXAovuciJOegxGjwKScJuVujEWntxTdPUeFWKiC
Found:   data/jhDgXQPnsxYXugmltiBRzrwPualmsjTEjyCQPIXNo

Missing: data/IpjXpmbssCRJYzwCMUFAXrQJKFGUZjlBxYDlOnxclVVOBfxOfDDeTcVlpHQocAMejwdRourDpgcxEDROcgYtpRcGNxwjJpKfMQIg
Found:   data/IpjXpmbssCRJYzwCMUFAXrQJKFGUZjlBxYDlOnxcl

Missing: data/tEjeOXVVZtJYcOdIjItoRTYnCFDcjAdoMJbholfRiRkesWDLYxGbQHRwyzIJFaDxdfleUsDPKftreQLyArliSyTmPqMJJKygqivC
Found:   data/tEjeOXVVZtJYcOdIjItoRTYnCFDcjAdoMJbholfRi

9993 files ok, 7 missing
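
The check that produces the Missing/Found pairs is not shown in this thread. As a rough sketch of how such a comparison could work (an assumption, not the repository's actual script), one can pair each expected name that is absent with an unexpected name that is a prefix of it. `find_truncated` and `data/short_name` are hypothetical; the two long names are the first pair from the output above.

```rust
use std::collections::BTreeSet;

/// First Missing/Found pair from the output above.
const EXPECTED_NAME: &str = "data/PmwTDwPvyfFkMzuJjMYZGwVcaFXUvXXwGcRIgaLroLunLYwqTHqwsDyuxyrMwlClMPzIYvVWEqCDQlCLrGeVrmiNgsQzVsMohMbx";
const FOUND_NAME: &str = "data/PmwTDwPvyfFkMzuJjMYZGwVcaFXUvXXwGcRIgaLro";

/// Pair every expected-but-missing name with an unexpected name that is a
/// prefix of it, which is what a truncated file name looks like.
fn find_truncated<'a>(expected: &[&'a str], found: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    let expected_set: BTreeSet<&str> = expected.iter().copied().collect();
    let found_set: BTreeSet<&str> = found.iter().copied().collect();
    let mut pairs = Vec::new();
    for missing in expected.iter().copied().filter(|name| !found_set.contains(name)) {
        let short = found
            .iter()
            .copied()
            .find(|f| !expected_set.contains(f) && missing.starts_with(*f));
        if let Some(short) = short {
            pairs.push((missing, short));
        }
    }
    pairs
}

fn main() {
    // `data/short_name` stands in for the files that were extracted correctly.
    let expected = [EXPECTED_NAME, "data/short_name"];
    let found = [FOUND_NAME, "data/short_name"];
    for (missing, short) in find_truncated(&expected, &found) {
        println!("Missing: {}", missing);
        println!("Found:   {}", short);
    }
}
```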

@charliermarsh
Member

Thanks for digging in here. I don't know the maintainer of the fork, no... But we could try PRing a fix? Alternatively, they have some contact info on their GitHub profile.

@RazerM

RazerM commented Dec 7, 2024

This issue can have the network label removed and the bug label added.

charliermarsh added the bug (Something isn't working) label and removed the network (Network connectivity e.g. proxies, DNS, and SSL) label Dec 7, 2024

@charliermarsh
Member

Honestly might need to fork it ourselves.

@charliermarsh
Member

You can PR your change here: https://github.com/astral-sh/tokio-tar

@effigies

Hi all, this is just a note that I've got a package I can't build with uv because data filenames get truncated between the sdist and the wheel. If there's anything I can do to help this along, with testing for example, please let me know.
