Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[24.0] Fix Archive header encoding #18583

Merged
merged 1 commit into from
Jul 23, 2024

Conversation

arash77
Copy link
Collaborator

@arash77 arash77 commented Jul 22, 2024

Fix the encoding of the filename in the archive header when downloading a collection with non-English characters.
Fixes #18508.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@galaxyproject-sentryintegration

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: lib/galaxy/util/zipstream.py

Function Unhandled Issue
response OSError: [Errno 116] Stale file handle /api/datas...
Event Count: 1

Did you find this useful? React with a 👍 or 👎

@@ -41,7 +42,7 @@ def response(self) -> Iterator[bytes]:
def get_headers(self) -> Dict[str, str]:
headers = {}
if self.archive_name:
headers["Content-Disposition"] = f'attachment; filename="{self.archive_name}.zip"'
headers["Content-Disposition"] = f'attachment; filename="{decodify(self.archive_name)}.zip"'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead just do the inline call ? Looking at this I know exactly what it's going to do, looking at a function like decodify I don't.

Suggested change
headers["Content-Disposition"] = f'attachment; filename="{decodify(self.archive_name)}.zip"'
headers["Content-Disposition"] = f'attachment; filename="{self.archive_name.encode('latin-1', 'replace').decode('latin-1')}.zip"'

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this problem does not just belong to this specific purpose (#18584), we could use it in other places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're fixing a specific bug here, with a very specific standard spec stating we need latin-1 characters in headers. This is not liekly to be the case for #18584. If it turns out that all of this is needed to accomplish #18584 I don't mind, but that should go to another PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will make this PR specific to the issue we are addressing.
However, why are we using 'latin-1' encoding? It might not display the correct characters when downloading.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1194,6 +1194,25 @@ def unicodify(
return value


def decodify(value, encoding="utf-8", encoding_error="replace", decoding="latin-1", decoding_error="replace"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do decide for this I would narrow the type for value to string and really only do value.encode(encoding).decode(encoding). The rest we can do when and if we need it. I would not like to replicate unicodify logic here.

@mvdbeek
Copy link
Member

mvdbeek commented Jul 22, 2024

As a bugfix this should go to 24.0

@arash77 arash77 changed the base branch from dev to release_24.0 July 22, 2024 15:58
@arash77 arash77 changed the base branch from release_24.0 to dev July 22, 2024 15:58
@arash77 arash77 changed the base branch from dev to release_24.0 July 22, 2024 16:00
@arash77 arash77 changed the base branch from release_24.0 to dev July 22, 2024 16:01
@arash77 arash77 changed the base branch from dev to release_24.0 July 22, 2024 16:07
@arash77 arash77 force-pushed the decode-archive-header branch from 00cb14c to 5beb876 Compare July 22, 2024 16:07
Copy link
Member

@mvdbeek mvdbeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@mvdbeek mvdbeek merged commit 9a7d1f2 into galaxyproject:release_24.0 Jul 23, 2024
40 of 50 checks passed
Copy link

This PR was merged without a "kind/" label, please correct.

@arash77 arash77 deleted the decode-archive-header branch July 23, 2024 17:47
@nsoranzo nsoranzo changed the title Fix Archive header encoding [24.0] Fix Archive header encoding Jul 24, 2024
@arash77 arash77 mentioned this pull request Nov 19, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Archive header should be encoded in latin-1
3 participants