Optimize images hosted on mindtouch files even if they return a "bad" application/octet-stream mime type #108

Open
benoit74 opened this issue Nov 27, 2024 · 1 comment

Comments

@benoit74

Sample:

[mindtouch2zim::Thread-5 (worker)::2024-11-26 18:31:57,143] DEBUG:Not optimizing, unsupported mime type: application/octet-stream for asset from https://geo.libretexts.org/@api/deki/files/5324/xenolith-of-diorite.jpg?revision=1 used by page ID 7820 (https://geo.libretexts.org/Bookshelves/Geology/Physical_Geology_(Earle)/08%3A_Measuring_Geological_Time/8.02%3A_Relative_Dating_Methods), page ID 30611 (https://geo.libretexts.org/Courses/Coalinga_College/Introduction_to_Earth_Science_(C-ID%3A_GEOL_121)/02%3A_The_Geosphere/2.27%3A_Measuring_Geological_Time-_Relative_Dating_Methods)

This represents a very large number of assets (e.g. 4135 out of 23357 for geo.libretexts.org), so we need to fix this.

@benoit74 benoit74 added the enhancement New feature or request label Nov 27, 2024
@benoit74 benoit74 added this to the 0.1 milestone Nov 27, 2024
@benoit74 benoit74 self-assigned this Nov 27, 2024
@benoit74

We should probably try to guess the MIME type from the URL when the reported MIME type is application/octet-stream:

>>> import mimetypes
>>> from urllib.parse import urlsplit
>>> mimetypes.guess_type(urlsplit("https://geo.libretexts.org/@api/deki/files/5324/xenolith-of-diorite.jpg?revision=1").path)
('image/jpeg', None)

Note that we need urlsplit to extract the path first, because guess_type does not work well with a URL that includes a query parameter.
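
A minimal sketch of how this fallback could look. The helper name `effective_mime_type` and the constant `GENERIC_MIME_TYPE` are hypothetical and not taken from the mindtouch2zim code base. The assumption here is that the scraper already has the MIME type reported by the server and the asset URL at hand, and that a reported type other than application/octet-stream should be trusted as-is.

```python
import mimetypes
from typing import Optional
from urllib.parse import urlsplit

# Generic type returned by the mindtouch files endpoint in the sample above.
GENERIC_MIME_TYPE = "application/octet-stream"


def effective_mime_type(reported: Optional[str], url: str) -> Optional[str]:
    """Return the MIME type to use when deciding whether to optimize an asset.

    Hypothetical helper: trusts the reported type unless it is missing or
    generic, in which case it guesses from the extension in the URL path.
    """
    if reported:
        # Drop any parameters such as "; charset=..." before comparing.
        base_type = reported.split(";")[0].strip().lower()
        if base_type != GENERIC_MIME_TYPE:
            return base_type
    # urlsplit(...).path removes the query string ("?revision=1") so that
    # guess_type only sees the file name and its extension.
    guessed, _encoding = mimetypes.guess_type(urlsplit(url).path)
    # Fall back to whatever the server reported if no guess is possible.
    return guessed or reported
```

With the asset from the sample log, `effective_mime_type("application/octet-stream", "https://geo.libretexts.org/@api/deki/files/5324/xenolith-of-diorite.jpg?revision=1")` gives `image/jpeg`, so the image would no longer be skipped. A URL whose path has no recognizable extension keeps the reported type, which leaves the current "not optimizing" behaviour in place for those assets.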
