This issue covers the "second part" of #177, where it has already been discussed.
It concerns resources that contain multiple encodings. This shouldn't happen, but we are fairly sure it does, even though the Zimit2 test websites never exhibited it.
A "handcrafted" sample is a page like https://tmp.kiwix.org/ci/test-website/bad-encoding.html.
We could decide to:
1. transfer the raw content as-is to the ZIM (without any rewriting)
3. do our best to decode / rewrite as much as possible
If I'm not mistaken, @kelson42 has clearly indicated that only option 3 is acceptable from his point of view, while @mgautierfr favors option 1.
I tend to prefer option 3, but I don't consider this the highest-priority issue we have on Zimit2, especially since we have not encountered the problem in the test recipes.
benoit74 changed the title from "Do not crash when a rewriten resource (HTML/CSS/JS) has multiple encoding inside" to "Zimit2: do not crash when a rewriten resource (HTML/CSS/JS) has multiple encoding inside" on Feb 14, 2024.
The scraper no longer crashes when a single file contains multiple encodings, in particular since #314.
We are already close to option 3: only the bad characters (those in a different encoding than the rest of the document) are replaced by "something".
I will therefore close the issue. We have no clear idea how to handle this situation better than we do today, and nothing about the current behavior is really annoying. The current experience with warc2zim on https://tmp.kiwix.org/ci/test-website/bad-encoding.html is identical to the one in most browsers.
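To make the "replaced by something" behavior concrete, here is a minimal Python sketch. It assumes the rewriter decodes each resource with one charset and substitutes U+FFFD (the Unicode replacement character) for bytes that are invalid in that charset. `decode_best_effort` is a hypothetical helper for illustration, not warc2zim's actual code.

```python
def decode_best_effort(payload: bytes, charset: str = "utf-8") -> str:
    """Hypothetical helper: decode with a single charset.

    Bytes that are invalid in `charset` are replaced with U+FFFD instead of
    raising UnicodeDecodeError, so the rest of the document can still be rewritten.
    """
    return payload.decode(charset, errors="replace")


# A UTF-8 document with a fragment accidentally stored as ISO-8859-1 (Latin-1).
mixed = "café ".encode("utf-8") + "déjà".encode("latin-1")

text = decode_best_effort(mixed)
print(ascii(text))  # prints 'caf\xe9 d\ufffdj\ufffd'

# After rewriting, the output is valid UTF-8 again; the stray Latin-1 bytes
# survive only as U+FFFD characters.
rewritten = text.encode("utf-8")
```

Under option 1, `mixed` would instead be stored unchanged, leaving the reader to cope with the stray bytes.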