When trying to download images from flexbooks.ck12.org, the scraper is denied access by a CloudFront WAF.
E.g. https://flexbooks.ck12.org/flx/show/THUMB_POSTCARD/image/user%3AY2sxMnNjaWVuY2VAY2sxMi5vcmc./98045-1359163835-22-2-IntPhysC-05-03-Weather-satellite.jpg redirects to https://dr282zn36sxxg.cloudfront.net/datastreams/f-d%3A0e28b5bb5ad0f030c1a8be7f2a189afc410f6a7e4f7ddd541706304e%2BIMAGE_THUMB_POSTCARD_TINY%2BIMAGE_THUMB_POSTCARD_TINY.1
The consequence is that some images are missing in the ZIM (688 out of ~ 15k, 4%, not negligible).
In local tests with curl, passing the following User-Agent header looks sufficient to not (immediately?) trigger the CloudFront protections:

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:132.0) Gecko/20100101 Firefox/132.0
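
For illustration, here is a minimal Python sketch of the same idea as the curl test: send the browser-like User-Agent with each image request. It uses only the standard library and is not the scraper's actual code. The helper names `build_image_request` and `fetch_image` are hypothetical, and whether this header keeps working against the WAF over a full run is untested.

```python
import urllib.request

# User-Agent string taken from the local curl tests described in this issue.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:132.0) "
    "Gecko/20100101 Firefox/132.0"
)


def build_image_request(url: str) -> urllib.request.Request:
    """Hypothetical helper: prepare a GET request with the browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_USER_AGENT})


def fetch_image(url: str, timeout: float = 30.0) -> bytes:
    """Hypothetical helper: download the image bytes.

    urlopen follows the redirect to the CloudFront host, re-sending the
    User-Agent header, and raises urllib.error.HTTPError on a 4xx/5xx reply.
    """
    with urllib.request.urlopen(build_image_request(url), timeout=timeout) as response:
        return response.read()
```

Splitting request construction from the download keeps the header logic checkable without network access. An `HTTPError` with status 403 raised by `fetch_image` would indicate that the WAF still rejected the request.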