[Idea] Use web.archive.org as a possible final fallback for images #39
Comments
Testing my implementation using id 135474. This id was deleted from the website but still remains in my db from the last pass. Result: success
This is a pretty cool idea. In my experience, though, the actual images don't get deleted from the media servers. Only the metadata / gallery information gets purged, so only that would need to be rerouted to the web archive. This should also greatly reduce the strain on the web archive. I will put this on the pile of features I want to implement in the future.
The code I submitted as a possible PR only uses the web archive if all media servers fail. At least I think so, and the log output above seems to confirm it: I can see it cycling through the media servers before it goes there. Anyway, thanks for your reply. ;)
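The fallback order described above could be sketched roughly as follows. This is not the code from the PR; the function name, the injected `fetch` callable (returns the image bytes, or `None` on failure), and the choice of the first media host as the archived origin are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

ARCHIVE_PREFIX = "https://web.archive.org/web"


def fetch_with_archive_fallback(
    path: str,
    media_hosts: Sequence[str],
    fetch: Callable[[str], Optional[bytes]],
) -> Tuple[Optional[bytes], Optional[str]]:
    """Try every media host first; use the web archive only as a last resort.

    Returns (data, url_that_worked), or (None, None) if everything failed.
    """
    # Cycle through the regular media servers.
    for host in media_hosts:
        url = f"https://{host}/{path}"
        data = fetch(url)
        if data is not None:
            return data, url

    if not media_hosts:
        return None, None

    # Assumption: the archive is queried for the first host's URL only.
    archive_url = f"{ARCHIVE_PREFIX}/https://{media_hosts[0]}/{path}"
    data = fetch(archive_url)
    if data is not None:
        return data, archive_url
    return None, None
```

Passing `fetch` in as a parameter keeps the ordering logic separate from the HTTP client, so the archive is only contacted once every media server has already failed.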
By the way, I would like to express my gratitude for your willingness to contribute and for being so active on this project. I just have a lot going on career-wise at the moment, but I will eventually come back to this project for all the enhancements that have piled up!
So this can work, but it doesn't always work. In fact, I'm getting a pretty low success rate. Some failures are partial pulls. If this gets added, it should fail fast by stopping after X errors, so that it doesn't keep spamming IA when it can't get the whole thing from there anyway.
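A minimal sketch of that fail-fast idea, assuming a hypothetical `fetch` callable that returns the image bytes or `None` on failure, and a `max_errors` threshold standing in for "X" (the default of 3 is arbitrary):

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def fetch_pages_fast_fail(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[bytes]],
    max_errors: int = 3,
) -> Tuple[Dict[str, bytes], bool]:
    """Fetch pages in order, giving up once max_errors failures have occurred.

    Returns (pages_fetched_so_far, complete), where complete is True only
    if every URL was fetched successfully.
    """
    pages: Dict[str, bytes] = {}
    errors = 0
    for url in urls:
        data = fetch(url)
        if data is None:
            errors += 1
            if errors >= max_errors:
                # Stop early instead of hammering the archive for a
                # gallery that cannot be completed anyway.
                return pages, False
            continue
        pages[url] = data
    return pages, errors == 0
```

The `complete` flag lets the caller tell a partial pull from a full one and decide whether to keep or discard what was downloaded.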
In the event that the API returns the existence of a manga but you still get 404s for images, try using web.archive.org as well to get those images. This can happen if the item existed when the API data was retrieved but was subsequently deleted, so it now exists only in the db. I recently found (while the service was up) that mangas can exist on the site. You (usually) already have the metadata by this point, so I'm thinking that hitting up the site for images may be viable once they stabilize things.
I think (from what I understand) you can prefix the full URL with https://web.archive.org/web, as in https://web.archive.org/web/https://i.nhentai.net/galleries/819208/4.jpg, and it will grab the most recent copy that the service has. I won't be able to test implementation and viability until the service is back up and running again, though. Apparently it went back down again today.
The actual image URL is something like https://web.archive.org/web/{datecode}if_/https://i.nhentai.net/galleries/819208/4.jpg. It seems that using an arbitrary placeholder datecode, as in the URL below, redirects to the same place, with the placeholder replaced by the correct datecode.
https://web.archive.org/web/00000000000000if_/https://i.nhentai.net/galleries/819208/4.jpg
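The two URL forms above could be built with small helpers like these. The function names are made up for illustration, and the all-zeros placeholder relies on the redirect behaviour observed above rather than on anything I have confirmed beyond that.

```python
WAYBACK_PREFIX = "https://web.archive.org/web"

# Placeholder datecode; per the observation above, the archive appears to
# redirect this to the datecode of an actual capture.
PLACEHOLDER_DATECODE = "00000000000000"


def wayback_latest_url(original_url: str) -> str:
    """Prefix form: the archive's page for its most recent copy."""
    return f"{WAYBACK_PREFIX}/{original_url}"


def wayback_image_url(original_url: str, datecode: str = PLACEHOLDER_DATECODE) -> str:
    """Datecode form with the if_ suffix, pointing at the image itself."""
    return f"{WAYBACK_PREFIX}/{datecode}if_/{original_url}"


if __name__ == "__main__":
    print(wayback_image_url("https://i.nhentai.net/galleries/819208/4.jpg"))
```

Whatever HTTP client requests this URL needs to follow redirects for the placeholder datecode to resolve.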