
[Idea] Use web.archive.org as a possible final fallback for images #39

Open

shinji257 opened this issue Oct 24, 2024 · 5 comments


shinji257 commented Oct 24, 2024 (edited)

If the API reports that a manga exists but its images still return 404s, try fetching those images from web.archive.org as well. This can happen if the gallery existed when the API data was retrieved but was deleted afterwards, so it now exists only in the database. I recently found (while the service was up) that such mangas can still exist on the Web Archive. You usually already have the metadata by this point, so I think pulling the images from there may be viable once they stabilize things.

From what I understand, you can prefix the full URL with https://web.archive.org/web, as in https://web.archive.org/web/https://i.nhentai.net/galleries/819208/4.jpg, and it will return the most recent copy the service has. However, I won't be able to test the implementation or its viability until the service is back up and running. Apparently it went down again today.

The actual image URL looks like https://web.archive.org/web/{datecode}if_/https://i.nhentai.net/galleries/819208/4.jpg. It seems that using this URL with an arbitrary datecode redirects to the same place, replacing the placeholder with the correct datecode:

https://web.archive.org/web/00000000000000if_/https://i.nhentai.net/galleries/819208/4.jpg
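The two URL forms described above can be sketched as a small helper. This is a minimal illustration, not the code from PR #40: the function name `wayback_url` is hypothetical, and requiring a full 14-digit datecode is a simplifying assumption of this sketch. The all-zeros placeholder is accepted because, per the observation above, the archive appears to redirect it to a real capture.

```python
from typing import Optional
from urllib.parse import urlparse

# Base of the Wayback Machine URL scheme described in this issue.
WAYBACK_BASE = "https://web.archive.org/web"


def wayback_url(original_url: str, datecode: Optional[str] = None) -> str:
    """Build a Web Archive URL for an original image URL (hypothetical helper).

    datecode=None -> plain prefix form: https://web.archive.org/web/<url>
    datecode given -> image form with the `if_` modifier from the issue:
                      https://web.archive.org/web/<datecode>if_/<url>
    """
    parsed = urlparse(original_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {original_url!r}")
    if datecode is None:
        return f"{WAYBACK_BASE}/{original_url}"
    # Assumption of this sketch: only full YYYYMMDDhhmmss datecodes are accepted.
    # The all-zeros placeholder passes and is left for the archive to redirect.
    if len(datecode) != 14 or not datecode.isdigit():
        raise ValueError(f"datecode must be 14 digits (YYYYMMDDhhmmss), got {datecode!r}")
    return f"{WAYBACK_BASE}/{datecode}if_/{original_url}"


if __name__ == "__main__":
    image = "https://i.nhentai.net/galleries/819208/4.jpg"
    print(wayback_url(image))
    print(wayback_url(image, "00000000000000"))
```

Run on the example image, the second call reproduces the placeholder URL quoted above.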

shinji257 changed the title ~~Idea: Use web.archive.org as a possible final fallback for images~~ [Idea] Use web.archive.org as a possible final fallback for images

Author

shinji257 commented Oct 24, 2024

Tested my implementation with ID 135474. This ID was deleted from the website but still remains in my DB from the last pass. Result: success.
Log: https://gist.github.com/shinji257/8a7ad40ad18f196edd85ebd3fbf6bf72

shinji257 mentioned this issue

Add web archive as a final fallback [WIP] #40

Open

Owner

9-FS commented Oct 24, 2024

This is a pretty cool idea. In my experience though, the actual images don't get deleted from the media servers. It's only the metadata / gallery information that gets purged, so only that would need to be rerouted to the web archive. This should also greatly reduce the strain on the web archive.

I will put this on the pile of features I want to implement in the future.

9-FS added the enhancement label

Author

shinji257 commented Oct 24, 2024

The code I submitted as a possible PR only uses the Web Archive if all media servers fail. At least I think so, and the log output above seems to confirm it: I can see it cycling through the media servers before it goes there. Anyway, thanks for your reply. ;)
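The fallback order described here (every media server first, the Web Archive last) can be sketched with an injected fetcher so the ordering is testable without network access. This is a minimal sketch, not the PR's actual code: `fetch_with_fallback`, `candidate_urls`, and the second media server host `i2.nhentai.net` are hypothetical, and building the archive URL from the first media server's URL is an assumption based on the `i.nhentai.net` example in the issue.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A fetcher returns the image bytes, or None on a 404 or any other failure.
Fetcher = Callable[[str], Optional[bytes]]

# Placeholder datecode from the issue; the archive appears to redirect it.
ARCHIVE_DATECODE = "00000000000000"


def candidate_urls(image_path: str, media_servers: Sequence[str]) -> List[str]:
    """Media-server URLs first, then one Web Archive URL as the last resort."""
    direct = [f"https://{host}/galleries/{image_path}" for host in media_servers]
    if not direct:
        return []
    # Assumption: archive the URL of the first media server.
    archived = f"https://web.archive.org/web/{ARCHIVE_DATECODE}if_/{direct[0]}"
    return direct + [archived]


def fetch_with_fallback(
    image_path: str, media_servers: Sequence[str], fetch: Fetcher
) -> Tuple[Optional[bytes], List[str]]:
    """Try each media server in order; hit the Web Archive only if all of them fail.

    Returns (image bytes or None, list of URLs that were attempted).
    """
    attempted: List[str] = []
    for url in candidate_urls(image_path, media_servers):
        attempted.append(url)
        data = fetch(url)
        if data is not None:
            return data, attempted
    return None, attempted


if __name__ == "__main__":
    # Hypothetical fetcher: every media server 404s, only the archive has the image.
    def fake_fetch(url: str) -> Optional[bytes]:
        return b"jpeg-bytes" if url.startswith("https://web.archive.org/") else None

    data, attempted = fetch_with_fallback(
        "819208/4.jpg", ["i.nhentai.net", "i2.nhentai.net"], fake_fetch
    )
    for url in attempted:
        print(url)
```

Returning the list of attempted URLs mirrors what the log above shows: the media servers are cycled through before the archive is ever contacted.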

Owner

9-FS commented Oct 24, 2024

Btw, I would like to express my gratitude for your willingness to contribute and for being so active on this project. I just have a lot going on career-wise at the moment, but I will eventually come back to this project for all the enhancements that have piled up!

Author

shinji257 commented Oct 26, 2024

So this can work, but it doesn't always. In fact, I'm getting a pretty low success rate, and some failures are partial pulls. If this gets added, it should fail fast by stopping after X errors, so it doesn't keep spamming the IA when it can't get the whole gallery from there anyway.
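The fail-fast idea can be sketched as an error budget over a gallery's archive URLs. This is a minimal illustration under stated assumptions: `fetch_gallery_fail_fast` and the default `max_errors=3` (the "X" above) are hypothetical choices, and the fetcher is injected as in the other sketches so no real requests are made.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# A fetcher returns the page bytes, or None on any failure.
Fetcher = Callable[[str], Optional[bytes]]


def fetch_gallery_fail_fast(
    archive_urls: Sequence[str], fetch: Fetcher, max_errors: int = 3
) -> Tuple[Dict[str, bytes], bool]:
    """Download a gallery's pages from the Web Archive, giving up after max_errors failures.

    Returns (pages fetched so far, complete). complete is True only when every
    page was retrieved, so a partial pull can be detected and discarded.
    """
    if max_errors < 1:
        raise ValueError("max_errors must be at least 1")
    pages: Dict[str, bytes] = {}
    errors = 0
    for url in archive_urls:
        data = fetch(url)
        if data is None:
            errors += 1
            if errors >= max_errors:
                # Error budget exhausted: stop instead of requesting the remaining pages.
                return pages, False
            continue
        pages[url] = data
    return pages, errors == 0
```

Reporting `complete=False` for any missing page, rather than only when the budget runs out, covers the partial pulls mentioned above: the caller can throw away an incomplete gallery instead of saving it.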
