[Idea] Use web.archive.org as a possible final fallback for images #39

Open
shinji257 opened this issue Oct 24, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

shinji257 commented Oct 24, 2024

If the API reports that a manga exists but you still get 404s for its images, try using web.archive.org as well to get those images. This can happen if the item existed while the API data was being retrieved but was subsequently deleted, so it now exists only in the DB. I had recently found (while the service was up) that such mangas can still exist on web.archive.org. You usually already have the metadata by this point, so I'm thinking that hitting up the archive for images may be viable once they stabilize things.

I think (from what I understand) you can prefix the full URL with https://web.archive.org/web/, as in https://web.archive.org/web/https://i.nhentai.net/galleries/819208/4.jpg, and it will grab the most recent copy the service has. However, I won't be able to test the implementation and its viability until the service is back up and running. Apparently it went back down again today.

The actual image URL is something like https://web.archive.org/web/{datecode}if_/https://i.nhentai.net/galleries/819208/4.jpg. It seems that using an arbitrary datecode in this URL redirects to the same place, with the placeholder replaced by the correct datecode, e.g.:

https://web.archive.org/web/00000000000000if_/https://i.nhentai.net/galleries/819208/4.jpg
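A minimal sketch of building that fallback URL, assuming the `{datecode}if_/` pattern described above. The function name `wayback_image_url` is hypothetical, and the redirect from the all-zero placeholder to a real capture is observed behaviour from this thread, not a documented guarantee.

```python
def wayback_image_url(original_url: str, timestamp: str = "00000000000000") -> str:
    """Wrap an image URL in a Wayback Machine URL (hypothetical helper).

    `timestamp` is the archive datecode (YYYYMMDDhhmmss). The all-zero
    default is the placeholder from this issue; the archive is expected to
    redirect it to an existing capture. The "if_" suffix asks for the
    capture without the archive's toolbar frame.
    """
    if not (1 <= len(timestamp) <= 14 and timestamp.isdigit()):
        raise ValueError("timestamp must be 1 to 14 digits")
    return f"https://web.archive.org/web/{timestamp}if_/{original_url}"
```

For the example image, `wayback_image_url("https://i.nhentai.net/galleries/819208/4.jpg")` yields exactly the placeholder URL quoted above.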

@shinji257 shinji257 changed the title Idea: Use web.archive.org as a possible final fallback for images [Idea] Use web.archive.org as a possible final fallback for images Oct 24, 2024
@shinji257 (Author)

Testing my implementation using id 135474. This id was deleted from the website but still remains in my db from the last pass. Result: success
Log: https://gist.github.com/shinji257/8a7ad40ad18f196edd85ebd3fbf6bf72

9-FS (Owner) commented Oct 24, 2024

This is a pretty cool idea. In my experience though, the actual images don't get deleted from the media servers. It's only the metadata / gallery information that gets purged, so only that would need to be rerouted to the web archive. This should also greatly reduce the strain on the web archive.

I will put this on the pile of features I want to implement in the future.

@9-FS 9-FS added the enhancement New feature or request label Oct 24, 2024
@shinji257 (Author)

The code I submitted as a possible PR only uses the web archive if all media servers fail. At least I think so, and the log output above seems to confirm it: it cycles through the media servers before going to the archive. Anyway, thanks for your reply. ;)
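A rough sketch of that ordering (not the code from the submitted PR): every live media server is tried first, and the archive is consulted only once all of them have failed. `download_image`, the injected `fetch` callback and the `archive_host` default are assumptions made for illustration; `fetch` stands in for whatever HTTP client the project actually uses and returns `None` on any failure.

```python
from typing import Callable, Optional, Sequence

# Placeholder datecode from this issue; assumed to redirect to a real capture.
WAYBACK_PREFIX = "https://web.archive.org/web/00000000000000if_/"


def download_image(
    path: str,
    media_servers: Sequence[str],
    fetch: Callable[[str], Optional[bytes]],
    archive_host: str = "i.nhentai.net",
) -> Optional[bytes]:
    """Try every live media server, then fall back to the Wayback Machine.

    `fetch` is a hypothetical callback returning the response body on
    success and None on failure (404, timeout, ...).
    """
    for server in media_servers:
        data = fetch(f"https://{server}{path}")
        if data is not None:
            return data  # a live server had the image; the archive is never touched
    # All live servers failed: ask the archive for its copy of the image.
    return fetch(f"{WAYBACK_PREFIX}https://{archive_host}{path}")
```

Injecting `fetch` keeps the fallback order testable without network access and leaves retries and timeouts to the HTTP layer.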

9-FS (Owner) commented Oct 24, 2024

Btw, I would like to express my gratitude for your willingness to contribute and for being so active on this project. I just have a lot going on career-wise at the moment, but I will eventually come back to this project for all the enhancements that have piled up!

@shinji257 (Author)

So this can work, but not always; in fact, I'm getting a pretty low success rate, and some failures are partial pulls. If this gets added, it should fail fast by stopping after X errors, so that it doesn't keep spamming IA (the Internet Archive) when it can't get the whole gallery from there anyway.
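One way the fail-fast rule could look, as a sketch under assumptions: `fetch_gallery_from_archive`, `ArchiveBudgetExceeded` and the default `max_errors=3` (the "X" above) are all hypothetical. Failed pages are kept as `None`, and once the error budget is used up the remaining pages are never requested.

```python
from typing import Callable, List, Optional, Sequence


class ArchiveBudgetExceeded(Exception):
    """Raised once archive requests for one gallery have failed too often."""


def fetch_gallery_from_archive(
    page_urls: Sequence[str],
    fetch: Callable[[str], Optional[bytes]],
    max_errors: int = 3,
) -> List[Optional[bytes]]:
    """Fetch every page of a gallery from the archive, failing fast.

    Failed pages are recorded as None. As soon as `max_errors` requests
    have failed, ArchiveBudgetExceeded is raised and the remaining pages
    are skipped, so the archive is not hammered for a gallery it
    evidently cannot provide in full.
    """
    if max_errors < 1:
        raise ValueError("max_errors must be at least 1")
    errors = 0
    pages: List[Optional[bytes]] = []
    for url in page_urls:
        data = fetch(url)
        if data is None:
            errors += 1
            if errors >= max_errors:
                raise ArchiveBudgetExceeded(
                    f"giving up after {errors} failed archive requests"
                )
        pages.append(data)
    return pages
```

Setting `max_errors=1` gives the strictest behaviour: abort on the first missing page, since a partial pull is of little use anyway.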
