The extracted content output does not contain picture elements #3
Comments
Please include more details, including:
I think they just mean that the extracted text doesn't include the original images. Is there a way to do this?
OP hasn't replied in a long time, but if you have an example URL + markup, please attach it here.
@chimbori is this being worked on? Still not getting any tags
Not being actively worked on, no. I’ll look into it if/when I have a chance, but I asked for more documentation so that others who see this issue have enough information to get started.
If anyone does look at this: it doesn't work because some sites load some of their images lazily with JavaScript, and the HTML you are providing is likely the version from before the images are inserted. To fix this, the JavaScript must be run first, and the resulting HTML then passed to Crux. This can be done with something like HtmlUnit, but that library doesn't work on Android. I'm still trying to find a solution to that, though it might be out of scope for Crux. With the post-JavaScript HTML, it works fine.
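A quick way to check this diagnosis for a given page is to see whether the HTML handed to Crux contains any usable image tags at all. The following is a minimal sketch using only the Java standard library. `ImageTagCheck` and its methods are hypothetical names, not part of Crux, and the regular expressions are a rough heuristic, not a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper (not part of Crux): inspects raw HTML
// to see whether it already contains <img> tags with a real src value.
final class ImageTagCheck {
    // Matches an <img ...> tag and captures its attribute text.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Matches a src attribute whose value is not empty.
    // Attributes such as data-src are deliberately not matched.
    private static final Pattern NON_EMPTY_SRC =
            Pattern.compile("(?:^|\\s)src\\s*=\\s*[\"']?\\s*[^\"'\\s>]",
                    Pattern.CASE_INSENSITIVE);

    // Counts every <img> tag, whether or not it has a src.
    static int countImageTags(String html) {
        int count = 0;
        Matcher tags = IMG_TAG.matcher(html);
        while (tags.find()) {
            count++;
        }
        return count;
    }

    // Counts only the <img> tags that carry a non-empty src attribute.
    static int countImagesWithSrc(String html) {
        int count = 0;
        Matcher tags = IMG_TAG.matcher(html);
        while (tags.find()) {
            if (NON_EMPTY_SRC.matcher(tags.group(1)).find()) {
                count++;
            }
        }
        return count;
    }
}
```

If `countImagesWithSrc` returns 0 for HTML fetched without running JavaScript, while the page shows images in a browser, the images are most likely inserted after load, which matches the explanation above. Attaching both counts along with the example URL would give others enough to reproduce the problem.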
No description provided.