Problem parsing angle brackets #200

Open
Sohex opened this issue Jul 12, 2021 · 4 comments

Comments

Sohex commented Jul 12, 2021

Describe the bug
Text appearing in angle brackets is removed from novels. This issue affects multiple novels. For example, in Reincarnator, Chapter 57:
Text should read - "That was needed to create the <hardening fluid> which 127 Alchemists..."
Text in downloaded epub appears as - "That was needed to create the which 127 Alchemists..."
The same issue appears multiple times in that chapter alone, but it also occurs frequently in other novels.

Sohex commented Jul 12, 2021

Looking at your Wuxiaworld-2-eBook repo (which I'm assuming novel-ebook is based on), the issue appears to be with either the source for the text or the initial ingest. If properly escaped brackets (e.g. &lt; &gt;) are passed into Beautiful Soup, everything should work correctly. So if the source shows those correctly, then some part of your flow is rendering &lt; &gt; out as <> before the text is passed to Beautiful Soup.
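
To illustrate the difference between escaped and unescaped brackets, here is a minimal sketch. It uses Python's standard-library `html.parser` rather than Beautiful Soup, so it is only an approximation of what the project does; the `TextExtractor` class and `extract_text` helper are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes, drops anything parsed as a tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; entities are already decoded here.
        self.parts.append(data)


def extract_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


escaped = "<p>create the &lt;hardening fluid&gt; which 127 Alchemists</p>"
unescaped = "<p>create the <hardening fluid> which 127 Alchemists</p>"

# Escaped brackets survive as literal text.
print(extract_text(escaped))
# Unescaped brackets are treated as an unknown tag and vanish from the text.
print(extract_text(unescaped))
```

In the unescaped case the parser sees `<hardening fluid>` as a start tag, which matches the missing text in the downloaded epub.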

Sohex commented Jul 12, 2021

The issue is already present in the text when it's fetched from data.novel-ebook.com, so the problem should definitely be in however you're populating that data. I'll stop poking around and leave it to you, thanks!

MichaelGuardian (Contributor) commented

Thanks for the in-depth bug report!

Just in case you're interested, the project's backend has been completely rewritten in JavaScript and Rust, since Python really ate up my CPU resources.

I took a quick glance at the code and the data source, and it seems that the data source does not properly escape angle brackets. I'll take a closer look at possible fixes for this issue once I'm back from vacation.
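
One possible direction for a fix, sketched in Python purely for illustration (the current backend is JavaScript and Rust, and this is not its actual code): escape the chapter text before wrapping it in markup. The sketch assumes the text is available as plain text at ingest time, and `wrap_paragraph` is a hypothetical helper.

```python
import html


def wrap_paragraph(text: str) -> str:
    """Hypothetical ingest step: escape plain text, then wrap it in <p>."""
    # quote=False leaves quotation marks alone; &, < and > are escaped.
    return "<p>" + html.escape(text, quote=False) + "</p>"


print(wrap_paragraph("create the <hardening fluid> which 127 Alchemists"))
```

If the source only provides text that is already mixed with real HTML tags, blanket escaping like this would also escape the legitimate tags, so a real fix would need to tell the two apart.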

Emasoft commented Nov 4, 2021

It's not only angle brackets. Many escape codes (HTML character references such as &#x2019;) are left as-is instead of being converted to characters, making the text hard to read.
This is Versatile Mage, chapter 1766, downloaded today:
Mo Fan&#x2019;s Demon Element. It was dark and wet around him. The bottom of the cave was very spacious, and Mo Fan&#x2019;s vision in the dark was not too bad. He was able to see the rough appearance of the cave. Cries of agony came from above him, followed by the sound of something slamming heavily to the ground. &#x201C;That hurts&#x2026; my ass!&#x201D; A dark figure rose to his feet. His eyes were glittering faintly in the dark. &#x201C;Zhou Donghao?&#x201D; Mo Fan recognized the idiot&#x2019;s voice. &#x201C;It&#x2019;s me! So you think you can come down here, but I can&#x2019;t? You think only you care about Tao Jing? I care about her more than you!&#x201D; Zhou Donghao said righteously. &#x201C;Are you retarded? Can you please lower your voice in a place like this? Or do you want to be smashed into pulp by the Rock Monsters?&#x201D; Mo Fan snarled. &#x201C;I&#x2026;&#x201D; Zhou Donghao was lost for words.
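
For reference, the codes in the excerpt above are standard HTML numeric character references, and they can be decoded with Python's standard-library `html.unescape`. This is only a sketch of the decoding step, not a claim about where in the project's pipeline it belongs; the `sample` string is copied from the excerpt.

```python
import html

sample = "&#x201C;That hurts&#x2026; my ass!&#x201D; Mo Fan&#x2019;s vision"

# html.unescape converts named and numeric character references to characters.
decoded = html.unescape(sample)
print(decoded)
```

After decoding, `&#x201C;` and `&#x201D;` become curly double quotes, `&#x2019;` becomes a curly apostrophe, and `&#x2026;` becomes an ellipsis character.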
