node-htmlparser mangles text containing angle brackets #1

mshmoustafa · 2019-05-29T18:20:01Z

What is the problem: node-htmlparser interprets any angle brackets (< and >) as delimiters for HTML tags and will parse the remaining text as HTML.

Proposed fix: run only entries in the Links section through Utility.convertLinksToTextLinks.
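A minimal sketch of what that gating could look like. The real signature of Utility.convertLinksToTextLinks isn't shown in this issue, so the converter is passed in as a parameter; `LinkConverter`, `processEntry`, and the "Comments" section name in the usage lines are hypothetical names used only for illustration.

```typescript
// Hypothetical stand-in type for Utility.convertLinksToTextLinks.
// Assumption: it takes an entry and returns some converted value.
type LinkConverter<T> = (entry: string) => T;

// Run the HTML-parsing converter only for entries in the Links section.
// Every other section is returned untouched, so stray "<" and ">" in
// formulas never reach the HTML parser.
function processEntry<T>(
  sectionName: string,
  entry: string,
  convertLinks: LinkConverter<T>
): T | string {
  return sectionName === "Links" ? convertLinks(entry) : entry;
}

// Usage with a dummy converter that just tags its input.
const dummyConvert: LinkConverter<string> = (e) => `[converted] ${e}`;
const untouched = processEntry("Comments", "0<s(i)<5", dummyConvert);
const converted = processEntry("Links", "some link entry", dummyConvert);
```

The trade-off is that any hyperlink outside the Links section would be shown as raw markup, which is acceptable only if hyperlinks really are confined to that section.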

Further details: HTML parsing is used to turn the hyperlink tags used in entries into components (see Utility.convertLinksToTextLinks). node-htmlparser causes problems with strings such as this one from https://oeis.org/A000045:

"The number of sequences (s(0),s(1),...,s(n)) such that 0<s(i)<5, |s(i)-s(i-1)|=1 and s(0)=1 is F(n+1); e.g., F(5+1) = 8 corresponds to 121212, 121232, 121234, 123212, 123232, 123234, 123432, 123434. - Clark Kimberling, Jun 22 2004 [corrected by Neven Juric, Jan 09 2009]"

This problem didn't show up in the Cordova implementation because that used Safari's built-in DOM API, which is apparently more forgiving than node-htmlparser.

Luckily (or very likely by design), hyperlinks seem to be present only in the Links section. For now, the plan is to run only entries in the Links section through Utility.convertLinksToTextLinks. If testing shows that this fixes the problem, no further work is needed. If it doesn't fix most of the mangling issues, the next step would be one of the following:

Use a Web View to gain access to Safari's/Chrome's HTML parser and call it through a bridge.
Parse the hyperlinks ourselves. Seeing as we now have a LinkText component, we could just use a regex (I know, parsing HTML with regex is taboo, but this is a really small subset of HTML with reasonably well-defined parameters) to find all hyperlink tags and pull out the href and text of each.
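The regex option could be sketched roughly as follows. This is only an illustration under stated assumptions: links are written as `<a ... href="...">text</a>` with a double-quoted href and no nested tags in the link text. The names `Segment`, `ANCHOR_PATTERN`, and `splitLinks` are hypothetical and not part of the existing code.

```typescript
// A plain text run or a hyperlink pulled out of an entry string.
type Segment =
  | { kind: "text"; text: string }
  | { kind: "link"; href: string; text: string };

// Assumption: double-quoted href, no nested tags inside the link text.
const ANCHOR_PATTERN = '<a\\s+[^>]*?href="([^"]*)"[^>]*>([^<]*)</a>';

// Split an entry into text and link segments without an HTML parser,
// so stray "<" and ">" in formulas are left alone.
function splitLinks(entry: string): Segment[] {
  const re = new RegExp(ANCHOR_PATTERN, "gi"); // fresh regex per call: no shared lastIndex
  const segments: Segment[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(entry)) !== null) {
    if (m.index > last) {
      // Text between the previous link (or the start) and this link.
      segments.push({ kind: "text", text: entry.slice(last, m.index) });
    }
    segments.push({ kind: "link", href: m[1], text: m[2] });
    last = m.index + m[0].length;
  }
  if (last < entry.length) {
    // Trailing text after the final link.
    segments.push({ kind: "text", text: entry.slice(last) });
  }
  return segments;
}

// Usage: the inequality survives as plain text, the anchor becomes a link segment.
const segments = splitLinks('See <a href="/A000045">A000045</a> for 0<s(i)<5');
```

Each link segment could then be handed to the LinkText component and each text segment rendered as-is. Single-quoted or unquoted href values would not match this pattern and would need their own handling.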


mshmoustafa added the bug label May 29, 2019

mshmoustafa added a commit that referenced this issue May 29, 2019

Parse hyperlinks only for entries in the Links section

a050431
