-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Escape method won't match correctly #20
Comments
I've confirmed this. Gumbo Parser directly embeds escaped character sequences without modification, yet gumbo-query converts them before building a selector. Gumbo Parser has a massive portion of source code dedicated to converting character references ( The character references issue can be sorted by leveraging the character reference conversion built into gumbo parser, and the parseEscaped method needs to be dropped entirely.
|
Hi, @TechnikEmpire |
@lazytiger I understand. I'll see if I can help. I'm doing an unofficial fork fixing these things, but also rewriting almost everything, but that's for my own purposes. I'm injecting a bunch of third party dependencies and don't expect it to be pulled from back to here. However, I'll create a separate actual fork, throw in whatever bug fixes I come up with and you can decide if you want to pull them. |
@lazytiger I lost track of most of the fixes I made. I tried to do a clone and push fixes but, like I said just lost track. I've finished my rewrite of gumbo-query and renamed it to GQ so it doesn't shadow your project. I've created a ton of tests that can be run in the browser and test programs which verify correct functionality for all types of selectors (over 40 tests). I've also implemented a tree map that makes matching extremely fast. Can find it here https://github.com/TechnikEmpire/GQ if you feel like backporting any of it. Unfortunately due to the full scale rewrite and implementation of boost deps, it wasn't possible to do an official fork and submit PR's. |
@TechnikEmpire Thanks for the fixes, I'll see to it later. |
So, going over the code for parsing selectors, there's the method
parseEscape(...)
which appears to be intended to convert attribute values that contain escaped characters and code points to literal values. However, doing some quick tests embedding such values in HTML, parsing then matching, it appears thatgumbo_parser
directly embeds these values untouched. So if I'm right, unescaping these sequences in supplied selectors will actually cause the selectors which should match, to not match.I'm not familiar with the GO html package, but from what searching I've done, it appears that this might have been a requirement for
cascadia
(since it worked against the GO html package), but is invalid for use withgumbo_parser
. On that note,gumbo_parser
does appear to convert numeric character references and named character references, butgumbo_query
doesn't of course, so matching a selector with an id containing something like" "
will fail.Will try to post some tests to confirm, I'm just reading over all of this figuring it out.
References:
http://www.w3.org/International/questions/qa-escapes
https://mathiasbynens.be/notes/css-escapes
http://www.w3.org/TR/html5/syntax.html#named-character-references
The text was updated successfully, but these errors were encountered: