TextCodecDetector::detectCodec() mis-treats invalid UTF-8 as valid #88

Open
SlySven opened this issue Apr 8, 2019 · 0 comments

UTF-8 today (well, since 2003 😏) must not contain byte sequences for Unicode code points beyond those that can be encoded in UTF-16. That means the maximum acceptable Unicode code point is U+10FFFF, which fits within 4 bytes, so 5- and 6-byte sequences, and 4-byte sequences that decode to anything above U+10FFFF, are invalid. This suggests that the code in the listed method can detect something as UTF-8 when it MUST (for RFC values of MUST 😀) in fact reject it, as per RFC 3629...
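
To make the constraint concrete, here is a minimal sketch of a strict RFC 3629 check. It is not the code from `TextCodecDetector::detectCodec()`, and the function name `is_valid_utf8` is made up for illustration. Besides the U+10FFFF ceiling described above, it also rejects overlong encodings and UTF-16 surrogate code points (U+D800 to U+DFFF), which RFC 3629 likewise forbids.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data is well-formed UTF-8 per RFC 3629.

    Hypothetical helper for illustration; not part of the project's API.
    """
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 0x7F:
            # Single-byte (ASCII) sequence.
            i += 1
            continue
        if 0xC2 <= b0 <= 0xDF:
            length, cp, min_cp = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            length, cp, min_cp = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            length, cp, min_cp = 4, b0 & 0x07, 0x10000
        else:
            # 0x80-0xBF: stray continuation byte.
            # 0xC0-0xC1: lead bytes that can only start overlong forms.
            # 0xF5-0xFF: would encode > U+10FFFF or start 5/6-byte forms.
            return False
        if i + length > n:
            return False  # truncated sequence
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                return False  # not a continuation byte (10xxxxxx)
            cp = (cp << 6) | (b & 0x3F)
        if cp < min_cp:
            return False  # overlong encoding
        if cp > 0x10FFFF:
            return False  # beyond the range UTF-16 can represent
        if 0xD800 <= cp <= 0xDFFF:
            return False  # surrogate code points are not valid in UTF-8
        i += length
    return True


if __name__ == "__main__":
    print(is_valid_utf8("\U0010FFFF".encode("utf-8")))  # highest valid code point
    print(is_valid_utf8(b"\xF4\x90\x80\x80"))           # would be U+110000
    print(is_valid_utf8(b"\xF8\x88\x80\x80\x80"))       # obsolete 5-byte form
```

A detector that only checks the "lead byte followed by the right number of continuation bytes" pattern would accept the last two inputs; a strict check like the one sketched here rejects them.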
