TextCodecDetector::detectCodec() mis-treats invalid UTF-8 as valid #88

SlySven · 2019-04-08T03:14:37Z

UTF-8 today (well, since 2003 😏) must not contain byte sequences for Unicode code points beyond those that can be encoded in UTF-16. That makes the maximum acceptable code point U+10FFFF, which fits within a 4-byte sequence. This suggests that the code in the listed method can detect something as UTF-8 when it MUST (for RFC values of MUST 😀) in fact reject it (as per RFC 3629)...
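To make the RFC 3629 limits concrete, here is a minimal sketch of a strict validity check. It is not the code in `TextCodecDetector::detectCodec()`: the function name `isStrictUtf8` and its `std::string` input are assumptions made for illustration only. It rejects 5- and 6-byte forms, code points above U+10FFFF, UTF-16 surrogates, overlong encodings, and truncated sequences.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of TextCodecDetector): returns true only if
// `bytes` is well-formed UTF-8 as restricted by RFC 3629.
bool isStrictUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const std::uint8_t b0 = static_cast<std::uint8_t>(bytes[i]);
        if (b0 < 0x80) {            // single-byte (ASCII) sequence
            ++i;
            continue;
        }

        std::size_t len = 0;        // total length of this sequence in bytes
        std::uint32_t cp = 0;       // code point decoded so far
        std::uint32_t minCp = 0;    // smallest code point allowed for this length
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2; cp = b0 & 0x1F; minCp = 0x80;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3; cp = b0 & 0x0F; minCp = 0x800;
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4; cp = b0 & 0x07; minCp = 0x10000;
        } else {
            // 0x80..0xC1: stray continuation byte or overlong 2-byte lead.
            // 0xF5..0xFF: would encode beyond U+10FFFF, or a 5/6-byte form.
            return false;
        }

        if (n - i < len) {
            return false;           // sequence is truncated
        }
        for (std::size_t k = 1; k < len; ++k) {
            const std::uint8_t c = static_cast<std::uint8_t>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;       // not a continuation byte (10xxxxxx)
            }
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < minCp) {
            return false;           // overlong encoding
        }
        if (cp > 0x10FFFF) {
            return false;           // beyond the UTF-16 encodable range
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false;           // UTF-16 surrogate, not a scalar value
        }
        i += len;
    }
    return true;
}
```

With this sketch, `"\xF4\x8F\xBF\xBF"` (U+10FFFF) is accepted, while `"\xF4\x90\x80\x80"` (which would decode to U+110000) and any 5-byte sequence starting with `0xF8` are rejected.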
