trouble with demo.json validation #1
I've put together a basic Jasmine test spec to demonstrate the issue I'm seeing. Note that this is a fork of the mapbox/mbtiles-spec repo with the demo.json referenced in the latest UTFGrid spec. I couldn't find any other UTFGrid-related tests for the client. Let me know if I've missed some; seeing working tests would help me figure out what might be going wrong on my side. Thanks.
One way to deal with it would be to treat the strings as UTF-16 and decode them into an array of Numbers (code points). We would then be able to use the entire Unicode range of 0 to 0x10FFFF (minus anything that is invalid in JSON).
Something like this, with saner error handling:

```javascript
// Decode a UTF-16 JavaScript string into an array of Unicode code points.
// Returns false if the string contains an unpaired surrogate.
function utf16ToUnicode(str) {
  var utf32 = 0,
      isPair = false,
      out = [],
      len = str.length;
  for (var i = 0, code; i < len; i++) {
    code = str.charCodeAt(i);
    if (!isPair) {
      if ((code & 0xFC00) === 0xD800) {
        // High surrogate of a new pair: keep its 10 payload bits
        utf32 = ((code & 0x3FF) << 10) + 0x10000;
        isPair = true;
      } else if ((code & 0xFC00) === 0xDC00) {
        // Unexpected low surrogate
        return false;
      } else {
        // BMP code point, pass straight through
        out.push(code);
      }
    } else {
      // When isPair is true, we expect a continuation of a surrogate pair
      if ((code & 0xFC00) === 0xDC00) {
        // Legal low surrogate: merge in its 10 payload bits
        utf32 |= (code & 0x3FF);
        out.push(utf32);
      } else {
        // Incomplete surrogate pair
        return false;
      }
      utf32 = 0;
      isPair = false;
    }
  }
  if (isPair) {
    // The string ended right after a high surrogate
    return false;
  }
  return out;
}
```

Edit: Fixed decoding bug
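For comparison, a minimal sketch of the same decoding using the ES2015 string iterator and `codePointAt`, which walk a string by code point and combine well-formed surrogate pairs automatically. It assumes an ES2015+ engine (so not Chrome 17 or Firefox 10); `toCodePoints` is a hypothetical name, and it signals a lone surrogate with `null` rather than `false`.

```javascript
// Sketch: decode a UTF-16 JavaScript string into an array of code points.
// The string iterator yields a surrogate pair as one character, and an
// unpaired surrogate as a single code unit in 0xD800-0xDFFF.
function toCodePoints(str) {
  const out = [];
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xD800 && cp <= 0xDFFF) {
      // Lone high or low surrogate: treat as malformed input
      return null;
    }
    out.push(cp);
  }
  return out;
}

console.log(toCodePoints("A\uD83D\uDE00")); // [ 65, 128512 ]
console.log(toCodePoints("\uD800"));        // null
```

The trade-off is that the engine handles pair assembly, so the only check left to write by hand is the surrogate-range test.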
I'm trying to write some tests for a browser implementation that use the demo.json described in the spec. I run into trouble once I hit row 215, col 222, the 55262nd id. If I understand correctly, this id should be "encoded" as 55296. I notice that some parsers mention 55296 to 57343 (0xD800 to 0xDFFF) as the range of UTF-16 surrogates, which cannot be converted to UTF-8 on their own.
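The arithmetic can be checked against the id encoding described in the UTFGrid spec, assuming its rules are: add 32 to the id, then add 1 if the result is >= 34 and 1 more if it is >= 92, so that `"` and `\` are never used. The sketch below uses hypothetical `encodeId`/`decodeId` helpers and an assumed 256-column grid (so id = row * 256 + col, which matches the row/col/id triple reported here); it shows id 55262 landing exactly on 0xD800, the first surrogate code unit.

```javascript
// Sketch of the UTFGrid id <-> code point mapping (assumed spec rules:
// add 32, then skip 34 (") and 92 (\)).
function encodeId(id) {
  let cp = id + 32;
  if (cp >= 34) cp += 1; // skip the double quote
  if (cp >= 92) cp += 1; // skip the backslash
  return cp;
}

function decodeId(cp) {
  if (cp >= 93) cp -= 1; // undo the backslash skip
  if (cp >= 35) cp -= 1; // undo the double-quote skip
  return cp - 32;
}

// Hypothetical 256-column grid: id = row * 256 + col
const id = 215 * 256 + 222;
const cp = encodeId(id);
const isSurrogate = cp >= 0xD800 && cp <= 0xDFFF;
console.log(id, cp, cp.toString(16), isSurrogate); // 55262 55296 d800 true
```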
I'm serving my tests (with `<meta http-equiv="content-type" content="text/html; charset=UTF-8">`) and demo.json from Apache to Chrome 17 (same behavior in Firefox 10). Thanks for any hints on what might be going on. I'm not entirely confident this setup is UTF-8 through and through.
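To test the UTF-8 suspicion in isolation, the sketch below pushes the single character for code point 0xD800 through UTF-8 in both directions. It assumes an environment with the WHATWG `TextEncoder`/`TextDecoder` APIs (current browsers or Node.js 11+), which Chrome 17 and Firefox 10 predate. A lone surrogate has no valid UTF-8 form, so it comes back as U+FFFD; this suggests that a grid cell stored as a raw character (rather than a `\ud800` JSON escape) cannot survive a UTF-8 pipeline.

```javascript
// Sketch: what happens to code point 0xD800 when it passes through UTF-8.
const cell = String.fromCharCode(0xD800); // encoded value for id 55262

// TextEncoder replaces the lone surrogate with U+FFFD (EF BF BD).
const bytes = new TextEncoder().encode(cell);
console.log(Array.from(bytes, (b) => b.toString(16))); // [ 'ef', 'bf', 'bd' ]

const roundTrip = new TextDecoder("utf-8").decode(bytes);
console.log(roundTrip.charCodeAt(0).toString(16)); // fffd, not d800

// The bytes a naive encoder would write for U+D800 (ED A0 80) are
// ill-formed UTF-8, so a decoder yields replacement characters instead.
const raw = new TextDecoder("utf-8").decode(Uint8Array.of(0xED, 0xA0, 0x80));
console.log(raw.includes("\uFFFD"), raw.includes(cell)); // true false

// encodeURIComponent rejects the lone surrogate outright.
try {
  encodeURIComponent(cell);
} catch (e) {
  console.log(e instanceof URIError); // true
}
```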