Encoding issues / Umlaut is not decoded correctly #117

TomRauchenwald38 · 2018-01-02T09:38:00Z

I have trouble decoding the QR code from this PDF (on page 27).
It seems the Umlaut in the last line is not decoded correctly. Screenshot from the live demo:

The last line should read ..."für Gartenarbeit und Entsorgung"...

I can decode the QR Code just fine in Java using ZXing.
If I set the CHARACTER_SET decoding hint to "ISO-8859-1", the decoded result is exactly the same as pictured in the screenshot, so I suspect that ISO-8859-1 is assumed somewhere in InstaScan.
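For illustration, the symptom can be reproduced outside InstaScan. The snippet below is a minimal sketch, not InstaScan's actual code: it encodes a sample word as UTF-8 and then builds a string with one character per byte, which is equivalent to decoding the bytes as ISO-8859-1. The sample word and variable names are chosen here for the example.

```javascript
// UTF-8 bytes for "für": [0x66, 0xC3, 0xBC, 0x72]. The "ü" takes two bytes.
const utf8Bytes = Array.from(new TextEncoder().encode("für"));

// One character per byte: this is what an ISO-8859-1 decode produces.
const misdecoded = String.fromCharCode.apply(null, utf8Bytes);

// Decoding the same bytes as UTF-8 restores the original text.
const decoded = new TextDecoder("utf-8").decode(Uint8Array.from(utf8Bytes));

console.log(misdecoded); // fÃ¼r
console.log(decoded);    // für
```

The two-character sequence "Ã¼" in place of "ü" is the typical sign of UTF-8 data read as a single-byte encoding.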

Here's the QR Code I used for easier copy/pasting:

Is there a way to specify the encoding to use, or is this a bug?

dieperie · 2018-01-18T12:36:22Z

In PHP, use: utf8_decode
This converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1.

dieperie · 2018-01-18T20:08:06Z

In JavaScript, the following does the same:

var decoded_content = self.utf8_decode(content);
self.scans.unshift({ date: +(Date.now()), content: decoded_content });

utf8_decode: function (str_data) {
	// Treats each char code of str_data as one UTF-8 byte and returns the decoded string.
	// Handles 1- to 3-byte sequences only; 4-byte sequences (above U+FFFF) are not supported.
	var string = "", i = 0, c = 0, c2 = 0, c3 = 0;

	while (i < str_data.length) {
		c = str_data.charCodeAt(i);
		if (c < 128) {
			// Single byte: plain ASCII.
			string += String.fromCharCode(c);
			i++;
		} else if ((c > 191) && (c < 224)) {
			// Two-byte sequence: 110xxxxx 10xxxxxx.
			c2 = str_data.charCodeAt(i + 1);
			string += String.fromCharCode(((c & 31) << 6) | (c2 & 63));
			i += 2;
		} else {
			// Everything else is treated as a three-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx.
			c2 = str_data.charCodeAt(i + 1);
			c3 = str_data.charCodeAt(i + 2);
			string += String.fromCharCode(((c & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
			i += 3;
		}
	}
	return string;
}

yamnikov-oleg · 2018-02-02T11:11:00Z

Having the same issue. Cyrillics are decoded into gibberish:

Ð�Ð°Ð½Ð½Ñ�Ð¹ ÐºÑ�Ð¿Ð¾Ð½ Ñ�Ð³ÐµÐ½ÐµÑ�Ð¸Ñ�Ð¾Ð²Ð°Ð½

fariskas · 2019-09-06T06:33:10Z

Having the same issue with Korean.

alekciy · 2020-05-09T13:57:05Z

Having the same issue. Cyrillics are decoded into gibberish:
Ð�Ð°Ð½Ð½Ñ�Ð¹ ÐºÑ�Ð¿Ð¾Ð½ Ñ�Ð³ÐµÐ½ÐµÑ�Ð¸Ñ�Ð¾Ð²Ð°Ð½

The problem is in this piece of code:

instascan/src/scanner.js

Line 101 in b0f9519

let str = String.fromCharCode.apply(null, result);

but I haven't figured out how to fix it yet.

yamnikov-oleg · 2020-05-09T18:06:19Z

@alekciy Thank you for the tip, I have added utf8 decoder in that line and it worked.
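For anyone who wants to try the same idea, here is a minimal sketch of what a UTF-8 decode in place of `String.fromCharCode.apply(null, result)` could look like. It is not necessarily the exact change made; it assumes `result` holds byte values (a plain array or a typed array), and `decodeUtf8` and the sample bytes are hypothetical names and data introduced for this example.

```javascript
// Hypothetical stand-in for the scanner output: UTF-8 bytes of "купон".
const result = [0xD0, 0xBA, 0xD1, 0x83, 0xD0, 0xBF, 0xD0, 0xBE, 0xD0, 0xBD];

// Current behaviour: one character per byte, which garbles multi-byte text.
const before = String.fromCharCode.apply(null, result);

// Hypothetical helper: interpret the bytes as UTF-8 instead.
function decodeUtf8(bytes) {
  return new TextDecoder("utf-8").decode(Uint8Array.from(bytes));
}

const after = decodeUtf8(result);

console.log(before.length); // 10 (one character per byte)
console.log(after);         // купон
```

`TextDecoder` is built into current browsers and Node.js, so no extra dependency is needed for UTF-8.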

yamnikov-oleg · 2020-05-09T18:12:03Z

Though this might not get merged. In case somebody needs this fix, you can clone the repo, apply the fix yourself and rebuild the package with:

npm install
./node_modules/.bin/gulp release

The instascan.min.js file will appear in the dist directory.

alekciy · 2020-05-10T04:16:12Z

> @alekciy Thank you for the tip, I have added utf8 decoder in that line and it worked.

And what if it's cp1251? For example, payment slips per GOST R 56042-2014, format ST00011. Ideally, an encoding detector would be added.

yamnikov-oleg · 2020-05-10T07:38:01Z

@alekciy I don't think there is a reliable way to detect text encoding, especially for single-byte code page (CP) encodings. It would probably be better to add an encoding parameter to the Scanner class.
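A rough sketch of what such a parameter could drive internally, assuming the scanner has the raw bytes available. `decodeScan` and its `encoding` argument are hypothetical and are not part of the current Scanner API. Whether a label such as "windows-1251" is available depends on the runtime: browsers support it per the Encoding Standard, while Node.js needs a build with full ICU data.

```javascript
// Hypothetical helper: decode scanned bytes using a caller-chosen encoding.
function decodeScan(bytes, encoding = "utf-8") {
  // fatal: true makes malformed input throw instead of silently yielding U+FFFD.
  const decoder = new TextDecoder(encoding, { fatal: true });
  return decoder.decode(Uint8Array.from(bytes));
}

// "Привет" encoded as windows-1251 (cp1251): one byte per letter.
const cp1251Bytes = [0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2];
console.log(decodeScan(cp1251Bytes, "windows-1251")); // Привет

// The same bytes are not valid UTF-8, so the strict default decode throws.
let utf8Failed = false;
try {
  decodeScan(cp1251Bytes);
} catch (e) {
  utf8Failed = true;
}
console.log(utf8Failed); // true
```

Letting the caller pick the encoding avoids guessing, and the strict mode at least surfaces a wrong choice when the bytes are not valid in a multi-byte encoding such as UTF-8. It cannot tell two single-byte code pages apart, since almost any byte sequence is valid in both.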

yamnikov-oleg linked a pull request May 9, 2020 that will close this issue

Decode strings using utf8 #248

Open
