
German letters (ä,ö,ü) not displayed correctly #46

Closed
ghost opened this issue Nov 7, 2019 · 18 comments

Comments

@ghost

ghost commented Nov 7, 2019

These letters are displayed as '�' when queried from an IBM DB2 database with CCSID 1141 encoding.

@markdirish
Contributor

Could you clarify a little? What part of the system is in CCSID 1141? The entire system, the database table, or the column?

Perhaps you could send the results of a DSPFFD and DSPFD for the table/column in question.

@mamorneau

Hi Mark,

I'm having the exact same problem. I'm trying to make a simple node based on odbc.js. It works but I can't query any database, table or field that contains an accented character (like é, è, à, ô, etc). Any foreign character is converted to garbage when the query string is passed to the script or when the script returns a string.
[screenshot]

Thanks for your help

@markdirish
Contributor

There are a lot of different technologies that the data flows through to call and then receive data from the database. I think the most interoperable solution for getting data out is to make sure your connection string includes the CCSID=1208 keyword, or your DSN contains CCSID = 1208. This will set the CCSID for the calls to UTF-8, which will be able to decode the special characters. Otherwise, Node.js defaults to an extended table that doesn't necessarily have those characters or doesn't map well from the EBCDIC tables.
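For example, with node-odbc that could look something like this (the DSN, library and table names are just placeholders):

```js
// Minimal sketch, assuming node-odbc 2.x and an IBM i DSN named MYDSN (placeholder).
// CCSID=1208 asks the driver to exchange character data as UTF-8.
const odbc = require('odbc');

(async () => {
  const connection = await odbc.connect('DSN=MYDSN;CCSID=1208');
  const result = await connection.query('SELECT NAME FROM MYLIB.CUSTOMERS');
  console.log(result); // ä, ö, ü should come back intact
  await connection.close();
})();
```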

@ghost
Author

ghost commented Nov 10, 2019

> There are a lot of different technologies that the data flows through to call and then receive data from the database. I think the most interoperable solution for getting data out is to make sure your connection string includes the CCSID=1208 keyword, or your DSN contains CCSID = 1208. This will set the CCSID for the calls to UTF-8, which will be able to decode the special characters. Otherwise, Node.js defaults to an extended table that doesn't necessarily have those characters or doesn't map well from the EBCDIC tables.

Setting the CCSID to 1208 solved my problem, thank you very much!

@mamorneau

Hi Mark,

I'm afraid I still need your help. In my case, CCSID=1208 didn't work. I tried it in the connection string and also in ODBC.INI and ODBCINST.INI, without success.

Please note that our database is a Pervasive (now Actian) PSQL database. We tried CCSID=1252 since this is how the database is configured (see below)

[screenshot: database configuration]

We also tried to change PvTranslate to auto in the connection string without success (https://docs.actian.com/psql/psqlv13/index.html#page/odbc%2Fodbcadm.htm%23ww1224120).

We also tried to change the encoding translation mode in the connection attributes.

[screenshot: encoding translation setting in the connection attributes]

So far, all we've managed to do is get '+' signs instead of � where there should be 'é'.

ODBCINST.INI:

[Pervasive ODBC Interface (32 bit)]
Driver=C:\Program Files\Pervasive Software\PSQL\bin\odbcci64.dll
Setup=C:\Program Files\Pervasive Software\PSQL\bin\odbccs64.dll
CCSID=1252

ODBC.INI

[DEVTABLES_ODBC]
Driver=C:\Program Files\Pervasive Software\PSQL\bin\odbcci64.dll
Driver32=C:\Program Files\Pervasive Software\PSQL\bin\odbcci64.dll
CCSID=1252

Thanks

@markdirish
Contributor

@mamorneau

The CCSID keyword is unique to the IBM i ODBC connector, so it makes sense that it wouldn't work for your Actian PSQL ODBC driver. Get rid of that line in your ODBCINST.INI and ODBC.INI; it won't help you.

I think this Actian ODBC guide might help you out, especially starting at page 24: https://communities.actian.com/servlet/fileField?entityId=ka3f3000000PT3QAAW&field=Attachment1__Body__s

I don't have the database set up yet so I can't test myself, but I would think that Windows-1252 would work, since it contains code points for those characters that are coming through as �.

Let me know if the PDF doesn't give you clues and point you in the right direction, and I can maybe try to download and test it myself to find the right combination.

@mamorneau

Hi Mark,

Our connection string was not right: DSN=DEVTABLES_ODBC;DATABASE=DEVTABLES
DATABASE is not even a parameter for the PSQL driver. I guess it worked because the driver was scanning ALL the tables instead of just DEVTABLES. We have read the manual and tried the following connection string:

DSN=DEVTABLES_ODBC;ServerName=data02.1583;PvTranslate=auto;ArrayFetchOn=1;ArrayBufferSize=8;TransportHint=TCP:SPX;DBQ=DEVTABLES;ClientVersion=11.31.070.000;CodePageConvert=1252;PvClientEncoding=CP1252;PvServerEncoding=CP1252;AutoDoubleQuote=0;

This string in fact came from MS Excel; it's the string Excel generates when it connects to the database using ODBC. There are a few odd things with this, however. The parameters CodePageConvert, PvClientEncoding and PvServerEncoding are not mentioned in the Pervasive manual, not even in the most recent version. I'm not sure why Excel adds them. This string works in Node-RED and returns results, but again the accented characters are converted to �, which is killing me since Excel doesn't have any issue reading them.

There is this line in the manual:

Note When using the PSQL Client driver, Unicode SQL text is always converted to the client encoding by the Microsoft Driver Manager. This restricts NCHAR literals in a SQL query string to the character set of the client. To preserve NCHAR literals in SQL query text, use the PSQL Unicode driver.

We're not sure we understand this correctly, but if the Microsoft Driver Manager detects that the app is using EBCDIC, it will try to convert to EBCDIC??? We started to look for the PSQL Unicode driver. We noted that the v11 manual (which we use) barely mentions Unicode at all. However, Unicode is all over the place in the v12 manual.

We will try to install v12 Unicode driver and see how it goes.

@mamorneau

Hi Mark,

Interestingly, Excel generates this connection string:

DSN=DEVTABLES_ODBC;ServerName=data02.1583;PvTranslate=auto;ArrayFetchOn=1;ArrayBufferSize=8;TransportHint=TCP:SPX;DBQ=DEVTABLES;ClientVersion=11.31.070.000;CodePageConvert=1252;PvClientEncoding=CP1252;PvServerEncoding=CP1252;AutoDoubleQuote=0;

However, these parameters: ClientVersion=11.31.070.000;CodePageConvert=1252;PvClientEncoding=CP1252;PvServerEncoding=CP1252;AutoDoubleQuote=0;
seem to be there just in case the driver would need them. They are not actually supported by the PSQL ODBC driver. DSN=DEVTABLES_ODBC;ServerName=data02.1583;DBQ=DEVTABLES works equally well (including displaying accented characters). The only parameter that seemed promising was PvTranslate. It's the only one (at least in the connection string) that could have helped us, but in this case it's useless since it simply overrides the DSN configuration, which is already set to auto. The only other option is OEM/ANSI conversion, but it converts 'é' to '+'.

We tried the Pervasive Unicode Client Interface without success (v12, 32-bit and 64-bit). Note that the database uses v11. We can install the client and it can see the databases, but we cannot test the connection in the Windows ODBC utility, and any attempt to query the database results in a crash. This may be because the database itself is v11 (not the Unicode version). We just don't know...

So long story short we're stuck.

By the way, is there any chance the issue could be caused by my Node-RED JavaScript script (which uses odbc.js), for instance if it doesn't pass the string properly? I'm asking just in case...

Regards,

@mamorneau

Hi Mark,

For reference, please note that we solved this today. It was fairly simple after all. Our database uses CP1252, and so do the client and the server. We were puzzled for quite a while since, to us, there was no reason for this not to work out of the box. Moreover, 'é' has the same hex code in quite a lot of code pages, including extended ASCII.

However, when trying an SQL INSERT INTO, we noticed that the resulting string in the database was obviously UTF-8 hex values. I guess this comes from the fact that HTML5, and thus probably Node-RED too, uses UTF-8 by default. There is probably a way to force everything to CP1252 in the HTML code, but it was easier to set the client locale to use UTF-8 and then set the driver to automatic translation. The driver now recognizes that the client is UTF-8 while the database is CP1252, and does the translation accordingly.

Et voilà!

Cheers

@yuguifly

Hello mamorneau, I would like to ask you a question. We have the same problem as you, but with the letter "ñ". We have a Pervasive database and I opened an issue with this problem, #28. I tried to do some tests but they didn't work. Could you tell us what you did to fix the problem?
Thank you so much!

@mamorneau

Hi yuguifly. The Pervasive ODBC driver is quite limited in that it seems impossible to force a specific translation. The translation mode has to be set to "automatic", but in that case the driver "correctly" detects that the database is using CP1252 and that the client (a Node-RED server running on a PC in our case) is also using CP1252. However, Node-RED uses UTF-8, so all the accented characters get corrupted.

We tried specifying the charset in the HTML portion of the node, which didn't work. We also tried to manually convert from CP1252 to UTF-8 (and vice versa) within the JavaScript portion of the node using iconv-lite. This could have worked, but since a query to the database returns a JavaScript object, it has to be stringified prior to encoding, and it seems there is no way to stringify directly to CP1252, so the accented characters got corrupted anyway.

In the end, we simply changed the Windows setting to the beta UTF-8 mode. Now the driver detects that the client is not CP1252 and does the translation properly. This also means that our Node-RED server can only be used for this purpose; anything else (especially VBA stuff and our main database GUI) wouldn't work on this machine because of the UTF-8 mode. We also tried the Unicode Pervasive driver, which didn't work, most likely because the database is not Unicode, and converting it was not really an option for us. I'm sure there is a neater way to do this, but this is how we fixed our issue. Cheers!

@kadler
Member

kadler commented Apr 24, 2020

The Node API only provides functions to create String objects from UTF-8 bytes or UTF-16 bytes, thus node-odbc will always assume the driver returns UTF-8 bytes. On Windows, this is fairly unlikely as almost nobody runs with this beta UTF-8 locale setting.

I suspect we'll need to build in UNICODE mode on Windows to force all the wide APIs. This ensures that we have a consistent UTF-16 encoding for pulling data into or out of Node.js.
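A quick way to see why that matters in plain Node.js, outside of node-odbc entirely:

```js
// Illustration only: 0xE9 is 'é' in Windows-1252/Latin-1, but a lone 0xE9 byte
// is not a valid UTF-8 sequence, so decoding it as UTF-8 yields U+FFFD ('�').
const bytes = Buffer.from([0xe9]);
console.log(bytes.toString('utf8'));   // '�'
console.log(bytes.toString('latin1')); // 'é'
```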

@yuguifly

Hello, so you are having the problem when you receive the data, aren't you? Because I have the problem with the query, on the SELECT. I thought the problem was with the query; I don't know if this will fix my problem.
Thank you

@mamorneau

Hi yuguifly.

We had the issue with both sending and receiving. If the query string contained accented characters, the query would simply fail, just like in your case. If the query string had no accented characters, then we were receiving a reply from the DB, but the accented characters were corrupted. The Windows UTF-8 mode of course solved both issues at the same time. That being said, I think sending is less of an issue, since the query string is not a JavaScript object; it should be possible to encode it from UTF-8 to CP1252 within the JavaScript code using iconv before it is sent to the ODBC driver. If the data you are trying to read doesn't contain accented characters (and thus only the query string is causing problems), it could be worth a try.
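A rough sketch of that conversion step with iconv-lite; how the resulting bytes actually reach the driver still depends on the binding, so treat this as the encode/decode building block only (the query and bytes below are illustrative):

```js
// Minimal sketch, assuming the iconv-lite package; names and data are illustrative.
const iconv = require('iconv-lite');

// Outgoing: turn a JavaScript string into CP1252 bytes before handing it to the driver
const cp1252Query = iconv.encode("SELECT * FROM CLIENTS WHERE VILLE = 'Montréal'", 'win1252');

// Incoming: decode CP1252 bytes coming back from the database into a JavaScript string
const raw = Buffer.from([0x4d, 0x6f, 0x6e, 0x74, 0x72, 0xe9, 0x61, 0x6c]); // "Montréal" in CP1252
console.log(iconv.decode(raw, 'win1252')); // "Montréal"
```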

@yuguifly

Hello, I tried to do that; markdirish told me about it, but it didn't work. I don't know if I did it right. I put this in the Pervasive connection: "DSN= ; encoding=Cp1252; UID=; PWD=;" and I also changed the ODBC encoding translation to automatic, but it doesn't work. I tried other encodings with the same result.
I also changed the Windows configuration as you did, but it was worse: no query worked and I got an error with the handler.

thank you

@mamorneau

Hi yuguifly,

I think "encoding=CP1252" is specific to other types of databases and is not supported by the pervasive driver. We are pretty much stuck with the translation modes "none", "oem/ansi" and automatic. I have not played much with the oem/ansi mode. This comes from the manual and is quite interesting:

OEM/ANSI Conversion
This setting allows applications to store or retrieve character data in any OEM character set in the PSQL engine, while allowing the data to be manipulated and displayed using the ANSI Windows character set in the application.
The PSQL ODBC driver translation DLL can perform all necessary translations between the two character sets. This feature can be turned on or off for each DSN. Any character data that is passed to and from the database is correctly translated by the ODBC driver between the OEM and ANSI character sets.
If your application connects to the data source using SQLDriverConnect, you can also specify the translation DLL using the connection string option TRANSLATIONDLL=path_and_DLL_name. The translation DLL name for PSQL is W32BTXLT.DLL.
Note The OEM-to-ANSI translation option is available only for Client and 64-bit DSNs.
(You can also use this translate option with a local Engine DSN. It is not available when setting up a remote Client connection to an Engine DSN. Keep in mind that Engine DSNs are deprecated and should not be used for new applications.)

SQLDriverConnect uses a DSN-less connection:
https://docs.actian.com/psql/psqlv13/index.html#page/prog_gde/Quikstrt.htm

This implies that the Windows UTF-8 mode is not absolutely required to make this work. I think we tried this initially but failed because at that moment we were using a 32-bit DSN. FYI, this is our connection string; it works fine with the UTF-8 mode:

DSN=TABLES_ODBC64;ServerName=data02.1583;DBQ=TABLES
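For completeness, a minimal node-odbc sketch using that DSN (the query and column names are illustrative only):

```js
// Minimal sketch, assuming the 64-bit DSN above (TABLES_ODBC64); query and column
// names are illustrative, not from the real schema.
const odbc = require('odbc');

(async () => {
  const connection = await odbc.connect('DSN=TABLES_ODBC64;ServerName=data02.1583;DBQ=TABLES');
  const rows = await connection.query("SELECT Nom FROM Clients WHERE Ville = 'Montréal'");
  console.log(rows);
  await connection.close();
})();
```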

Regards

@Ted-CAcert

Ted-CAcert commented Nov 15, 2023

FYI, I had the same problem with a SAP SQL Anywhere database on windows (Client and Server). Adding "CharSet=UTF-8" to the connection string did fix the issue.

Update: I'm not sure if I made a typo with "UTF-8", but recently I used "CharSet=utf8" without the dash with some success. So maybe trying some variants can help...

@jmbluethner

Having the same issue on [email protected] against a Progress Database. Also see #379
