Document encoding of formatQuery #80

Open
dustin opened this issue Nov 10, 2021 · 6 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@dustin

dustin commented Nov 10, 2021

formatQuery returns a ByteString which is meant to be printed out (debugging, logging, etc...) but the character encoding is not specified. I assume it's UTF-8, but it'd be good to clarify that in the docs. (Or alternatively, return Text).

formatQuery :: ToRow q => Connection -> Query -> q -> IO ByteString
@phadej
Collaborator

phadej commented Nov 10, 2021

It's not that simple, it relies on what libpq does, e.g. in https://hackage.haskell.org/package/postgresql-libpq-0.9.4.3/docs/Database-PostgreSQL-LibPQ.html#v:escapeStringConn

... and also what the Query literal arguments were (as those are not processed at all).

I.e. I actually have no idea what the result is when you insert non-ASCII text field values: how they are escaped, or whether the result indeed depends on Connection state / settings.

@dustin
Author

dustin commented Nov 10, 2021

OK, this is helpful. It does seem to be related to client encoding, which can be specified on the connection string. It might be useful to mention some of this here. As it is, my configuration appears to work OK with UTF-8, but that's a guess and character encoding guesses can go bad. :)
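
For illustration, a minimal sketch of pinning the client encoding in a libpq keyword/value connection string via the `client_encoding` connection parameter. The helper name `withUtf8ClientEncoding` and the host and database names are hypothetical; the resulting string would be handed to whatever connect function is in use.

```haskell
import qualified Data.ByteString.Char8 as BC

-- | Append an explicit client_encoding setting to a libpq keyword/value
-- connection string. Hypothetical helper, not part of postgresql-simple.
withUtf8ClientEncoding :: BC.ByteString -> BC.ByteString
withUtf8ClientEncoding base =
  BC.unwords [base, BC.pack "client_encoding=UTF8"]

main :: IO ()
main =
  -- "localhost" and "example" are placeholder values.
  BC.putStrLn (withUtf8ClientEncoding (BC.pack "host=localhost dbname=example"))
```

This only makes the assumption explicit on the client side; whether it changes what formatQuery produces is exactly the open question in this thread.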

@phadej phadej added good first issue Good for newcomers help wanted Extra attention is needed labels Nov 10, 2021
@phadej
Collaborator

phadej commented Nov 10, 2021

There is already a paragraph saying

This function is exposed to help with debugging and logging. Do not use it to prepare queries for execution.

And the resulting type is ByteString. Ideally you wouldn't assume anything about it, i.e. your logger can take arbitrary ByteStrings and log them in a human-readable way when possible, and in some other form when it is not.

(EDIT: I often use a decodeUtf8Lenient in situations like this, i.e. a UTF-8 decoding function which doesn't fail on invalid input. You may choose to use something different, it depends. I'd like to avoid giving general advice, as I really cannot.)
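
A minimal sketch of that lenient approach, assuming the bytes come from formatQuery and go to a logger that wants Text. It uses `decodeUtf8With lenientDecode` from the text package, which replaces each invalid byte with U+FFFD instead of throwing; the name `renderForLog` is hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- | Decode bytes as UTF-8 for logging. Invalid input never throws:
-- each undecodable byte becomes the replacement character U+FFFD.
renderForLog :: BS.ByteString -> T.Text
renderForLog = decodeUtf8With lenientDecode

main :: IO ()
main = do
  -- 0xFF can never appear in valid UTF-8, so it gets replaced.
  let bytes = BS.pack [0x61, 0xFF, 0x62]
  print (renderForLog bytes)
```

The trade-off is that a wrong encoding guess shows up as replacement characters in the log rather than as an exception.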

@dustin
Author

dustin commented Nov 10, 2021

Yeah, that's what I'm doing. I just want to make sure it's even a sensible first guess as well as what might affect it.

This is one of those things that might work well in development but not work at all in production and then wouldn't be noticed until stuff was broken and we needed it. :)

@phadej
Collaborator

phadej commented Nov 10, 2021

As I said, I'll avoid giving any general advice here. You should figure out what works in your setup.

@pbrinkmeier

@phadej At this point it might be a sensible option to setClientEncoding "UTF8" on every connection.

This causes the Postgres server to convert textual values to UTF-8 before sending them to the client, even if they are stored in another encoding, which means that this works even if the server itself uses some different, obscure encoding.

UTF8 has been a supported character set in Postgres since version 8.1, which was released in 2005 (currently the latest supported version is 12), so the chance of breaking anything by doing this is fairly low.

Arguably the codebase assumes that the server sends UTF-8 already, e.g. in the instance FromField Text, which throws a ConversionError if the given ByteString isn't valid UTF-8.

I am happy to provide a PR including tests!

Doing this would enable a formatQuery' implementation that returns String or Text.
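
As a rough sketch of what such a variant could look like, assuming the bytes really are UTF-8: the hypothetical `formatQueryText` below takes any action producing a ByteString (for example a formatQuery call) and decodes the result strictly with `decodeUtf8'`, so an encoding mismatch is reported as a `Left` instead of being silently accepted. None of these names exist in postgresql-simple.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')
import Data.Text.Encoding.Error (UnicodeException)

-- | Run a formatting action and decode its output strictly as UTF-8.
-- Hypothetical wrapper; the action stands in for a formatQuery call.
formatQueryText :: IO BS.ByteString -> IO (Either UnicodeException T.Text)
formatQueryText format = fmap decodeUtf8' format

main :: IO ()
main = do
  ok <- formatQueryText (pure (BS.pack [0x68, 0x69]))
  print ok
  -- A lone 0xFF byte is not valid UTF-8, so this yields a Left.
  bad <- formatQueryText (pure (BS.pack [0xFF]))
  putStrLn (either (const "invalid UTF-8") T.unpack bad)
```

Returning `Either` keeps the failure visible to the caller, which mirrors how the FromField Text instance mentioned above rejects input that is not valid UTF-8.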

3 participants