Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix errors with sessions & logging related to invalid UTF-8 #60

Merged
merged 2 commits into from
Sep 13, 2024

Conversation

acoulton
Copy link
Member

In particular, that presenting an invalid UTF-8 string as the User-Agent header was:

  • preventing MysqlSession from creating a session for the request
  • causing StackdriverApplicationLogger to log an empty line instead of a request log (due to json_encode failing).

Going forward all session user_agent values, and all values going through StackdriverApplicationLogger, will have encoded � characters in place of any invalid UTF-8 characters. Additionally, ASCII control characters (null, bell, etc) that are valid UTF8 but non-printable will also be escaped if present in the user-agent string.

Previously, if the user-supplied User-Agent string contained invalid
UTF8 characters this would cause two problems:

* The MysqlSession driver (against MySQL 8 in strict mode) would
  fail to insert the session to the database due to the invalid
  content in the user_agent string. This caused it to throw an exception,
  which would cause session_start to fail with "session ID must be a
  string".
* The request logger would output an empty log line - because JSON-
  encoding the request log entry failed and we were not detecting that.

There was also the theoretical risk that the user input could contain
non-printable control characters - although not directly an issue,
this could cause unexpected behaviour if consuming projects tried to
read the database user_agent or parse the log entries without treating
this as user input.

This fix introduces a new StringSanitiser::ensurePrintableUtf8 method
which will silently re-encode ASCII control characters and invalid
UTF8 sequences as `�` characters to make them safe for use.

That is then used for both the session value and the log value of
the User-Agent header.

The raw User-Agent is still used for hashing the session ID: the hash
function just sees this as a byte sequence so the string encoding is
irrelevant. This ensures that sessions created before this commit
will still have the same hash as ones used after it, so the change is
BC.
Although we now sanitise the user-agent input, there may be other
places where an application could be logging messages or context
including invalid UTF-8 (exception messages, for example). Rather
than lose the whole log entry, have json_encode transliterate these
to the � character so that we can get a safe & valid log line.
@acoulton acoulton merged commit 5e8369b into 2.x Sep 13, 2024
4 checks passed
@acoulton acoulton deleted the 2.1-bug-session-invalid-user-agent branch September 13, 2024 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant