Fix attribute encoding when using Shibboleth #102

Open: ghalse wants to merge 1 commit into master

ghalse commented Jun 21, 2024

RFC 2616 states that HTTP headers are encoded in Latin-1 (ISO-8859-1), and Python/Django's request.META (correctly) assumes that incoming headers are encoded this way.

However, by default, Shibboleth ignores the ISO-8859-1 restriction: with ShibUseHeaders, it puts the UTF-8 encoded values from SAML into its request headers without transliteration (ref: https://shibboleth.atlassian.net/wiki/spaces/SP3/pages/2065334723/ContentSettings). This results in incorrectly decoded characters when non-ASCII / accented characters are used in e.g. the first or last name.
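
To make the failure mode concrete, here is a minimal sketch of what happens to a value on its way from Shibboleth to the application. The name is a made-up example and the variable names are only illustrative; the Latin-1 decoding step stands in for what the WSGI layer does before Django sees the header.

```python
# Hypothetical attribute value, standing in for a name released via SAML.
original = "José Müller"

# Shibboleth places the UTF-8 bytes into the request header as-is.
wire_bytes = original.encode("utf-8")

# The WSGI layer decodes header bytes as Latin-1, per the HTTP spec.
seen_by_app = wire_bytes.decode("latin-1")

print(seen_by_app)  # JosÃ© MÃ¼ller
```

Each accented character becomes two unrelated Latin-1 characters, which is the garbling users see in their names.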

There are two ways we could fix this. The approach used here is to simply acknowledge the incorrect encoding and fix it (i.e. force the string to be interpreted as UTF-8 rather than Latin-1). This is backwards compatible and will be invisible to any sites that don't already have incorrectly encoded names.
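
As a rough sketch of the idea (not necessarily the exact code in this PR's diff), the reinterpretation can be done by round-tripping the string through Latin-1 bytes. The helper name `reinterpret_as_utf8` is hypothetical, and the fallback to the original value is an assumption about how "invisible to unaffected sites" could be achieved.

```python
def reinterpret_as_utf8(value: str) -> str:
    """Re-decode a header value that was read as Latin-1 but is really UTF-8.

    Falls back to the original string when the value cannot be
    round-tripped, so ASCII and genuine Latin-1 values pass through.
    """
    try:
        # Recover the raw header bytes, then decode them as UTF-8.
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not Latin-1 representable, or not valid UTF-8: leave untouched.
        return value


print(reinterpret_as_utf8("JosÃ© MÃ¼ller"))  # José Müller
print(reinterpret_as_utf8("Smith"))          # Smith
```

Pure ASCII values are identical in both encodings, so they come back unchanged, which is why sites without accented names should see no difference.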

The alternative would be to make use of Shibboleth's ShibRequestSetting encoding URL option in the Apache config to force Shibboleth to URL encode the string. We would then have to decode it when we consumed it. This approach is arguably more correct since the headers would be RFC compliant, but it involves much more work and requires users to change their webserver config, so it's not backwards compatible.
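
For comparison, the consuming side of that alternative might look something like the sketch below. It assumes Shibboleth percent-encodes the UTF-8 bytes of each value when URL encoding is enabled, which should be checked against the Shibboleth documentation; the helper name `decode_url_encoded_attribute` is hypothetical.

```python
from urllib.parse import unquote


def decode_url_encoded_attribute(value: str) -> str:
    """Decode a percent-encoded header value into text.

    Assumes the percent-encoded bytes are UTF-8; invalid sequences raise
    UnicodeDecodeError instead of being silently replaced.
    """
    return unquote(value, encoding="utf-8", errors="strict")


print(decode_url_encoded_attribute("Jos%C3%A9%20M%C3%BCller"))  # José Müller
```

Because this decoding would have to be paired with the webserver config change, existing deployments that do not enable URL encoding would need to be handled separately.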

RFC 2616 states that HTTP headers are encoded in Latin-1 (ISO-8859-1),
and Python/Django's request.META (correctly) assumes that incoming
headers are encoded this way.

However, by default, Shibboleth ignores the ISO-8859-1
restriction: with ShibUseHeaders, it puts the UTF-8 encoded values
from SAML into its request headers without transliteration
([ref](https://shibboleth.atlassian.net/wiki/spaces/SP3/pages/2065334723/ContentSettings)).
This results in incorrectly decoded characters when non-ASCII / accented
characters are used in e.g. the first or last name.

There are two ways we could fix this. The approach used here is to
simply acknowledge the incorrect encoding and fix it (i.e. force
the string to be interpreted as UTF-8 rather than Latin-1). This is
backwards compatible and will be invisible to any sites that don't
already have incorrectly encoded names.

The alternative would be to make use of Shibboleth's
`ShibRequestSetting encoding URL` option in the Apache config to force
Shibboleth to URL encode the string. We would then have to decode it
when we consumed it. This approach is arguably more correct since the
headers would be RFC compliant, but it involves much more work and
requires users to change their webserver config, so it's not backwards
compatible.