docs(adr): ADR no. 11 Handling FQDNs #1184

Merged
merged 5 commits into from
Oct 19, 2023
30 changes: 30 additions & 0 deletions doc/adr/0011-handling-fqdns.md
@@ -0,0 +1,30 @@
# 11) Handling FQDNs {#adr_0011}

<!--
Don't forget to update the TOC in index.md when adding a new record
-->

Date: 2023-10-04

## Status

proposed

## Context
(FQDN = Fully qualified domain name)

Wikibase.cloud allows its users to create wikis on subdomains of `wikibase.cloud` or to use their own domain names. In both cases, the resulting FQDN is stored as-is in the MariaDB database. In August 2023 it became apparent that FQDNs containing special (non-ASCII) characters cause problems in the system [1], one of them being that k8s only accepts hostnames that conform to RFC 1123 [2][3].
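
To illustrate the k8s constraint, here is a minimal sketch that applies the validation regex quoted in the error message in [3]. The function name `is_rfc1123_subdomain` is hypothetical and not part of the codebase; the sketch covers only the pattern check, not any other validation k8s may perform.

```python
import re

# Pattern copied from the Kubernetes error message quoted in reference [3].
RFC1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def is_rfc1123_subdomain(name: str) -> bool:
    """Return True if the whole name matches the pattern from [3].

    Hypothetical helper for illustration only.
    """
    return RFC1123_SUBDOMAIN.fullmatch(name) is not None
```

With this check, `example.com` passes, while the domain from [3], `então.carolinadoran.com`, is rejected because `ã` is outside `[a-z0-9]`.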

## Decision

To circumvent current and future problems with non-ASCII domain names, the name is encoded to Punycode [4] (an ASCII representation of Unicode) as soon as the system receives it during wiki creation, and is handled only in that format internally. As soon as the value leaves the internal API, it is decoded back to its original Unicode representation.
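
The encode-on-entry, decode-on-exit idea could be sketched as follows. This is an illustration only, not the platform's implementation: the helper names `to_internal` and `to_external` are hypothetical, and it assumes Python's built-in `idna` codec, which implements the older IDNA 2003 rules (a third-party library may be preferable for IDNA 2008).

```python
def to_internal(fqdn: str) -> str:
    """Encode a possibly non-ASCII FQDN to its ASCII (Punycode) form.

    Hypothetical helper: would be called once, when the name enters the system.
    """
    # Lowercasing first is an assumption, made because k8s expects lowercase names.
    return fqdn.lower().encode("idna").decode("ascii")


def to_external(fqdn: str) -> str:
    """Decode the internal ASCII form back to Unicode.

    Hypothetical helper: would be called when the value leaves the internal API.
    """
    return fqdn.encode("ascii").decode("idna")


if __name__ == "__main__":
    internal = to_internal("então.carolinadoran.com")
    print(internal)               # ASCII-only; the first label starts with "xn--"
    print(to_external(internal))  # back to the Unicode form
```

Names that are already plain ASCII pass through both helpers unchanged (apart from lowercasing), so the same code path can serve `wikibase.cloud` subdomains and custom domains.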
Contributor Author


What are all the places where the domain names get used? How happy are mediawiki, QS, etc. with non-ascii/punycode FQDNs? ElasticSearch?

Contributor Author


I actually remember now testing only using punycode for the k8s ingress, mediawiki wasn't happy about that, I assume similar results for other services. What I'm a bit worried about are the implications for Wikibase and for example results in the Query Service. Can't tell if this would be problematic or be actually the exact right thing to do.


## Consequences

- An ASCII-only representation such as Punycode should fix the current problems with special characters in FQDNs and avoid causing new ones
- Existing values need to be converted in the database

- [1] - https://phabricator.wikimedia.org/T345139
- [2] - https://www.rfc-editor.org/rfc/rfc1123
- [3] - `"message": "Invalid value: \"então.carolinadoran.com\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')"`
- [4] - https://en.wikipedia.org/wiki/Punycode
1 change: 1 addition & 0 deletions doc/adr/index.md
@@ -21,6 +21,7 @@ Current ADRs include:
- [7) Application level monitoring using Prometheus](0007-monitoring-prometheus.md)
- [8) Deploying a service mesh to the Kubernetes Cluster](0008-service-mesh.md)
- [9) Terraform style conventions](0009-terraform-style-conventions.md)
- [11) Handling FQDNs](0011-handling-fqdns.md)
<!-- toc-end -->

---