From 0295f5a6e81a1ddab037af7809a87ea2b55378f6 Mon Sep 17 00:00:00 2001
From: Deniz Erdogan
Date: Wed, 4 Oct 2023 19:48:03 +0200
Subject: [PATCH 1/5] feat(adr): Handling FQDNs

---
 doc/adr/0011-handling-fqdns.md | 30 ++++++++++++++++++++++++++++++
 doc/adr/index.md               |  1 +
 2 files changed, 31 insertions(+)
 create mode 100644 doc/adr/0011-handling-fqdns.md

diff --git a/doc/adr/0011-handling-fqdns.md b/doc/adr/0011-handling-fqdns.md
new file mode 100644
index 000000000..9ddb9dab8
--- /dev/null
+++ b/doc/adr/0011-handling-fqdns.md
@@ -0,0 +1,30 @@
+# 11) Handling FQDNs {#adr_0011}
+
+
+
+Date: 2023-10-04
+
+## Status
+
+proposed
+
+## Context
+(FQDN = Fully qualified domain name)
+
+Wikibase.cloud allows it's users to create wikis with subdomains on `wikibase.cloud` or to use their own domain names. In both cases, the (resulting) FQDN gets stored in MariaDB database as-is. In August 2023 it became apparent that FQDNs with special characters (= non-ASCII), causes troubles in the system [1], one of which being k8s only allowing handling of hostnames according to RFC 1123 [2][3].
+
+## Decision
+
+To circumvent current and future troubles with non-ASCII domain names, from the moment the system receives the name during creation of a wiki, it gets encoded to punycode[4] (an encoding allowing unicode via ascii representation), and gets handled only in that format internally. As soon as the value leaves the internal API, it gets decoded to it's original representation in unicode.
+
+## Consequences
+
+- An ASCII-only representation like punycode should fix and not cause any more troubles with special characters in FQDNs
+- Existing values need to be converted in the database
+
+- [1] - https://phabricator.wikimedia.org/T345139
+- [2] - https://www.rfc-editor.org/rfc/rfc1123
+- [3] - `"message": "Invalid value: \"então.carolinadoran.com\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')"`
+- [4] - https://en.wikipedia.org/wiki/Punycode
diff --git a/doc/adr/index.md b/doc/adr/index.md
index 1d6bf0a43..9d3822d59 100644
--- a/doc/adr/index.md
+++ b/doc/adr/index.md
@@ -21,6 +21,7 @@ Current ADRs include:
 - [7) Application level monitoring using Prometheus](0007-monitoring-prometheus.md)
 - [8) Deploying a service mesh to the Kubernetes Cluster](0008-service-mesh.md)
 - [9) Terraform style conventions](0009-terraform-style-conventions.md)
+- [11) Handling FQDNs](0011-handling-fqdns.md)
 
 ---
 
From c20cd389022db43ce7e602a872f215341a7d0c28 Mon Sep 17 00:00:00 2001
From: Thomas Arrow
Date: Wed, 11 Oct 2023 14:26:29 +0100
Subject: [PATCH 2/5] Update 0011-handling-fqdns.md

---
 doc/adr/0011-handling-fqdns.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/adr/0011-handling-fqdns.md b/doc/adr/0011-handling-fqdns.md
index 9ddb9dab8..1c9d3fc4a 100644
--- a/doc/adr/0011-handling-fqdns.md
+++ b/doc/adr/0011-handling-fqdns.md
@@ -17,7 +17,7 @@ Wikibase.cloud allows it's users to create wikis with subdomains on `wikibase.cl
 
 ## Decision
 
-To circumvent current and future troubles with non-ASCII domain names, from the moment the system receives the name during creation of a wiki, it gets encoded to punycode[4] (an encoding allowing unicode via ascii representation), and gets handled only in that format internally. As soon as the value leaves the internal API, it gets decoded to it's original representation in unicode.
+To circumvent current and future troubles with non-ASCII domain names, from the moment the system receives the name during creation of a wiki, it gets encoded to punycode[4] (an encoding allowing unicode via ascii representation), and gets handled only in that format internally; the platform api will also output this format; in this case the consumer must decide how to format it correctly e.g. to decode it back to unicode if desired.
 
 ## Consequences
 
From 9d131ff58223e0d27853d46e20dfd1f1b9769c75 Mon Sep 17 00:00:00 2001
From: Deniz Erdogan <91744937+deer-wmde@users.noreply.github.com>
Date: Thu, 12 Oct 2023 17:50:09 +0200
Subject: [PATCH 3/5] Update 0011-handling-fqdns.md

---
 doc/adr/0011-handling-fqdns.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/adr/0011-handling-fqdns.md b/doc/adr/0011-handling-fqdns.md
index 1c9d3fc4a..98da75b65 100644
--- a/doc/adr/0011-handling-fqdns.md
+++ b/doc/adr/0011-handling-fqdns.md
@@ -8,7 +8,7 @@ Date: 2023-10-04
 
 ## Status
 
-proposed
+accepted
 
 ## Context
 (FQDN = Fully qualified domain name)
@@ -17,7 +17,7 @@ Wikibase.cloud allows it's users to create wikis with subdomains on `wikibase.cl
 
 ## Decision
 
-To circumvent current and future troubles with non-ASCII domain names, from the moment the system receives the name during creation of a wiki, it gets encoded to punycode[4] (an encoding allowing unicode via ascii representation), and gets handled only in that format internally; the platform api will also output this format; in this case the consumer must decide how to format it correctly e.g. to decode it back to unicode if desired.
+To circumvent current and future troubles with non-ASCII domain names, from the moment the system receives the name during creation of a wiki, it gets encoded to the "Internationalized domain name" (IDN) format[4] (an encoding allowing Unicode via ASCII representation), and gets handled only in that format internally; the platform api will also output this format; in this case the consumer must decide how to format it correctly e.g. to decode it back to unicode if desired.
 
 ## Consequences
 
@@ -27,4 +27,4 @@ To circumvent current and future troubles with non-ASCII domain names, from the
 - [1] - https://phabricator.wikimedia.org/T345139
 - [2] - https://www.rfc-editor.org/rfc/rfc1123
 - [3] - `"message": "Invalid value: \"então.carolinadoran.com\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')"`
-- [4] - https://en.wikipedia.org/wiki/Punycode
+- [4] - https://en.wikipedia.org/wiki/Internationalized_domain_name
From 4b7f5b3071b7aae7d815cf0c6d4d3d9bc0b5649d Mon Sep 17 00:00:00 2001
From: Deniz Erdogan <91744937+deer-wmde@users.noreply.github.com>
Date: Thu, 12 Oct 2023 20:19:50 +0200
Subject: [PATCH 4/5] Update 0011-handling-fqdns.md

---
 doc/adr/0011-handling-fqdns.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/adr/0011-handling-fqdns.md b/doc/adr/0011-handling-fqdns.md
index 98da75b65..d5f36134f 100644
--- a/doc/adr/0011-handling-fqdns.md
+++ b/doc/adr/0011-handling-fqdns.md
@@ -17,12 +17,13 @@ Wikibase.cloud allows it's users to create wikis with subdomains on `wikibase.cl
 
 ## Decision
 
-To circumvent current and future troubles with non-ASCII domain names, from the moment the system receives the name during creation of a wiki, it gets encoded to the "Internationalized domain name" (IDN) format[4] (an encoding allowing Unicode via ASCII representation), and gets handled only in that format internally; the platform api will also output this format; in this case the consumer must decide how to format it correctly e.g. to decode it back to unicode if desired.
+To circumvent current and future troubles with non-ASCII domain names, from the moment the system receives the name during creation of a wiki, it gets encoded to the "Internationalized domain name" (IDN) format[4] (an encoding allowing Unicode via ASCII representation), and gets handled only in that format internally; the platform api will also output this format, alongside a decoded variant in Unicode representation.
 
 ## Consequences
 
 - An ASCII-only representation like punycode should fix and not cause any more troubles with special characters in FQDNs
 - Existing values need to be converted in the database
+- Endpoint implementations in the Platform API need to be careful about providing the right value - in the best case they provide both
 
 - [1] - https://phabricator.wikimedia.org/T345139
 - [2] - https://www.rfc-editor.org/rfc/rfc1123
From 14ba124b8c1adfa204c045a7d1605122f39ecd6f Mon Sep 17 00:00:00 2001
From: Deniz Erdogan <91744937+deer-wmde@users.noreply.github.com>
Date: Thu, 12 Oct 2023 20:20:51 +0200
Subject: [PATCH 5/5] Update 0011-handling-fqdns.md

---
 doc/adr/0011-handling-fqdns.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/adr/0011-handling-fqdns.md b/doc/adr/0011-handling-fqdns.md
index d5f36134f..fe567c12c 100644
--- a/doc/adr/0011-handling-fqdns.md
+++ b/doc/adr/0011-handling-fqdns.md
@@ -23,7 +23,7 @@ To circumvent current and future troubles with non-ASCII domain names, from the
 
 - An ASCII-only representation like punycode should fix and not cause any more troubles with special characters in FQDNs
 - Existing values need to be converted in the database
-- Endpoint implementations in the Platform API need to be careful about providing the right value - in the best case they provide both
+- Endpoint implementations in the Platform API need to be careful about actually providing both formats
 
 - [1] - https://phabricator.wikimedia.org/T345139
 - [2] - https://www.rfc-editor.org/rfc/rfc1123
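
The Decision recorded in ADR 0011 (encode on input, handle only the ASCII form internally, output both forms from the Platform API) could be sketched roughly as follows. This is a minimal illustration and not the project's implementation: the function names `to_internal`, `to_display` and `api_payload`, and the field names `domain` and `domain_decoded`, are hypothetical. It relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules; a real deployment might need IDNA 2008 handling instead. The regular expression is the one quoted in the Kubernetes error message in reference [3].

```python
import re

# Validation pattern quoted in the Kubernetes error message (reference [3]).
RFC1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def to_internal(fqdn: str) -> str:
    """Encode a user-supplied FQDN to its ASCII (IDN) form for internal use.

    Hypothetical helper. Lowercasing first keeps pure-ASCII labels
    compatible with the lowercase RFC 1123 pattern above.
    """
    return fqdn.lower().encode("idna").decode("ascii")


def to_display(stored: str) -> str:
    """Decode the stored ASCII form back to its Unicode representation."""
    return stored.encode("ascii").decode("idna")


def api_payload(stored: str) -> dict:
    """Return both formats, as the final revision of the ADR asks for.

    The field names are assumptions made for this sketch.
    """
    return {"domain": stored, "domain_decoded": to_display(stored)}


if __name__ == "__main__":
    raw = "ent\u00e3o.carolinadoran.com"  # the domain from reference [3]
    stored = to_internal(raw)
    print(stored, bool(RFC1123_SUBDOMAIN.fullmatch(stored)))
    print(api_payload(stored))
```

Converting the existing database values, as listed under Consequences, would amount to applying the same encoding step to every stored FQDN once.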