From 1e7bb0e4626c2ebb4b3032b8449dc39464d589b7 Mon Sep 17 00:00:00 2001 From: Alexandra Konrad <10500694+trisch-me@users.noreply.github.com> Date: Tue, 20 Feb 2024 13:39:26 +0100 Subject: [PATCH] add remaining url fields to the registry (#496) Co-authored-by: Joao Grassi <5938087+joaopgrassi@users.noreply.github.com> --- .chloggen/url-ecs.yaml | 22 +++++++++ docs/attributes-registry/url.md | 28 +++++++++-- model/registry/url.yaml | 86 +++++++++++++++++++++++++++++---- 3 files changed, 122 insertions(+), 14 deletions(-) create mode 100755 .chloggen/url-ecs.yaml diff --git a/.chloggen/url-ecs.yaml b/.chloggen/url-ecs.yaml new file mode 100755 index 0000000000..1c61cb40c3 --- /dev/null +++ b/.chloggen/url-ecs.yaml @@ -0,0 +1,22 @@ +# Use this changelog template to create an entry for release notes. +# +# If your change doesn't affect end users you should instead start +# your pull request title with [chore] or use the "Skip Changelog" label. + +# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' +change_type: enhancement + +# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db) +component: url + +# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). +note: Add remaining ECS fields to the url namespace + +# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists. +# The values here must be integers. +issues: [496] + +# (Optional) One or more lines of additional information to render under the primary note. +# These lines will be padded with 2 spaces and then inserted directly into the document. +# Use pipe (|) for multiline entries. +subtext: diff --git a/docs/attributes-registry/url.md b/docs/attributes-registry/url.md index c7a22357b7..8c161e6648 100644 --- a/docs/attributes-registry/url.md +++ b/docs/attributes-registry/url.md @@ -9,15 +9,35 @@ linkTitle: URL | Attribute | Type | Description | Examples | |---|---|---|---| +| `url.domain` | string | Domain extracted from the `url.full`, such as "opentelemetry.io". [1] | `www.foo.bar`; `opentelemetry.io`; `3.12.167.2`; `[1080:0:0:0:8:800:200C:417A]` | +| `url.extension` | string | The file extension extracted from the `url.full`, excluding the leading dot. [2] | `png`; `gz` | | `url.fragment` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)
The [URI fragment](https://www.rfc-editor.org/rfc/rfc3986#section-3.5) component | `SemConv` | -| `url.full` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)
Absolute URL describing a network resource according to [RFC3986](https://www.rfc-editor.org/rfc/rfc3986) [1] | `https://www.foo.bar/search?q=OpenTelemetry#SemConv`; `//localhost` | +| `url.full` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)
Absolute URL describing a network resource according to [RFC3986](https://www.rfc-editor.org/rfc/rfc3986) [3] | `https://www.foo.bar/search?q=OpenTelemetry#SemConv`; `//localhost` | +| `url.original` | string | Unmodified original URL as seen in the event source. [4] | `https://www.foo.bar/search?q=OpenTelemetry#SemConv`; `search?q=OpenTelemetry` | | `url.path` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)
The [URI path](https://www.rfc-editor.org/rfc/rfc3986#section-3.3) component | `/search` | -| `url.query` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)
The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [2] | `q=OpenTelemetry` | +| `url.port` | int | Port extracted from the `url.full` | `443` | +| `url.query` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)
The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [5] | `q=OpenTelemetry` | +| `url.registered_domain` | string | The highest registered url domain, stripped of the subdomain. [6] | `example.com`; `foo.co.uk` | | `url.scheme` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)
The [URI scheme](https://www.rfc-editor.org/rfc/rfc3986#section-3.1) component identifying the used protocol. | `https`; `ftp`; `telnet` | +| `url.subdomain` | string | The subdomain portion of a fully qualified domain name includes all of the names except the host name under the registered_domain. In a partially qualified domain, or if the the qualification level of the full name cannot be determined, subdomain contains all of the names below the registered domain. [7] | `east`; `sub2.sub1` | +| `url.top_level_domain` | string | The effective top level domain (eTLD), also known as the domain suffix, is the last part of the domain name. For example, the top level domain for example.com is `com`. [8] | `com`; `co.uk` | -**[1]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless. +**[1]:** In some cases a URL may refer to an IP and/or port directly, without a domain name. In this case, the IP address would go to the domain field. If the URL contains a [literal IPv6 address](https://www.rfc-editor.org/rfc/rfc2732#section-2) enclosed by `[` and `]`, the `[` and `]` characters should also be captured in the domain field. + +**[2]:** The file extension is only set if it exists, as not every url has a file extension. When the file name has multiple extensions `example.tar.gz`, only the last one should be captured `gz`, not `tar.gz`. + +**[3]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless. `url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`. `url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed) and SHOULD NOT be validated or modified except for sanitizing purposes. -**[2]:** Sensitive content provided in query string SHOULD be scrubbed when instrumentations can identify it. +**[4]:** In network monitoring, the observed URL may be a full URL, whereas in access logs, the URL is often just represented as a path. This field is meant to represent the URL as it was observed, complete or not. +`url.original` might contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case password and username SHOULD NOT be redacted and attribute's value SHOULD remain the same. + +**[5]:** Sensitive content provided in query string SHOULD be scrubbed when instrumentations can identify it. + +**[6]:** This value can be determined precisely with the [public suffix list](http://publicsuffix.org). For example, the registered domain for "foo.example.com" is "example.com". Trying to approximate this by simply taking the last two labels will not work well for TLDs such as "co.uk". + +**[7]:** The subdomain portion of "www.east.mydomain.co.uk" is "east". If the domain has multiple levels of subdomain, such as "sub2.sub1.example.com", the subdomain field should contain "sub2.sub1", with no trailing period. + +**[8]:** This value can be determined precisely with the [public suffix list](http://publicsuffix.org). \ No newline at end of file diff --git a/model/registry/url.yaml b/model/registry/url.yaml index 985ca9123f..65dc2602bc 100644 --- a/model/registry/url.yaml +++ b/model/registry/url.yaml @@ -4,11 +4,30 @@ groups: type: attribute_group prefix: url attributes: - - id: scheme + - id: domain + type: string + brief: > + Domain extracted from the `url.full`, such as "opentelemetry.io". + note: > + In some cases a URL may refer to an IP and/or port directly, + without a domain name. In this case, the IP address would go to the domain field. + If the URL contains a [literal IPv6 address](https://www.rfc-editor.org/rfc/rfc2732#section-2) + enclosed by `[` and `]`, the `[` and `]` characters should also be captured in the domain field. + examples: ["www.foo.bar", "opentelemetry.io", "3.12.167.2", "[1080:0:0:0:8:800:200C:417A]"] + - id: extension + type: string + brief: > + The file extension extracted from the `url.full`, excluding the leading dot. + note: > + The file extension is only set if it exists, as not every url has a file extension. + When the file name has multiple extensions `example.tar.gz`, only the last one should be captured `gz`, not `tar.gz`. + examples: [ "png", "gz" ] + - id: fragment stability: stable type: string - brief: 'The [URI scheme](https://www.rfc-editor.org/rfc/rfc3986#section-3.1) component identifying the used protocol.' - examples: ["https", "ftp", "telnet"] + brief: > + The [URI fragment](https://www.rfc-editor.org/rfc/rfc3986#section-3.5) component + examples: ["SemConv"] - id: full stability: stable type: string @@ -23,19 +42,66 @@ groups: `url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed) and SHOULD NOT be validated or modified except for sanitizing purposes. examples: ['https://www.foo.bar/search?q=OpenTelemetry#SemConv', '//localhost'] + - id: original + type: string + brief: > + Unmodified original URL as seen in the event source. + note: > + In network monitoring, the observed URL may be a full URL, whereas in access logs, the URL is often + just represented as a path. This field is meant to represent the URL as it was observed, complete or not. + + `url.original` might contain credentials passed via URL in form of `https://username:password@www.example.com/`. + In such case password and username SHOULD NOT be redacted and attribute's value SHOULD remain the same. + examples: ["https://www.foo.bar/search?q=OpenTelemetry#SemConv", "search?q=OpenTelemetry"] - id: path stability: stable type: string - brief: 'The [URI path](https://www.rfc-editor.org/rfc/rfc3986#section-3.3) component' - examples: ['/search'] + brief: > + The [URI path](https://www.rfc-editor.org/rfc/rfc3986#section-3.3) component + examples: ["/search"] + - id: port + type: int + brief: > + Port extracted from the `url.full` + examples: [443] - id: query stability: stable type: string - brief: 'The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component' + brief: > + The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component examples: ["q=OpenTelemetry"] - note: Sensitive content provided in query string SHOULD be scrubbed when instrumentations can identify it. - - id: fragment + note: > + Sensitive content provided in query string SHOULD be scrubbed when instrumentations can identify it. + - id: registered_domain + type: string + brief: > + The highest registered url domain, stripped of the subdomain. + examples: ["example.com", "foo.co.uk"] + note: > + This value can be determined precisely with the [public suffix list](http://publicsuffix.org). + For example, the registered domain for "foo.example.com" is "example.com". + Trying to approximate this by simply taking the last two labels will not work well for TLDs such as "co.uk". + - id: scheme stability: stable type: string - brief: 'The [URI fragment](https://www.rfc-editor.org/rfc/rfc3986#section-3.5) component' - examples: ["SemConv"] + brief: > + The [URI scheme](https://www.rfc-editor.org/rfc/rfc3986#section-3.1) component identifying the used protocol. + examples: ["https", "ftp", "telnet"] + - id: subdomain + type: string + brief: > + The subdomain portion of a fully qualified domain name includes all of the names except the host name + under the registered_domain. In a partially qualified domain, or if the the qualification level of the + full name cannot be determined, subdomain contains all of the names below the registered domain. + examples: ["east", "sub2.sub1"] + note: > + The subdomain portion of "www.east.mydomain.co.uk" is "east". If the domain has multiple levels of subdomain, + such as "sub2.sub1.example.com", the subdomain field should contain "sub2.sub1", with no trailing period. + - id: top_level_domain + type: string + brief: > + The effective top level domain (eTLD), also known as the domain suffix, is the last part of the domain name. + For example, the top level domain for example.com is `com`. + examples: ["com", "co.uk"] + note: > + This value can be determined precisely with the [public suffix list](http://publicsuffix.org).