Dashboards: add 'Read path' selector to 'Mimir / Queries' dashboard (#8878)

* Dashboards: add 'Read path' selector to 'Mimir / Queries' dashboard

Signed-off-by: Marco Pracucci <[email protected]>

* Updated CHANGELOG

Signed-off-by: Marco Pracucci <[email protected]>

* Add mixin linter exclusion

Signed-off-by: Marco Pracucci <[email protected]>

* Rename 'Standard' to 'Main'

Signed-off-by: Marco Pracucci <[email protected]>

---------

Signed-off-by: Marco Pracucci <[email protected]>
pracucci authored Aug 27, 2024
1 parent e0fa677 commit 5455b8c
Showing 15 changed files with 366 additions and 208 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -123,6 +123,7 @@
* [ENHANCEMENT] Dashboards: add Kafka end-to-end latency outliers panel in the "Mimir / Writes" dashboard. #8948
* [ENHANCEMENT] Dashboards: add "Out-of-order samples appended" panel to "Mimir / Tenants" dashboard. #8939
* [ENHANCEMENT] Alerts: `RequestErrors` and `RulerRemoteEvaluationFailing` have been enriched with a native histogram version. #9004
+* [ENHANCEMENT] Dashboards: add 'Read path' selector to 'Mimir / Queries' dashboard. #8878
* [BUGFIX] Dashboards: fix "current replicas" in autoscaling panels when HPA is not active. #8566
* [BUGFIX] Alerts: do not fire `MimirRingMembersMismatch` during the migration to experimental ingest storage. #8727

105 changes: 69 additions & 36 deletions operations/mimir-mixin-compiled/dashboards/mimir-queries.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion operations/mimir-mixin-tools/serve/run.sh
@@ -5,7 +5,7 @@ set -e

SCRIPT_DIR=$(cd `dirname $0` && pwd)
# Ensure we run recent Grafana.
-GRAFANA_VERSION=11.0.0
+GRAFANA_VERSION=11.1.3
DOCKER_CONTAINER_NAME="mixin-serve-grafana"
DOCKER_OPTS=""

3 changes: 2 additions & 1 deletion operations/mimir-mixin/.lint
@@ -17,9 +17,10 @@ exclusions:
- dashboard: Mimir / Top tenants
panel: Top $limit users by received exemplars rate in last 5m
target-promql-rule:
-reason: Skipping in dashboards where the linter parses a Loki query as a Prometheus one.
+reason: Skipping in dashboards where the linter parses a Loki query as a Prometheus one, or we define label matchers as template variables.
entries:
- dashboard: Mimir / Slow queries
+- dashboard: Mimir / Queries
template-datasource-rule:
reason: We prefer to keep calling "datasource" the Prometheus datasource to keep consistency between dashboards.
entries:
4 changes: 4 additions & 0 deletions operations/mimir-mixin/config.libsonnet
@@ -88,6 +88,10 @@
alertmanager: ['alertmanager', 'cortex', 'mimir', 'mimir-backend.*'],
overrides_exporter: ['overrides-exporter', 'mimir-backend.*'],

+// The following are job matchers used to select all components in the read path.
+main_read_path: std.uniq(std.sort(self.query_frontend + self.query_scheduler + self.querier)),
+remote_ruler_read_path: std.uniq(std.sort(self.ruler_query_frontend + self.ruler_query_scheduler + self.ruler_querier)),
+
// The following are job matchers used to select all components in a given "path".
write: ['distributor.*', 'ingester.*', 'mimir-write.*'],
read: ['query-frontend.*', 'querier.*', 'ruler-query-frontend.*', 'ruler-querier.*', 'mimir-read.*'],
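For readers less familiar with Jsonnet: `std.uniq` only drops adjacent duplicates, so the `std.uniq(std.sort(...))` idiom in `main_read_path` and `remote_ruler_read_path` yields a sorted, de-duplicated union of the per-component job matchers. A minimal Python sketch of the same idiom follows; the per-component job lists are hypothetical stand-ins, not the mixin's real `job_names` defaults.

```python
# Python re-expression of std.uniq(std.sort(a + b + c)) from config.libsonnet.
# The job lists below are HYPOTHETICAL examples, not the mixin's defaults.
job_names = {
    "query_frontend": ["query-frontend.*", "cortex", "mimir", "mimir-read.*"],
    "query_scheduler": ["query-scheduler.*", "mimir-backend.*"],
    "querier": ["querier.*", "cortex", "mimir", "mimir-read.*"],
}


def uniq_sorted(*lists):
    """Sort the concatenation, then drop adjacent duplicates.

    Like Jsonnet's std.uniq, the loop only removes *successive* duplicates;
    sorting first is what makes the result fully de-duplicated.
    """
    merged = sorted(x for lst in lists for x in lst)
    out = []
    for item in merged:
        if not out or out[-1] != item:
            out.append(item)
    return out


main_read_path = uniq_sorted(
    job_names["query_frontend"], job_names["query_scheduler"], job_names["querier"]
)
print(main_read_path)
```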
94 changes: 51 additions & 43 deletions operations/mimir-mixin/dashboards/dashboard-utils.libsonnet
@@ -126,30 +126,38 @@ local utils = import 'mixin-utils/utils.libsonnet';
addActiveUserSelectorTemplates()::
self.addTemplate('user', 'cortex_ingester_active_series{%s=~"$cluster", %s=~"$namespace"}' % [$._config.per_cluster_label, $._config.per_namespace_label], 'user', sort=sortAscending),

-addCustomTemplate(name, values, defaultIndex=0):: self {
+addCustomTemplate(label, name, options, defaultIndex=0):: self {
+// Escape the comma because it's used as a separator in the options list.
+local escapeValue(v) = std.strReplace(v, ',', '\\,'),
+
templating+: {
-list+: [
-{
-name: name,
-options: [
-{
-selected: v == values[defaultIndex],
-text: v,
-value: v,
-}
-for v in values
-],
-current: {
-selected: true,
-text: values[defaultIndex],
-value: values[defaultIndex],
-},
-type: 'custom',
-hide: 0,
-includeAll: false,
-multi: false,
+list+: [{
+current: {
+selected: true,
+text: options[defaultIndex].label,
+value: escapeValue(options[defaultIndex].value),
},
-],
+hide: 0,
+includeAll: false,
+label: label,
+multi: false,
+name: name,
+query: std.join(',', [
+'%s : %s' % [option.label, escapeValue(option.value)]
+for option in options
+]),
+options: [
+{
+selected: option.label == options[defaultIndex].label,
+text: option.label,
+value: escapeValue(option.value),
+}
+for option in options
+],
+skipUrlSync: false,
+type: 'custom',
+useTags: false,
+}],
},
},
},
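The reworked `addCustomTemplate(label, name, options, defaultIndex)` takes a list of `{label, value}` objects and emits a Grafana `custom` variable whose `query` is a comma-separated list of `label : value` pairs, with literal commas in each value escaped. The sketch below re-expresses that logic in Python so the output shape is easy to inspect. Only the `Main` label comes from this commit; the second label, the variable name `read_path_matcher`, and the matcher values are hypothetical. Matcher values like these contain commas, which is presumably why the escaping is needed.

```python
# Python re-expression of addCustomTemplate() from dashboard-utils.libsonnet.
# Option labels (except "Main"), the variable name and the matcher values
# used below are HYPOTHETICAL.


def escape_value(v):
    # Commas separate options in a Grafana custom variable, so literal
    # commas inside a value are escaped with a backslash.
    return v.replace(",", "\\,")


def custom_template(label, name, options, default_index=0):
    default = options[default_index]
    return {
        "current": {
            "selected": True,
            "text": default["label"],
            "value": escape_value(default["value"]),
        },
        "hide": 0,
        "includeAll": False,
        "label": label,
        "multi": False,
        "name": name,
        # "label : value" pairs joined by commas, as in the Jsonnet helper.
        "query": ",".join(
            "%s : %s" % (o["label"], escape_value(o["value"])) for o in options
        ),
        "options": [
            {
                "selected": o["label"] == default["label"],
                "text": o["label"],
                "value": escape_value(o["value"]),
            }
            for o in options
        ],
        "skipUrlSync": False,
        "type": "custom",
        "useTags": False,
    }


tpl = custom_template(
    "Read path",
    "read_path_matcher",  # hypothetical variable name
    [
        {"label": "Main", "value": 'cluster=~"$cluster", job=~"querier"'},
        {"label": "Remote ruler", "value": 'cluster=~"$cluster", job=~"ruler-querier"'},
    ],
)
print(tpl["query"])
```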
@@ -1884,7 +1892,7 @@ local utils = import 'mixin-utils/utils.libsonnet';
},
},

-ingestStorageFetchLastProducedOffsetRequestsPanel(jobName)::
+ingestStorageFetchLastProducedOffsetRequestsPanel(jobMatcher)::
$.timeseriesPanel('Fetch last produced offset requests / sec') +
$.panelDescription(
'Fetch last produced offset requests / sec',
@@ -1896,10 +1904,10 @@ local utils = import 'mixin-utils/utils.libsonnet';
sum(rate(cortex_ingest_storage_reader_last_produced_offset_requests_total{%s}[$__rate_interval]))
-
sum(rate(cortex_ingest_storage_reader_last_produced_offset_failures_total{%s}[$__rate_interval]))
-||| % [$.jobMatcher($._config.job_names[jobName]), $.jobMatcher($._config.job_names[jobName])],
+||| % [jobMatcher, jobMatcher],
|||
sum(rate(cortex_ingest_storage_reader_last_produced_offset_failures_total{%s}[$__rate_interval]))
-||| % [$.jobMatcher($._config.job_names[jobName])],
+||| % [jobMatcher],
],
[
'successful',
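With the signature changed from `jobName` to `jobMatcher`, these panel helpers no longer look up `$._config.job_names` themselves; the caller passes the full label matcher, which can also be a Grafana template variable. That is consistent with the `.lint` exclusion added in this commit for dashboards that "define label matchers as template variables". A Python sketch of the resulting query pair, with hypothetical matcher strings and a hypothetical variable name:

```python
# Python sketch of the query pair built by
# ingestStorageFetchLastProducedOffsetRequestsPanel(jobMatcher).
# The matcher strings and "$read_path_matcher" are HYPOTHETICAL.

REQUESTS = "cortex_ingest_storage_reader_last_produced_offset_requests_total"
FAILURES = "cortex_ingest_storage_reader_last_produced_offset_failures_total"


def rate_sum(metric, job_matcher):
    return "sum(rate(%s{%s}[$__rate_interval]))" % (metric, job_matcher)


def offset_request_queries(job_matcher):
    failed = rate_sum(FAILURES, job_matcher)
    # "successful" is derived as all requests minus failed requests.
    successful = "%s - %s" % (rate_sum(REQUESTS, job_matcher), failed)
    return {"successful": successful, "failed": failed}


# A literal matcher built by the caller...
static = offset_request_queries('cluster=~"$cluster", job=~"($namespace)/(querier)"')
# ...or a template variable that Grafana expands to a matcher at query time.
dynamic = offset_request_queries("$read_path_matcher")
```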
@@ -1913,7 +1921,7 @@ local utils = import 'mixin-utils/utils.libsonnet';
$.aliasColors({ successful: $._colors.success, failed: $._colors.failed }) +
$.stack,

-ingestStorageFetchLastProducedOffsetLatencyPanel(jobName)::
+ingestStorageFetchLastProducedOffsetLatencyPanel(jobMatcher)::
$.timeseriesPanel('Fetch last produced offset latency') +
$.panelDescription(
'Fetch last produced offset latency',
@@ -1923,10 +1931,10 @@ local utils = import 'mixin-utils/utils.libsonnet';
) +
$.queryPanel(
[
-'histogram_avg(sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
-'histogram_quantile(0.99, sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
-'histogram_quantile(0.999, sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
-'histogram_quantile(1.0, sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
+'histogram_avg(sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [jobMatcher],
+'histogram_quantile(0.99, sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [jobMatcher],
+'histogram_quantile(0.999, sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [jobMatcher],
+'histogram_quantile(1.0, sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [jobMatcher],
],
[
'avg',
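The latency panels plot one `histogram_avg` series plus `histogram_quantile` at 0.99, 0.999 and 1.0, all over the same summed rate. A Python sketch that generates the four expressions from a single inner expression; apart from `avg`, the legend keys are hypothetical because the diff cuts off before them, and the matcher is a made-up example.

```python
# Python sketch generating the four latency expressions from one inner
# expression. Legend keys other than "avg" and the matcher are HYPOTHETICAL.

METRIC = "cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds"


def latency_queries(metric, job_matcher):
    inner = "sum(rate(%s{%s}[$__rate_interval]))" % (metric, job_matcher)
    queries = {"avg": "histogram_avg(%s)" % inner}
    # Quantile 1.0 approximates the maximum observed latency.
    for legend, quantile in (("p99", "0.99"), ("p99.9", "0.999"), ("max", "1.0")):
        queries[legend] = "histogram_quantile(%s, %s)" % (quantile, inner)
    return queries


q = latency_queries(METRIC, 'job=~"querier.*"')
```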
@@ -1940,10 +1948,10 @@ local utils = import 'mixin-utils/utils.libsonnet';
},
},

-ingestStorageStrongConsistencyRequestsPanel(jobName)::
-// The unit changes depending on whether the metric is exposed from ingesters or other components. In the ingesters it's the
+ingestStorageStrongConsistencyRequestsPanel(component, jobMatcher)::
+// The unit changes depending on whether the metric is exposed from ingesters (partition-reader) or other components. In the ingesters it's the
// requests issued by queriers to ingesters, while in other components it's the actual query.
-local unit = if jobName == 'ingester' then 'requests' else 'queries';
+local unit = if component == 'partition-reader' then 'requests' else 'queries';
local title = '%s with strong read consistency / sec' % (std.asciiUpper(std.substr(unit, 0, 1)) + std.substr(unit, 1, std.length(unit) - 1));

$.timeseriesPanel(title) +
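The unit selection now keys off the `component` label value rather than the job name: `partition-reader` (the ingester side) counts requests, every other component counts queries, and the title capitalizes the first letter of the unit. A Python sketch of that logic; `query-frontend` is used only as a hypothetical example of a non-ingester component value.

```python
# Python sketch of the unit/title selection in
# ingestStorageStrongConsistencyRequestsPanel(component, jobMatcher).


def strong_consistency_title(component):
    # Ingesters report this metric with component="partition-reader", where it
    # counts requests issued by queriers; other components count whole queries.
    unit = "requests" if component == "partition-reader" else "queries"
    # Capitalize only the first letter, like the std.asciiUpper/std.substr
    # combination in the Jsonnet helper.
    return "%s with strong read consistency / sec" % (unit[0].upper() + unit[1:])


print(strong_consistency_title("partition-reader"))
print(strong_consistency_title("query-frontend"))  # hypothetical component value
```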
@@ -1956,13 +1964,13 @@ local utils = import 'mixin-utils/utils.libsonnet';
$.queryPanel(
[
|||
-sum(rate(cortex_ingest_storage_strong_consistency_requests_total{%s}[$__rate_interval]))
+sum(rate(cortex_ingest_storage_strong_consistency_requests_total{component="%(component)s", %(jobMatcher)s}[$__rate_interval]))
-
-sum(rate(cortex_ingest_storage_strong_consistency_failures_total{%s}[$__rate_interval]))
-||| % [$.jobMatcher($._config.job_names[jobName]), $.jobMatcher($._config.job_names[jobName])],
+sum(rate(cortex_ingest_storage_strong_consistency_failures_total{component="%(component)s", %(jobMatcher)s}[$__rate_interval]))
+||| % { jobMatcher: jobMatcher, component: component },
|||
-sum(rate(cortex_ingest_storage_strong_consistency_failures_total{%s}[$__rate_interval]))
-||| % [$.jobMatcher($._config.job_names[jobName])],
+sum(rate(cortex_ingest_storage_strong_consistency_failures_total{component="%(component)s", %(jobMatcher)s}[$__rate_interval]))
+||| % { jobMatcher: jobMatcher, component: component },
],
[
'successful',
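Jsonnet's `%` operator accepts either an array for positional `%s` placeholders or an object for named `%(key)s` placeholders, which is what lets the new queries reuse `component` and `jobMatcher` several times without repeating them in order. Python's `%` operator has the same two forms, so the substitution can be checked directly; the matcher value below is hypothetical.

```python
# Python check of the named-placeholder substitution used by the new
# strong-consistency queries. The jobMatcher value is HYPOTHETICAL.

FAILURES_TEMPLATE = (
    "sum(rate(cortex_ingest_storage_strong_consistency_failures_total"
    '{component="%(component)s", %(jobMatcher)s}[$__rate_interval]))'
)

params = {
    "component": "partition-reader",
    "jobMatcher": 'job=~"ingester.*"',  # hypothetical matcher
}

query = FAILURES_TEMPLATE % params

# Named placeholders can appear any number of times from a single mapping.
successful = (
    "sum(rate(cortex_ingest_storage_strong_consistency_requests_total"
    '{component="%(component)s", %(jobMatcher)s}[$__rate_interval]))'
    " - "
    "sum(rate(cortex_ingest_storage_strong_consistency_failures_total"
    '{component="%(component)s", %(jobMatcher)s}[$__rate_interval]))'
) % params
```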
@@ -1976,18 +1984,18 @@ local utils = import 'mixin-utils/utils.libsonnet';
$.aliasColors({ successful: $._colors.success, failed: $._colors.failed }) +
$.stack,

-ingestStorageStrongConsistencyWaitLatencyPanel(jobName)::
+ingestStorageStrongConsistencyWaitLatencyPanel(component, jobMatcher)::
$.timeseriesPanel('Strong read consistency queries — wait latency') +
$.panelDescription(
'Strong read consistency queries — wait latency',
'How long does the request wait to guarantee strong read consistency.',
) +
$.queryPanel(
[
-'histogram_avg(sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
-'histogram_quantile(0.99, sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
-'histogram_quantile(0.999, sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
-'histogram_quantile(1.0, sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
+'histogram_avg(sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{component="%(component)s", %(jobMatcher)s}[$__rate_interval])))' % { component: component, jobMatcher: jobMatcher },
+'histogram_quantile(0.99, sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{component="%(component)s", %(jobMatcher)s}[$__rate_interval])))' % { component: component, jobMatcher: jobMatcher },
+'histogram_quantile(0.999, sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{component="%(component)s", %(jobMatcher)s}[$__rate_interval])))' % { component: component, jobMatcher: jobMatcher },
+'histogram_quantile(1.0, sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{component="%(component)s", %(jobMatcher)s}[$__rate_interval])))' % { component: component, jobMatcher: jobMatcher },
],
[
'avg',