Instance level filtering for the "workload_volume.yaml" #2559

Closed
debbrata-netapp opened this issue Dec 20, 2023 · 6 comments · Fixed by #2575

@debbrata-netapp

Is your feature request related to a problem? Please describe.
The customer needs to monitor latency for a few critical volumes, out of many, at a granular level (e.g., every 5 seconds).
With the available template, we would end up polling data from a system containing 1000+ volumes, most of which the customer does not need. The existing template therefore causes unnecessary polling and produces a large data set, especially on large systems.

Describe the solution you'd like
Instance-level polling/filtering would cut out the excessive polling. This would reduce CPU pressure and shrink the data set, making it practical for the customer to retain historical information for a longer duration.
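
For illustration, a minimal sketch of what such a filter could look like in workload_volume.yaml, using the filter syntax validated later in this thread (the counter list and the "*critical*" pattern are placeholders, not a confirmed configuration):

counters:
  - instance_name
  - instance_uuid
  - latency                 # placeholder; keep only the counters the customer needs
  - filter:
      # poll only workloads whose name matches the pattern; "*" is a wildcard
      - workload-name: "*critical*"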

Describe alternatives you've considered
NA

Additional context
NA

@debbrata-netapp debbrata-netapp added the feature New feature or request label Dec 20, 2023
@debbrata-netapp
Author

The same filtering could also be applied to the following similar templates, which would help reduce resource utilization:

  1. workload.yaml
  2. workload_detail.yaml
  3. workload_detail_volume.yaml

@rahulguptajss
Contributor

@debbrata-netapp

  1. Will the template use hardcoded UUIDs/Names of instances or a regex for instance filtering?
  2. What's the customer's cluster version?

@debbrata-netapp
Author

@rahulguptajss

  1. A regex for instance filtering would work better, as it would let the customer add or remove instances later as requirements change; hardcoded UUIDs/names would require a template update for every change (see the sketch below this list).
  2. The cluster is currently on 9.9.1P17, and the plan is to move to 9.11.1PX.
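
For example, a hedged sketch of the two approaches, using the filter syntax shown later in this thread (names and patterns are illustrative):

# Option 1: pattern match; newly created matching volumes are picked up
# without touching the template
- filter:
    - workload-name: "*prod_db*"

# Option 2: hardcoded name; every add/remove requires editing the template
- filter:
    - workload-name: "vol_prod_db_01"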

@rahulguptajss
Contributor

rahulguptajss commented Dec 20, 2023

Thanks @debbrata-netapp

Given that RestPerf collector support is available for ONTAP 9.12+, we'll focus on filter support for the ZapiPerf collector in this issue. Filter support for RestPerf is tracked separately under issue #2534. We'll aim to provide filtering capabilities in line with those provided by the ONTAP performance ZAPIs.

@rahulguptajss
Contributor

@debbrata-netapp This feature is now available through the nightly builds. The documentation can be found here, and the steps for NABox are provided here.

@rahulguptajss rahulguptajss removed their assignment Feb 12, 2024
@Hardikl Hardikl self-assigned this Feb 13, 2024
@Hardikl
Contributor

Hardikl commented Feb 15, 2024

Tested in main with commit 503b460

Validation for workload_detail ZapiPerf template

harvest % ./bin/poller -p sar -c ZapiPerf -o WorkloadDetail  --promPort 14001
2024-02-15T15:19:43+05:30 INF poller/poller.go:215 > Init Poller=sar asup=false confPath=conf config=harvest.yml configPath=harvest.yml daemon=false debug=false homePath= hostname=hardikl-mac-0 logLevel=info logPath=/var/log/harvest/ profiling=0 promPort=14001 version="harvest version 24.02.1512-v23.11.0 (commit 503b4600) (build date 2024-02-15T12:56:33+0530) darwin/amd64"
2024-02-15T15:19:43+05:30 INF poller/poller.go:239 > started in foreground Poller=sar pid=75033
2024-02-15T15:19:45+05:30 INF collector/helpers.go:84 > best-fit template Poller=sar collector=ZapiPerf:WorkloadDetail path=conf/zapiperf/cdot/9.8.0/workload_detail.yaml v=9.13.1
2024-02-15T15:19:45+05:30 INF poller/poller.go:357 > Autosupport scheduled. Poller=sar asupSchedule=24h
2024-02-15T15:19:45+05:30 INF poller/poller.go:366 > poller start-up complete Poller=sar
2024-02-15T15:19:45+05:30 INF prometheus/httpd.go:40 > server listen Poller=sar exporter=prometheus1 url=http://:14001/metrics
2024-02-15T15:19:45+05:30 INF poller/poller.go:523 > updated status, up collectors: 1 (of 1), up exporters: 1 (of 1) Poller=sar
2024-02-15T15:19:45+05:30 INF collector/collector.go:585 > Collected Poller=sar apiMs=244 collector=ZapiPerf:WorkloadDetail metrics=37 pollMs=245 task=counter zBegin=1707990585299
2024-02-15T15:19:46+05:30 INF collector/collector.go:585 > Collected Poller=sar apiMs=999 collector=ZapiPerf:WorkloadDetail instances=60 pollMs=999 task=instance zBegin=1707990585544
2024-02-15T15:21:02+05:30 INF collector/collector.go:585 > Collected Poller=sar apiMs=15634 calcMs=1 collector=ZapiPerf:WorkloadDetail exportMs=3 instances=960 instancesExported=60 metrics=5760 metricsExported=1986 parseMs=110 pluginMs=0 pollMs=15751 skips=360 zBegin=1707990646542

Added filter in workload_detail ZapiPerf template

counters:
  - instance_name
  - instance_uuid
  - service_time
  - visits
  - wait_time
  - filter:
      - workload-name: "*lun*"
  - refine:
      - with_constituents: false # The possible values are true or false. Setting this to true will include constituents in the results, while false will exclude them.
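
Here workload-name: "*lun*" restricts polling to workloads whose name matches the pattern, and with_constituents: false excludes constituent workloads from the results.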
harvest % ./bin/poller -p sar -c ZapiPerf -o WorkloadDetail  --promPort 14001
2024-02-15T15:50:37+05:30 INF poller/poller.go:215 > Init Poller=sar asup=false confPath=conf config=harvest.yml configPath=harvest.yml daemon=false debug=false homePath= hostname=hardikl-mac-0 logLevel=info logPath=/var/log/harvest/ profiling=0 promPort=14001 version="harvest version 24.02.1512-v23.11.0 (commit 503b4600) (build date 2024-02-15T12:56:33+0530) darwin/amd64"
2024-02-15T15:50:37+05:30 INF poller/poller.go:239 > started in foreground Poller=sar pid=76430
2024-02-15T15:50:39+05:30 INF collector/helpers.go:84 > best-fit template Poller=sar collector=ZapiPerf:WorkloadDetail path=conf/zapiperf/cdot/9.8.0/workload_detail.yaml v=9.13.1
2024-02-15T15:50:39+05:30 INF poller/poller.go:357 > Autosupport scheduled. Poller=sar asupSchedule=24h
2024-02-15T15:50:39+05:30 INF poller/poller.go:366 > poller start-up complete Poller=sar
2024-02-15T15:50:39+05:30 INF prometheus/httpd.go:40 > server listen Poller=sar exporter=prometheus1 url=http://:14001/metrics
2024-02-15T15:50:39+05:30 INF poller/poller.go:523 > updated status, up collectors: 1 (of 1), up exporters: 1 (of 1) Poller=sar
2024-02-15T15:50:39+05:30 INF collector/collector.go:585 > Collected Poller=sar apiMs=238 collector=ZapiPerf:WorkloadDetail metrics=37 pollMs=238 task=counter zBegin=1707992439484
2024-02-15T15:50:40+05:30 INF collector/collector.go:585 > Collected Poller=sar apiMs=996 collector=ZapiPerf:WorkloadDetail instances=2 pollMs=996 task=instance zBegin=1707992439722
2024-02-15T15:51:42+05:30 INF collector/collector.go:585 > Collected Poller=sar apiMs=1989 calcMs=0 collector=ZapiPerf:WorkloadDetail exportMs=0 instances=32 instancesExported=2 metrics=192 metricsExported=72 parseMs=4 pluginMs=0 pollMs=1994 skips=12 zBegin=1707992500718
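
With the filter applied, discovered instances dropped from 60 to 2, exported metrics per poll from 1986 to 72, and the data poll's apiMs from 15634 to 1989.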

Validation for Lun ZapiPerf template

harvest % ./bin/poller -p sar -c ZapiPerf -o Lun  --promPort 14001
2024-02-15T15:55:31+05:30 INF poller/poller.go:215 > Init Poller=sar asup=false confPath=conf config=harvest.yml configPath=harvest.yml daemon=false debug=false homePath= hostname=hardikl-mac-0 logLevel=info logPath=/var/log/harvest/ profiling=0 promPort=14001 version="harvest version 24.02.1512-v23.11.0 (commit 503b4600) (build date 2024-02-15T12:56:33+0530) darwin/amd64"
2024-02-15T15:55:31+05:30 INF poller/poller.go:239 > started in foreground Poller=sar pid=77801
2024-02-15T15:55:33+05:30 INF collector/helpers.go:84 > best-fit template Poller=sar collector=ZapiPerf:Lun path=conf/zapiperf/cdot/9.8.0/lun.yaml v=9.13.1
2024-02-15T15:55:33+05:30 INF poller/poller.go:357 > Autosupport scheduled. Poller=sar asupSchedule=24h
2024-02-15T15:55:33+05:30 INF poller/poller.go:366 > poller start-up complete Poller=sar
2024-02-15T15:55:33+05:30 INF prometheus/httpd.go:40 > server listen Poller=sar exporter=prometheus1 url=http://:14001/metrics
2024-02-15T15:55:33+05:30 INF poller/poller.go:523 > updated status, up collectors: 1 (of 1), up exporters: 1 (of 1) Poller=sar
2024-02-15T15:55:35+05:30 INF collector/collector.go:585 > Collected Poller=sar apiMs=2304 collector=ZapiPerf:Lun metrics=39 pollMs=2305 task=counter zBegin=1707992733145
2024-02-15T15:55:35+05:30 INF collector/collector.go:585 > Collected Poller=sar apiMs=248 collector=ZapiPerf:Lun instances=5 pollMs=248 task=instance zBegin=1707992735450
2024-02-15T15:56:36+05:30 INF collector/collector.go:585 > Collected Poller=sar apiMs=726 calcMs=0 collector=ZapiPerf:Lun exportMs=0 instances=5 instancesExported=5 metrics=180 metricsExported=210 parseMs=2 pluginMs=0 pollMs=728 skips=0 zBegin=1707992795698
2024-02-15T15:57:36+05:30 INF collector/collector.go:585 > Collected Poller=sar apiMs=746 calcMs=0 collector=ZapiPerf:Lun exportMs=1 instances=5 instancesExported=5 metrics=180 metricsExported=210 parseMs=2 pluginMs=0 pollMs=748 skips=0 zBegin=1707992855698

Added filter in Lun ZapiPerf template

counters:
  - avg_read_latency
  - avg_write_latency
  - avg_xcopy_latency
  - caw_reqs
  - enospc
  - instance_name
  - queue_full
  - read_align_histo
  - read_data
  - read_ops
  - read_partial_blocks
  - remote_bytes
  - remote_ops
  - unmap_reqs
  - vserver_name        => svm
  - write_align_histo
  - write_data
  - write_ops
  - write_partial_blocks
  - writesame_reqs
  - writesame_unmap_reqs
  - xcopy_reqs
  - filter:
      - vserver_name=osc
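
This template uses the key=value filter form (vserver_name=osc), in contrast to the mapping form used in the workload_detail example above.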
harvest % ./bin/poller -p sar -c ZapiPerf -o Lun  --promPort 14001
2024-02-15T16:00:09+05:30 INF poller/poller.go:215 > Init Poller=sar asup=false confPath=conf config=harvest.yml configPath=harvest.yml daemon=false debug=false homePath= hostname=hardikl-mac-0 logLevel=info logPath=/var/log/harvest/ profiling=0 promPort=14001 version="harvest version 24.02.1512-v23.11.0 (commit 503b4600) (build date 2024-02-15T12:56:33+0530) darwin/amd64"
2024-02-15T16:00:09+05:30 INF poller/poller.go:239 > started in foreground Poller=sar pid=78233
2024-02-15T16:00:11+05:30 INF collector/helpers.go:84 > best-fit template Poller=sar collector=ZapiPerf:Lun path=conf/zapiperf/cdot/9.8.0/lun.yaml v=9.13.1
2024-02-15T16:00:11+05:30 INF poller/poller.go:357 > Autosupport scheduled. Poller=sar asupSchedule=24h
2024-02-15T16:00:11+05:30 INF poller/poller.go:366 > poller start-up complete Poller=sar
2024-02-15T16:00:11+05:30 INF prometheus/httpd.go:40 > server listen Poller=sar exporter=prometheus1 url=http://:14001/metrics
2024-02-15T16:00:11+05:30 INF poller/poller.go:523 > updated status, up collectors: 1 (of 1), up exporters: 1 (of 1) Poller=sar
2024-02-15T16:00:13+05:30 INF collector/collector.go:585 > Collected Poller=sar apiMs=2290 collector=ZapiPerf:Lun metrics=39 pollMs=2291 task=counter zBegin=1707993011604
2024-02-15T16:00:16+05:30 INF collector/collector.go:585 > Collected Poller=sar apiMs=2802 collector=ZapiPerf:Lun instances=1 pollMs=2802 task=instance zBegin=1707993013895
2024-02-15T16:01:22+05:30 INF collector/collector.go:585 > Collected Poller=sar apiMs=5894 calcMs=0 collector=ZapiPerf:Lun exportMs=0 instances=1 instancesExported=1 metrics=36 metricsExported=74 parseMs=0 pluginMs=0 pollMs=5895 skips=0 zBegin=1707993076697
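
With the vserver_name=osc filter, only one of the five LUNs is discovered and exported (instances=1, instancesExported=1, metricsExported=74 vs 210 without the filter).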
