Reduce Linux Pressure (PSI) storage cost
pavlozt committed Dec 13, 2024
1 parent 6e9f003 commit bc3d6de
Showing 3 changed files with 764 additions and 8 deletions.
@@ -1,14 +1,16 @@
zabbix_export:
version: '6.0'
date: '2023-09-19T18:02:09Z'
date: '2024-12-13T22:00:00Z'
groups:
- uuid: 846977d1dfed4968bc5f8bdb363285bc
name: 'Templates/Operating systems'
templates:
- uuid: fc3089e96aa34a9fa86fe178b7d2c9c9
template: 'Linux Pressure Stall Information - PSI'
name: 'Linux Pressure Stall Information - PSI'
description: 'Provides access to the pressure stall info for cpu, memory and io.'
description: |
Provides access to the pressure stall info for cpu, memory and io.
https://docs.kernel.org/accounting/psi.html
groups:
- name: 'Templates/Operating systems'
items:
@@ -444,29 +446,29 @@ zabbix_export:
- uuid: 17b344a3dae442c98149fab9437b4b40
name: 'CPU Pressure Stall Information - Text'
key: 'vfs.file.contents[/proc/pressure/cpu]'
delay: 5s
trends: '0'
history: '0'
value_type: TEXT
trends: '0'
description: 'Service item for gathering cpu ''some'' pressure (10s,60s,300s)'
tags:
- tag: component
value: cpu
- uuid: b78af20d636a4c8094f1161bc1518caf
name: 'IO Pressure Stall Information - Text'
key: 'vfs.file.contents[/proc/pressure/io]'
delay: 5s
trends: '0'
history: '0'
value_type: TEXT
trends: '0'
description: 'Service item for gathering io ''some'' and ''full'' pressure (10s,60s,300s)'
tags:
- tag: component
value: io
- uuid: a36abe771ad5456981a0ae2d26569507
name: 'Memory Pressure Stall Information - Text'
key: 'vfs.file.contents[/proc/pressure/memory]'
delay: 5s
trends: '0'
history: '0'
value_type: TEXT
trends: '0'
description: 'Service item for gathering memory ''some'' and ''full'' pressure (10s,60s,300s)'
tags:
- tag: component
96 changes: 96 additions & 0 deletions Operating_Systems/Linux/template_linux_pressure/7.0/README.md
@@ -0,0 +1,96 @@
# Linux - Pressure Stall Information

## Overview

Self-contained template for monitoring pressure stall information on Linux systems. Source: <https://github.com/hielsber-tamu/zabbix_template_linux_pressure>

## Author

Matthew Hielsberg - <[email protected]>

## Macros used

|Name|Description|Default|Type|
|----|-----------|-------|----|
|{$CPU_FULL_AVG10_THRESH}|CPU starvation for ALL processes over 10 seconds|0|Integer|
|{$CPU_FULL_AVG60_THRESH}|CPU starvation for ALL processes over 60 seconds|0|Integer|
|{$CPU_FULL_AVG300_THRESH}|CPU starvation for ALL processes over 300 seconds|0|Integer|
|{$CPU_SOME_AVG10_THRESH}|CPU starvation for some processes over 10 seconds|75|Integer|
|{$CPU_SOME_AVG60_THRESH}|CPU starvation for some processes over 60 seconds|50|Integer|
|{$CPU_SOME_AVG300_THRESH}|CPU starvation for some processes over 300 seconds|25|Integer|
|{$IO_FULL_AVG10_THRESH}|IO starvation for ALL processes over 10 seconds|10|Integer|
|{$IO_FULL_AVG60_THRESH}|IO starvation for ALL processes over 60 seconds|5|Integer|
|{$IO_FULL_AVG300_THRESH}|IO starvation for ALL processes over 300 seconds|1|Integer|
|{$IO_SOME_AVG10_THRESH}|IO starvation for some processes over 10 seconds|50|Integer|
|{$IO_SOME_AVG60_THRESH}|IO starvation for some processes over 60 seconds|10|Integer|
|{$IO_SOME_AVG300_THRESH}|IO starvation for some processes over 300 seconds|5|Integer|
|{$MEMORY_FULL_AVG10_THRESH}|Memory starvation for ALL processes over 10 seconds|10|Integer|
|{$MEMORY_FULL_AVG60_THRESH}|Memory starvation for ALL processes over 60 seconds|5|Integer|
|{$MEMORY_FULL_AVG300_THRESH}|Memory starvation for ALL processes over 300 seconds|1|Integer|
|{$MEMORY_SOME_AVG10_THRESH}|Memory starvation for some processes over 10 seconds|50|Integer|
|{$MEMORY_SOME_AVG60_THRESH}|Memory starvation for some processes over 60 seconds|10|Integer|
|{$MEMORY_SOME_AVG300_THRESH}|Memory starvation for some processes over 300 seconds|5|Integer|

## Template links

There are no template links in this template.

## Discovery rules

There are no discovery rules in this template.

## Items collected

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|CPU Pressure Stall Information - Text|Service item for gathering cpu 'some' pressure (10s,60s,300s)|TEXT|key: vfs.file.contents[/proc/pressure/cpu]|
|CPU Pressure Stall Information - Full - 10s Average|The percentage of time all tasks were stalled on the CPU over the last 10s window.|FLOAT|key: psi_mth.cpu.full.avg10, master_item key: vfs.file.contents[/proc/pressure/cpu]|
|CPU Pressure Stall Information - Full - 60s Average|The percentage of time all tasks were stalled on the CPU over the last 60s window.|FLOAT|key: psi_mth.cpu.full.avg60, master_item key: vfs.file.contents[/proc/pressure/cpu]|
|CPU Pressure Stall Information - Full - 300s Average|The percentage of time all tasks were stalled on the CPU over the last 300s window.|FLOAT|key: psi_mth.cpu.full.avg300, master_item key: vfs.file.contents[/proc/pressure/cpu]|
|CPU Pressure Stall Information - Some - 10s Average|The percentage of time some tasks were stalled on the CPU over the last 10s window.|FLOAT|key: psi_mth.cpu.some.avg10, master_item key: vfs.file.contents[/proc/pressure/cpu]|
|CPU Pressure Stall Information - Some - 60s Average|The percentage of time some tasks were stalled on the CPU over the last 60s window.|FLOAT|key: psi_mth.cpu.some.avg60, master_item key: vfs.file.contents[/proc/pressure/cpu]|
|CPU Pressure Stall Information - Some - 300s Average|The percentage of time some tasks were stalled on the CPU over the last 300s window.|FLOAT|key: psi_mth.cpu.some.avg300, master_item key: vfs.file.contents[/proc/pressure/cpu]|
|IO Pressure Stall Information - Text|Service item for gathering io 'some' and 'full' pressure (10s,60s,300s)|TEXT|key: vfs.file.contents[/proc/pressure/io]|
|IO Pressure Stall Information - Full - 10s Average|The percentage of time all tasks were waiting on IO over the last 10s window.|FLOAT|key: psi_mth.io.full.avg10, master_item key: vfs.file.contents[/proc/pressure/io]|
|IO Pressure Stall Information - Full - 60s Average|The percentage of time all tasks were waiting on IO over the last 60s window.|FLOAT|key: psi_mth.io.full.avg60, master_item key: vfs.file.contents[/proc/pressure/io]|
|IO Pressure Stall Information - Full - 300s Average|The percentage of time all tasks were waiting on IO over the last 300s window.|FLOAT|key: psi_mth.io.full.avg300, master_item key: vfs.file.contents[/proc/pressure/io]|
|IO Pressure Stall Information - Some - 10s Average|The percentage of time some tasks were waiting on IO over the last 10s window.|FLOAT|key: psi_mth.io.some.avg10, master_item key: vfs.file.contents[/proc/pressure/io]|
|IO Pressure Stall Information - Some - 60s Average|The percentage of time some tasks were waiting on IO over the last 60s window.|FLOAT|key: psi_mth.io.some.avg60, master_item key: vfs.file.contents[/proc/pressure/io]|
|IO Pressure Stall Information - Some - 300s Average|The percentage of time some tasks were waiting on IO over the last 300s window.|FLOAT|key: psi_mth.io.some.avg300, master_item key: vfs.file.contents[/proc/pressure/io]|
|Memory Pressure Stall Information - Text|Service item for gathering memory 'some' and 'full' pressure (10s,60s,300s)|TEXT|key: vfs.file.contents[/proc/pressure/memory]|
|Memory Pressure Stall Information - Full - 10s Average|The percentage of time all tasks were waiting on memory over the last 10s window.|FLOAT|key: psi_mth.memory.full.avg10, master_item key: vfs.file.contents[/proc/pressure/memory]|
|Memory Pressure Stall Information - Full - 60s Average|The percentage of time all tasks were waiting on memory over the last 60s window.|FLOAT|key: psi_mth.memory.full.avg60, master_item key: vfs.file.contents[/proc/pressure/memory]|
|Memory Pressure Stall Information - Full - 300s Average|The percentage of time all tasks were waiting on memory over the last 300s window.|FLOAT|key: psi_mth.memory.full.avg300, master_item key: vfs.file.contents[/proc/pressure/memory]|
|Memory Pressure Stall Information - Some - 10s Average|The percentage of time some tasks were waiting on memory over the last 10s window.|FLOAT|key: psi_mth.memory.some.avg10, master_item key: vfs.file.contents[/proc/pressure/memory]|
|Memory Pressure Stall Information - Some - 60s Average|The percentage of time some tasks were waiting on memory over the last 60s window.|FLOAT|key: psi_mth.memory.some.avg60, master_item key: vfs.file.contents[/proc/pressure/memory]|
|Memory Pressure Stall Information - Some - 300s Average|The percentage of time some tasks were waiting on memory over the last 300s window.|FLOAT|key: psi_mth.memory.some.avg300, master_item key: vfs.file.contents[/proc/pressure/memory]|
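
Each FLOAT item in the table is derived from one of the three TEXT master items. The template's actual preprocessing steps are not listed in this README, so the following is only a minimal Python sketch of the same idea, assuming the `some`/`full` line format described in the kernel PSI documentation. The function name `parse_psi`, the regular expression and the sample values are illustrative and are not taken from the template.

```python
import re

# Made-up sample in the /proc/pressure/<resource> format from the kernel PSI docs.
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.50 avg60=0.25 avg300=0.05 total=65432\n"
)

# One line per stall kind: "<some|full> avg10=<pct> avg60=<pct> avg300=<pct> total=<usec>"
LINE_RE = re.compile(
    r"^(some|full)\s+avg10=([\d.]+)\s+avg60=([\d.]+)\s+avg300=([\d.]+)\s+total=(\d+)$"
)


def parse_psi(text, resource):
    """Map master-item text to {dependent item key: percentage}.

    Keys follow the psi_mth.<resource>.<kind>.<window> pattern used in the
    items table; lines that do not match the expected format are skipped.
    """
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        kind = match.group(1)
        for window, raw in zip(("avg10", "avg60", "avg300"), match.group(2, 3, 4)):
            values[f"psi_mth.{resource}.{kind}.{window}"] = float(raw)
    return values


if __name__ == "__main__":
    for key, value in sorted(parse_psi(SAMPLE, "io").items()):
        print(key, value)
```

On a host, the same function could be fed the contents of `/proc/pressure/cpu`, `/proc/pressure/io` or `/proc/pressure/memory`. If a kernel omits the `full` line for a resource, the sketch simply returns no `full` keys for it.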

## Triggers

|Name|Description|Expression|Priority|
|----|-----------|----------|--------|
|Linux PSI - CPU Full Avg 10 - Exceeds Threshold|The percentage of time all tasks were stalled on the CPU over the last 10s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.cpu.full.avg10)>{$CPU_FULL_AVG10_THRESH}|WARNING|
|Linux PSI - CPU Full Avg 60 - Exceeds Threshold|The percentage of time all tasks were stalled on the CPU over the last 60s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.cpu.full.avg60)>{$CPU_FULL_AVG60_THRESH}|WARNING|
|Linux PSI - CPU Full Avg 300 - Exceeds Threshold|The percentage of time all tasks were stalled on the CPU over the last 300s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.cpu.full.avg300)>{$CPU_FULL_AVG300_THRESH}|WARNING|
|Linux PSI - CPU Some Avg 10 - Exceeds Threshold|The percentage of time some tasks were stalled on the CPU over the last 10s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.cpu.some.avg10)>{$CPU_SOME_AVG10_THRESH}|INFO|
|Linux PSI - CPU Some Avg 60 - Exceeds Threshold|The percentage of time some tasks were stalled on the CPU over the last 60s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.cpu.some.avg60)>{$CPU_SOME_AVG60_THRESH}|INFO|
|Linux PSI - CPU Some Avg 300 - Exceeds Threshold|The percentage of time some tasks were stalled on the CPU over the last 300s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.cpu.some.avg300)>{$CPU_SOME_AVG300_THRESH}|INFO|
|Linux PSI - IO Full Avg 10 - Exceeds Threshold|The percentage of time all tasks were waiting on IO over the last 10s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.io.full.avg10)>{$IO_FULL_AVG10_THRESH}|INFO|
|Linux PSI - IO Full Avg 60 - Exceeds Threshold|The percentage of time all tasks were waiting on IO over the last 60s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.io.full.avg60)>{$IO_FULL_AVG60_THRESH}|INFO|
|Linux PSI - IO Full Avg 300 - Exceeds Threshold|The percentage of time all tasks were waiting on IO over the last 300s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.io.full.avg300)>{$IO_FULL_AVG300_THRESH}|INFO|
|Linux PSI - IO Some Avg 10 - Exceeds Threshold|The percentage of time some tasks were waiting on IO over the last 10s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.io.some.avg10)>{$IO_SOME_AVG10_THRESH}|INFO|
|Linux PSI - IO Some Avg 60 - Exceeds Threshold|The percentage of time some tasks were waiting on IO over the last 60s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.io.some.avg60)>{$IO_SOME_AVG60_THRESH}|INFO|
|Linux PSI - IO Some Avg 300 - Exceeds Threshold|The percentage of time some tasks were waiting on IO over the last 300s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.io.some.avg300)>{$IO_SOME_AVG300_THRESH}|INFO|
|Linux PSI - Memory Full Avg 10 - Exceeds Threshold|The percentage of time all tasks were waiting on memory over the last 10s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.memory.full.avg10)>{$MEMORY_FULL_AVG10_THRESH}|INFO|
|Linux PSI - Memory Full Avg 60 - Exceeds Threshold|The percentage of time all tasks were waiting on memory over the last 60s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.memory.full.avg60)>{$MEMORY_FULL_AVG60_THRESH}|INFO|
|Linux PSI - Memory Full Avg 300 - Exceeds Threshold|The percentage of time all tasks were waiting on memory over the last 300s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.memory.full.avg300)>{$MEMORY_FULL_AVG300_THRESH}|INFO|
|Linux PSI - Memory Some Avg 10 - Exceeds Threshold|The percentage of time some tasks were waiting on memory over the last 10s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.memory.some.avg10)>{$MEMORY_SOME_AVG10_THRESH}|INFO|
|Linux PSI - Memory Some Avg 60 - Exceeds Threshold|The percentage of time some tasks were waiting on memory over the last 60s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.memory.some.avg60)>{$MEMORY_SOME_AVG60_THRESH}|INFO|
|Linux PSI - Memory Some Avg 300 - Exceeds Threshold|The percentage of time some tasks were waiting on memory over the last 300s window exceeds the threshold|last(/Linux Pressure Stall Information - PSI/psi_mth.memory.some.avg300)>{$MEMORY_SOME_AVG300_THRESH}|INFO|
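
Every trigger in the table has the same shape: the last value of an item is compared with its matching threshold macro using a strict `>`. The sketch below mimics that comparison in Python for the IO items only, with defaults copied from the "Macros used" table. The helper names `macro_for` and `firing` are hypothetical; Zabbix evaluates the real expressions itself.

```python
# Default thresholds copied from the "Macros used" table (IO only, for brevity).
DEFAULT_MACROS = {
    "{$IO_FULL_AVG10_THRESH}": 10,
    "{$IO_FULL_AVG60_THRESH}": 5,
    "{$IO_FULL_AVG300_THRESH}": 1,
    "{$IO_SOME_AVG10_THRESH}": 50,
    "{$IO_SOME_AVG60_THRESH}": 10,
    "{$IO_SOME_AVG300_THRESH}": 5,
}


def macro_for(item_key):
    """Derive the macro name from an item key.

    Example: psi_mth.io.full.avg10 -> {$IO_FULL_AVG10_THRESH}
    """
    _, resource, kind, window = item_key.split(".")
    return "{$" + f"{resource}_{kind}_{window}_THRESH".upper() + "}"


def firing(last_values, macros=DEFAULT_MACROS):
    """Return the item keys whose last value strictly exceeds the threshold."""
    return sorted(
        key for key, value in last_values.items() if value > macros[macro_for(key)]
    )


if __name__ == "__main__":
    latest = {"psi_mth.io.full.avg10": 12.0, "psi_mth.io.some.avg10": 12.0}
    print(firing(latest))
```

Because the comparison is strict, a value exactly equal to its threshold does not fire. With the CPU "full" defaults of 0 from the macro table, any non-zero reading would exceed the threshold.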

## Graphs

Three graphs are included, each of which contains the 10, 60 and 300 second averages for both 'some' and 'full'.

- CPU Pressure Stall Information
- IO Pressure Stall Information
- Memory Pressure Stall Information