Add disk monitoring #233

Merged
merged 23 commits into from
Jul 31, 2024
23 commits
5057175
Update the server-side api
perllaghu Mar 14, 2024
bb67a1b
In theory, add disk stuff to the front end
perllaghu Mar 14, 2024
84a7a19
Working as dev environment
perllaghu Mar 15, 2024
7cac933
shift config to individual views, tweak the CONTRIB docs, and add an …
perllaghu Mar 15, 2024
5876b18
Update the readme
perllaghu Mar 15, 2024
4028f1e
Update static/main.js to pass eslint, and create a single style entry
perllaghu Mar 16, 2024
bf0f0fb
Correct debugging mis-naming
perllaghu Mar 17, 2024
17ccf28
Replace missing semicolon..
perllaghu Mar 17, 2024
a722408
Merge branch 'main' into add_disk_monitoring
krassowski Apr 10, 2024
f4fab8c
fix: Compute disk warning state with config.disk_warning_threshold
iandesj Apr 26, 2024
de8d695
feat: Add model class for keeping resource warnings
iandesj Apr 26, 2024
35b8fdc
feat: Condition to flash warnings no looks at all computed warnings
iandesj Apr 26, 2024
2daca6e
chore: Run lint fix
iandesj Apr 26, 2024
8402125
chore: Run lint fix again
iandesj Apr 26, 2024
7bad1e3
chore: Address critical and high dependabot flagged packages
iandesj May 1, 2024
ff8eb40
task: remove console log
iandesj Jun 14, 2024
16af8a4
Update CONTRIBUTING.md
iandesj Jul 30, 2024
a15aaa8
Fix typo in CONTRIBUTING.md
iandesj Jul 30, 2024
6fd3140
Fix typo in README.md docs
iandesj Jul 30, 2024
613c1bd
Fix server extension docs language
iandesj Jul 30, 2024
f8a69c4
Fix typo regarding disk warning thresholds
iandesj Jul 30, 2024
7f6bc44
Catch Exception instead of nothing at all
iandesj Jul 30, 2024
e0e41d3
Update docs and delete example server config
iandesj Jul 30, 2024
2 changes: 2 additions & 0 deletions .gitignore
Expand Up @@ -8,6 +8,8 @@ __pycache__/

 # Distribution / packaging
 .Python
+.direnv
+.envrc
 env/
 build/
 develop-eggs/
Expand Down
8 changes: 8 additions & 0 deletions CONTRIBUTING.md
Expand Up @@ -127,6 +127,14 @@ JupyterLab v3.0.0
 jupyter-resource-usage v0.1.0 enabled OK
 ```

+## Which code creates what content
+
+The stats are created by the server-side code in `jupyter_resource_usage`.
+
+For the jupyterlab 4 / notebook 7 UIs, the code in `packages/labextension` creates and writes the content for both the statusbar and the topbar.
+
+The topbar is defined in the schema, whilst the contents of the statusbar is driven purely by the labextension code.... and labels are defined by their appropriate `*View.tsx` file
+
 ## pre-commit

 `jupyter-resource-usage` has adopted automatic code formatting so you shouldn't need to worry too much about your code style.
Expand Down
21 changes: 20 additions & 1 deletion README.md
Expand Up @@ -134,6 +134,23 @@ memory:

 ![Screenshot with CPU and memory](./doc/statusbar-cpu.png)

+### Disk [partition] Usage
+
+`jupyter-resource-usage` can also track disk usage [of a defined partition] and report the `total` and `used` values as part of the `/api/metrics/v1` response.
+
+You enable tracking by setting the `track_disk_usage` trait (disabled by default):
+
+```python
+c = get_config()
+c.ResourceUseDisplay.track_disk_usage = True
+```
+
+The values are from the partition containing the folder in the trait `disk_path` (which defaults to `/home/joyvan`). If this path does not exist, disk usage information is omitted from the display.
+
+Mirroring CPU and Memory, the trait `disk_warning_threshold` signifies when to flag a usage warning, and like the others, it defaults to `0.1` (10% remaining)
+
+![Screenshot with Disk, CPU, and memory](./doc/statusbar_disk.png)
+
 ### Disable Prometheus Metrics

 There is a [known bug](https://github.com/jupyter-server/jupyter-resource-usage/issues/123) with Prometheus metrics which
Expand All @@ -157,9 +174,11 @@ render the alternative frontend in the topbar.
 Users can change the label and refresh rate for the alternative frontend using settings
 editor.

+(The vertical bars are included by default, to help separate the three indicators.)
+
 ## Resources Displayed

-Currently the server extension only reports memory usage and CPU usage. Other metrics will be added in the future as needed.
+Currently the server extension reports disk usage, memory usage and CPU usage. Other metrics will be added in the future as needed.

 Memory usage will show the PSS whenever possible (Linux only feature), and default to RSS otherwise.

Expand Down
Binary file modified doc/settings.png
Binary file added doc/statusbar_disk.png
14 changes: 14 additions & 0 deletions jupyter_resource_usage/api.py
Expand Up @@ -75,6 +75,20 @@ async def get(self):

             metrics.update(cpu_percent=cpu_percent, cpu_count=cpu_count)

+        # Optionally get Disk information
+        if config.track_disk_usage:
+            try:
+                disk_info = psutil.disk_usage(config.disk_path)
+            except Exception:
+                pass
+            else:
+                metrics.update(disk_used=disk_info.used, disk_total=disk_info.total)
+                limits["disk"] = {"disk": disk_info.total}
+                if config.disk_warning_threshold != 0:
+                    limits["disk"]["warn"] = (disk_info.total - disk_info.used) < (
+                        disk_info.total * config.disk_warning_threshold
+                    )
+
         self.write(json.dumps(metrics))

     @run_on_executor
Expand Down
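Reviewer note (illustrative, not part of the diff): the warning flag added to `api.py` reduces to one comparison, free space against a fraction of the total. The sketch below restates it as a standalone function so the threshold semantics are easy to check. `disk_warning` is a hypothetical helper name, returning `None` stands in for the handler simply omitting the `warn` key, and the standard library's `shutil.disk_usage` is used in place of `psutil.disk_usage` only to keep the example dependency-free.

```python
import os
import shutil
from typing import Optional


def disk_warning(total: float, used: float, threshold: float) -> Optional[bool]:
    """Return True once free space drops below `threshold` (a fraction) of `total`.

    Hypothetical helper: a threshold of 0 means "warnings disabled", which the
    handler expresses by not setting the `warn` key at all; None models that here.
    """
    if threshold == 0:
        return None
    return (total - used) < (total * threshold)


# Stand-in for psutil.disk_usage: shutil.disk_usage also exposes .total and .used.
usage = shutil.disk_usage(os.getcwd())
current_state = disk_warning(usage.total, usage.used, 0.1)
```

With the 10G example from the trait's help text and a threshold of `0.1`, the flag turns on once more than 9G is used, that is, when less than 1G remains.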
49 changes: 48 additions & 1 deletion jupyter_resource_usage/config.py
Expand Up @@ -7,6 +7,7 @@
 from traitlets import Int
 from traitlets import List
 from traitlets import TraitType
+from traitlets import Unicode
 from traitlets import Union
 from traitlets.config import Configurable

Expand All @@ -27,7 +28,7 @@ def validate(self, obj, value):
             keys = list(value.keys())
             if "name" in keys:
                 keys.remove("name")
-                if all(key in ["kwargs", "attribute"] for key in keys):
+                if all(key in ["args", "kwargs", "attribute"] for key in keys):
                     return value
         self.error(obj, value)

Expand All @@ -37,6 +38,15 @@ class ResourceUseDisplay(Configurable):
     Holds server-side configuration for jupyter-resource-usage
     """

+    # Needs to be defined early, so the metrics can use it.
+    disk_path = Union(
+        trait_types=[Unicode(), Callable()],
+        default_value="/home/joyvan",
+        help="""
+        A path in the partition to be reported on.
+        """,
+    ).tag(config=True)
+
     process_memory_metrics = List(
         trait=PSUtilMetric(),
         default_value=[{"name": "memory_info", "attribute": "rss"}],
Expand All @@ -56,6 +66,19 @@ class ResourceUseDisplay(Configurable):
         trait=PSUtilMetric(), default_value=[{"name": "cpu_count"}]
     )

+    process_disk_metrics = List(
+        trait=PSUtilMetric(),
+        default_value=[],
+    )
+
+    system_disk_metrics = List(
+        trait=PSUtilMetric(),
+        default_value=[
+            {"name": "disk_usage", "args": [disk_path], "attribute": "total"},
+            {"name": "disk_usage", "args": [disk_path], "attribute": "used"},
+        ],
+    )
+
     mem_warning_threshold = Float(
         default_value=0.1,
         help="""
Expand Down Expand Up @@ -123,6 +146,30 @@ def _mem_limit_default(self):
     def _cpu_limit_default(self):
         return float(os.environ.get("CPU_LIMIT", 0))

+    track_disk_usage = Bool(
+        default_value=False,
+        help="""
+        Set to True in order to enable reporting of disk usage statistics.
+        """,
+    ).tag(config=True)
+
+    @default("disk_path")
+    def _disk_path_default(self):
+        return str(os.environ.get("HOME", "/home/joyvan"))
+
+    disk_warning_threshold = Float(
+        default_value=0.1,
+        help="""
+        Warn user with flashing lights when disk usage is within this fraction
+        total space.
+
+        For example, if total size is 10G, `disk_warning_threshold` is 0.1,
+        we will start warning the user when they use (10 - (10 * 0.1)) G.
+
+        Set to 0 to disable warning.
+        """,
+    ).tag(config=True)
+
     enable_prometheus_metrics = Bool(
         default_value=True,
         help="""
Expand Down
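Reviewer note (illustrative, not part of the diff): the three traits tagged `config=True` in this file can be set together from a Jupyter server config file. The fragment below is a sketch only: `/data` is a hypothetical mount point, `0.2` is an arbitrary example value, and `get_config` is provided by Jupyter when it loads the file, so the fragment does not run on its own.

```python
# jupyter_server_config.py (fragment)
c = get_config()  # noqa: F821 - injected by Jupyter at load time

c.ResourceUseDisplay.track_disk_usage = True        # disabled by default
c.ResourceUseDisplay.disk_path = "/data"            # hypothetical; default comes from $HOME
c.ResourceUseDisplay.disk_warning_threshold = 0.2   # warn when under 20% free; 0 disables
```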
29 changes: 20 additions & 9 deletions jupyter_resource_usage/metrics.py
Expand Up @@ -13,10 +13,10 @@ def __init__(self, server_app: ServerApp):
         ]
         self.server_app = server_app

-    def get_process_metric_value(self, process, name, kwargs, attribute=None):
+    def get_process_metric_value(self, process, name, args, kwargs, attribute=None):
         try:
             # psutil.Process methods will either return...
-            metric_value = getattr(process, name)(**kwargs)
+            metric_value = getattr(process, name)(*args, **kwargs)
             if attribute is not None:  # ... a named tuple
                 return getattr(metric_value, attribute)
             else:  # ... or a number
Expand All @@ -26,25 +26,28 @@ def get_process_metric_value(self, process, name, kwargs, attribute=None):
         except BaseException:
             return 0

-    def process_metric(self, name, kwargs={}, attribute=None):
+    def process_metric(self, name, args=[], kwargs={}, attribute=None):
         if psutil is None:
             return None
         else:
             current_process = psutil.Process()
             all_processes = [current_process] + current_process.children(recursive=True)

             process_metric_value = lambda process: self.get_process_metric_value(
-                process, name, kwargs, attribute
+                process, name, args, kwargs, attribute
             )

             return sum([process_metric_value(process) for process in all_processes])

-    def system_metric(self, name, kwargs={}, attribute=None):
+    def system_metric(self, name, args=[], kwargs={}, attribute=None):
         if psutil is None:
             return None
         else:
-            # psutil functions will either return...
-            metric_value = getattr(psutil, name)(**kwargs)
+            # psutil functions will either raise an error, or return...
+            try:
+                metric_value = getattr(psutil, name)(*args, **kwargs)
+            except:
+                return None
             if attribute is not None:  # ... a named tuple
                 return getattr(metric_value, attribute)
             else:  # ... or a number
Expand All @@ -63,8 +66,11 @@ def get_metric_values(self, metrics, metric_type):
         return metric_values

     def metrics(self, process_metrics, system_metrics):
-        metric_values = self.get_metric_values(process_metrics, "process")
-        metric_values.update(self.get_metric_values(system_metrics, "system"))
+        metric_values = {}
+        if process_metrics:
+            metric_values.update(self.get_metric_values(process_metrics, "process"))
+        if system_metrics:
+            metric_values.update(self.get_metric_values(system_metrics, "system"))

         if any(value is None for value in metric_values.values()):
             return None
Expand All @@ -80,3 +86,8 @@ def cpu_metrics(self):
         return self.metrics(
             self.config.process_cpu_metrics, self.config.system_cpu_metrics
         )
+
+    def disk_metrics(self):
+        return self.metrics(
+            self.config.process_disk_metrics, self.config.system_disk_metrics
+        )
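
Reviewer note (illustrative, not part of the diff): the change to `system_metric` amounts to looking a function up by name, forwarding positional as well as keyword arguments, and returning `None` if the call raises. The sketch below shows that pattern in isolation. It is not the project's code: the `module` parameter is an addition so that the standard library's `shutil` can stand in for `psutil`, and the missing path is a made-up example.

```python
import os
import shutil


def system_metric(module, name, args=(), kwargs=None, attribute=None):
    """Call `module.<name>(*args, **kwargs)`; optionally select one attribute.

    Returns None when the call raises, e.g. for a path that does not exist.
    """
    try:
        value = getattr(module, name)(*args, **(kwargs or {}))
    except Exception:
        return None
    # The call returns either a named tuple (pick `attribute`) or a plain number.
    return getattr(value, attribute) if attribute is not None else value


# shutil.disk_usage stands in for psutil.disk_usage here.
total = system_metric(shutil, "disk_usage", args=[os.getcwd()], attribute="total")
missing = system_metric(shutil, "disk_usage", args=["/no/such/path"], attribute="total")
```

The sketch uses `args=()` and `kwargs=None` rather than mutable `[]` and `{}` defaults, and catches `Exception` rather than using a bare `except`. Both are stylistic choices for the example, not a claim about what the merged code does.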
14 changes: 13 additions & 1 deletion jupyter_resource_usage/prometheus.py
Expand Up @@ -18,7 +18,14 @@ def __init__(self, metricsloader: PSUtilMetricsLoader):
         self.config = metricsloader.config
         self.session_manager = metricsloader.server_app.session_manager

-        gauge_names = ["total_memory", "max_memory", "total_cpu", "max_cpu"]
+        gauge_names = [
+            "total_memory",
+            "max_memory",
+            "total_cpu",
+            "max_cpu",
+            "max_disk",
+            "current_disk",
+        ]
         for name in gauge_names:
             phrase = name + "_usage"
             gauge = Gauge(phrase, "counter for " + phrase.replace("_", " "), [])
Expand All @@ -34,6 +41,11 @@ async def __call__(self, *args, **kwargs):
             if cpu_metric_values is not None:
                 self.TOTAL_CPU_USAGE.set(cpu_metric_values["cpu_percent"])
                 self.MAX_CPU_USAGE.set(self.apply_cpu_limit(cpu_metric_values))
+        if self.config.track_disk_usage:
+            disk_metric_values = self.metricsloader.disk_metrics()
+            if disk_metric_values is not None:
+                self.CURRENT_DISK_USAGE.set(disk_metric_values["disk_usage_used"])
+                self.MAX_DISK_USAGE.set(disk_metric_values["disk_usage_total"])

     def apply_memory_limit(self, memory_metric_values) -> Optional[int]:
         if memory_metric_values is None:
Expand Down
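Reviewer note (illustrative, not part of the diff): two naming conventions connect this file to the config. Dictionary keys such as `disk_usage_used` appear to be the metric's `name` joined to its `attribute`, and gauge attributes such as `CURRENT_DISK_USAGE` appear to be the upper-cased `phrase` built in the constructor. The code that performs both steps is outside the hunks shown, so the sketch below is inferred from the identifiers visible here, and `metric_key` and `gauge_attribute` are hypothetical helper names.

```python
from typing import Optional


def metric_key(name: str, attribute: Optional[str] = None) -> str:
    """Key a metric value is assumed to be stored under: name, plus _attribute if any."""
    return f"{name}_{attribute}" if attribute else name


def gauge_attribute(gauge_name: str) -> str:
    """Attribute name assumed for the gauge created from `gauge_name`."""
    phrase = gauge_name + "_usage"  # same phrase the constructor builds
    return phrase.upper()
```

Under these assumptions, the two new entries in `gauge_names` line up with the `CURRENT_DISK_USAGE` and `MAX_DISK_USAGE` attributes set in `__call__`.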