Add disk monitoring #232

Closed
wants to merge 10 commits into from
2 changes: 2 additions & 0 deletions .gitignore
@@ -8,6 +8,8 @@ __pycache__/

# Distribution / packaging
.Python
.direnv
.envrc
env/
build/
develop-eggs/
10 changes: 10 additions & 0 deletions CONTRIBUTING.md
@@ -127,6 +127,16 @@ JupyterLab v3.0.0
jupyter-resource-usage v0.1.0 enabled OK
```

An example config file, `example_jupyter_jupyterlab_server_config.py`, is included in the repository.

## Which code creates what content

The stats are created by the server-side code in `jupyter_resource_usage`.

For the jupyterlab 4 / notebook 7 UIs, the code in `packages/labextension` creates and writes the content for both the statusbar and the topbar.

The framework for the topbar is defined in the schema, whilst the contents of the statusbar are driven purely by the labextension code, and the labels are defined by their appropriate `xxxView.tsx` files.

## pre-commit

`jupyter-resource-usage` has adopted automatic code formatting so you shouldn't need to worry too much about your code style.
21 changes: 20 additions & 1 deletion README.md
@@ -134,6 +134,23 @@ memory:

![Screenshot with CPU and memory](./doc/statusbar-cpu.png)

### Disk (Partition) Usage

`jupyter-resource-usage` can also track disk usage of a defined partition and report the `total` and `used` values as part of the `/api/metrics/v1` response.
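
For illustration, the disk-related fields in the response could look like the snippet below. This is a sketch based on the server code in this pull request: the keys `disk_used`, `disk_total`, and the `limits["disk"]` entry come from `jupyter_resource_usage/api.py`, while the numeric values (and the surrounding payload shape) are made up.

```python
# Hypothetical fragment of the /api/metrics/v1 payload (values are illustrative)
{
    "disk_used": 180388626432,     # bytes used on the partition containing disk_path
    "disk_total": 250790436864,    # total size of that partition, in bytes
    "limits": {
        "disk": {
            "disk": 250790436864,  # mirrors disk_total
            "warn": False,         # True when free space drops below disk_warning_threshold
        }
    },
}
```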

You enable tracking by setting the `track_disk_usage` trait (disabled by default):

```python
c = get_config()
c.ResourceUseDisplay.track_disk_usage = True
```

The values are taken from the partition containing the folder set in the `disk_path` trait (which defaults to `/home/joyvan`).

Mirroring CPU and memory, the `disk_warning_threshold` trait controls when to flag a usage warning; like the others, it defaults to `0.1` (10% remaining).
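
Putting the disk traits together, a minimal server config could look like the sketch below (the path is an example; `track_disk_usage`, `disk_path`, and `disk_warning_threshold` are the traits added by this pull request):

```python
# e.g. in jupyter_server_config.py
c = get_config()

c.ResourceUseDisplay.track_disk_usage = True        # disk reporting is off by default
c.ResourceUseDisplay.disk_path = "/home"            # report on the partition containing this folder
c.ResourceUseDisplay.disk_warning_threshold = 0.1   # warn when less than 10% of the partition is free
```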

![Screenshot with Disk, CPU, and memory](./doc/statusbar_disk.png)

### Disable Prometheus Metrics

There is a [known bug](https://github.com/jupyter-server/jupyter-resource-usage/issues/123) with Prometheus metrics which
@@ -157,9 +174,11 @@ render the alternative frontend in the topbar.
Users can change the label and refresh rate for the alternative frontend using settings
editor.

(The vertical bars are included by default, to help separate the three indicators.)

## Resources Displayed

Currently the server extension only reports memory usage and CPU usage. Other metrics will be added in the future as needed.
Currently the server extension only reports disk usage, memory usage and CPU usage. Other metrics will be added in the future as needed.

Memory usage will show the PSS whenever possible (Linux only feature), and default to RSS otherwise.

Binary file modified doc/settings.png
Binary file added doc/statusbar_disk.png
9 changes: 9 additions & 0 deletions example_jupyter_jupyterlab_server_config.py
@@ -0,0 +1,9 @@
c = get_config() # noqa
# resource monitor config
c.ResourceUseDisplay.track_cpu_percent = True
c.ResourceUseDisplay.cpu_limit = 4

c.ResourceUseDisplay.mem_limit = 8589934592 # 8GB

c.ResourceUseDisplay.track_disk_usage = True
c.ResourceUseDisplay.disk_path = "/home"
14 changes: 14 additions & 0 deletions jupyter_resource_usage/api.py
@@ -75,6 +75,20 @@ async def get(self):

metrics.update(cpu_percent=cpu_percent, cpu_count=cpu_count)

# Optionally get Disk information
if config.track_disk_usage:
try:
disk_info = psutil.disk_usage(config.disk_path)
except:
pass
else:
metrics.update(disk_used=disk_info.used, disk_total=disk_info.total)
limits["disk"] = {"disk": disk_info.total}
if config.disk_warning_threshold != 0:
limits["disk"]["warn"] = (disk_info.total - disk_info.used) < (
disk_info.total * config.cpu_warning_threshold
Contributor: shouldn't this be

    disk_info.total * config.disk_warning_threshold

??

Contributor Author: Indeed it should!

Contributor: Additionally, the warning state calculated is never handled. It looks like the current client-side code handles warning state for memory only. I'm playing with this right now to show the disk warning!

Contributor Author: More than happy for you to contribute....
Question: does the CPU have a parallel warning state that should be handled?

Contributor: Great question! I'm more than happy to contribute to your branch and close PR #233, which I've pulled in your changes to and did the React work for the warnings. I was in a flow state this AM and wasn't sure how active this PR was going to be, so I threw those changes together.

Check it out and let me know what you think!

)

self.write(json.dumps(metrics))

@run_on_executor
49 changes: 48 additions & 1 deletion jupyter_resource_usage/config.py
@@ -7,6 +7,7 @@
from traitlets import Int
from traitlets import List
from traitlets import TraitType
from traitlets import Unicode
from traitlets import Union
from traitlets.config import Configurable

@@ -27,7 +28,7 @@ def validate(self, obj, value):
keys = list(value.keys())
if "name" in keys:
keys.remove("name")
if all(key in ["kwargs", "attribute"] for key in keys):
if all(key in ["args", "kwargs", "attribute"] for key in keys):
return value
self.error(obj, value)

Expand All @@ -37,6 +38,15 @@ class ResourceUseDisplay(Configurable):
Holds server-side configuration for jupyter-resource-usage
"""

# Needs to be defined early, so the metrics can use it.
disk_path = Union(
trait_types=[Unicode(), Callable()],
default_value="/home/joyvan",
help="""
A path in the partition to be reported on.
""",
).tag(config=True)

process_memory_metrics = List(
trait=PSUtilMetric(),
default_value=[{"name": "memory_info", "attribute": "rss"}],
@@ -56,6 +66,19 @@ class ResourceUseDisplay(Configurable):
trait=PSUtilMetric(), default_value=[{"name": "cpu_count"}]
)

process_disk_metrics = List(
trait=PSUtilMetric(),
default_value=[],
)

system_disk_metrics = List(
trait=PSUtilMetric(),
default_value=[
{"name": "disk_usage", "args": [disk_path], "attribute": "total"},
{"name": "disk_usage", "args": [disk_path], "attribute": "used"},
],
)

mem_warning_threshold = Float(
default_value=0.1,
help="""
@@ -123,6 +146,30 @@ def _mem_limit_default(self):
def _cpu_limit_default(self):
return float(os.environ.get("CPU_LIMIT", 0))

track_disk_usage = Bool(
default_value=False,
help="""
Set to True in order to enable reporting of disk usage statistics.
""",
).tag(config=True)

@default("disk_path")
def _disk_path_default(self):
return str(os.environ.get("HOME", "/home/joyvan"))

disk_warning_threshold = Float(
default_value=0.1,
help="""
Warn user with flashing lights when disk usage is within this fraction
total space.

For example, if total size is 10G, `disk_warning_threshold` is 0.1,
we will start warning the user when they use (10 - (10 * 0.1)) G.

Set to 0 to disable warning.
""",
).tag(config=True)

enable_prometheus_metrics = Bool(
default_value=True,
help="""
29 changes: 20 additions & 9 deletions jupyter_resource_usage/metrics.py
@@ -13,10 +13,10 @@ def __init__(self, server_app: ServerApp):
]
self.server_app = server_app

def get_process_metric_value(self, process, name, kwargs, attribute=None):
def get_process_metric_value(self, process, name, args, kwargs, attribute=None):
try:
# psutil.Process methods will either return...
metric_value = getattr(process, name)(**kwargs)
metric_value = getattr(process, name)(*args, **kwargs)
if attribute is not None: # ... a named tuple
return getattr(metric_value, attribute)
else: # ... or a number
@@ -26,25 +26,28 @@ def get_process_metric_value(self, process, name, kwargs, attribute=None):
except BaseException:
return 0

def process_metric(self, name, kwargs={}, attribute=None):
def process_metric(self, name, args=[], kwargs={}, attribute=None):
if psutil is None:
return None
else:
current_process = psutil.Process()
all_processes = [current_process] + current_process.children(recursive=True)

process_metric_value = lambda process: self.get_process_metric_value(
process, name, kwargs, attribute
process, name, args, kwargs, attribute
)

return sum([process_metric_value(process) for process in all_processes])

def system_metric(self, name, kwargs={}, attribute=None):
def system_metric(self, name, args=[], kwargs={}, attribute=None):
if psutil is None:
return None
else:
# psutil functions will either return...
metric_value = getattr(psutil, name)(**kwargs)
# psutil functions will either raise an error, or return...
try:
metric_value = getattr(psutil, name)(*args, **kwargs)
except:
return None
if attribute is not None: # ... a named tuple
return getattr(metric_value, attribute)
else: # ... or a number
@@ -63,8 +66,11 @@ def get_metric_values(self, metrics, metric_type):
return metric_values

def metrics(self, process_metrics, system_metrics):
metric_values = self.get_metric_values(process_metrics, "process")
metric_values.update(self.get_metric_values(system_metrics, "system"))
metric_values = {}
if process_metrics:
metric_values.update(self.get_metric_values(process_metrics, "process"))
if system_metrics:
metric_values.update(self.get_metric_values(system_metrics, "system"))

if any(value is None for value in metric_values.values()):
return None
@@ -80,3 +86,8 @@ def cpu_metrics(self):
return self.metrics(
self.config.process_cpu_metrics, self.config.system_cpu_metrics
)

def disk_metrics(self):
return self.metrics(
self.config.process_disk_metrics, self.config.system_disk_metrics
)
14 changes: 13 additions & 1 deletion jupyter_resource_usage/prometheus.py
@@ -18,7 +18,14 @@ def __init__(self, metricsloader: PSUtilMetricsLoader):
self.config = metricsloader.config
self.session_manager = metricsloader.server_app.session_manager

gauge_names = ["total_memory", "max_memory", "total_cpu", "max_cpu"]
gauge_names = [
"total_memory",
"max_memory",
"total_cpu",
"max_cpu",
"max_disk",
"current_disk",
]
for name in gauge_names:
phrase = name + "_usage"
gauge = Gauge(phrase, "counter for " + phrase.replace("_", " "), [])
@@ -34,6 +41,11 @@ async def __call__(self, *args, **kwargs):
if cpu_metric_values is not None:
self.TOTAL_CPU_USAGE.set(cpu_metric_values["cpu_percent"])
self.MAX_CPU_USAGE.set(self.apply_cpu_limit(cpu_metric_values))
if self.config.track_disk_usage:
disk_metric_values = self.metricsloader.disk_metrics()
if disk_metric_values is not None:
self.CURRENT_DISK_USAGE.set(disk_metric_values["disk_usage_used"])
self.MAX_DISK_USAGE.set(disk_metric_values["disk_usage_total"])

def apply_memory_limit(self, memory_metric_values) -> Optional[int]:
if memory_metric_values is None: