Use cgroups v2 pkg #23

Merged 7 commits on Jan 1, 2024

Commits on Dec 30, 2023

  1. feat: Rework on cgroups v2

    * Use containerd cgroups pkg to read cgroups stats
    
    * Add slurm_job to metric name to be consistent across collectors
    
    * Add cpu and memory PSI metrics
    
    * Walk through pids of cgroup only to get job details
    
    Signed-off-by: Mahendra Paipuri <[email protected]>
    mahendrapaipuri committed Dec 30, 2023
    71d8fce
  2. test: Add a new e2e test scenario

    * Test when slurm prolog files are not present
    
    * This tests if we are able to read procfs correctly and get env vars
    
    * Update test fixtures
    
    Signed-off-by: Mahendra Paipuri <[email protected]>
    mahendrapaipuri committed Dec 30, 2023
    a4e9b37
  3. test: Update tests and fixtures

    Signed-off-by: Mahendra Paipuri <[email protected]>
    mahendrapaipuri committed Dec 30, 2023
    547c85a
  4. style: gofmt'ed'!!

    Signed-off-by: Mahendra Paipuri <[email protected]>
    mahendrapaipuri committed Dec 30, 2023
    f9d8992
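
For context on the PSI metrics mentioned in the first commit: cgroups v2 exposes pressure stall information in files such as `cpu.pressure` and `memory.pressure`, one line per scope (`some`, `full`) with `avg10`, `avg60`, `avg300` and `total` fields. The PR itself reads these through the containerd cgroups package; the sketch below is only a standard-library illustration of the file format, and the names `PSILine` and `parsePSI` are hypothetical, not taken from this repository.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// PSILine holds one line ("some" or "full") of a cgroup v2 pressure file.
// Hypothetical type for illustration only.
type PSILine struct {
	Avg10, Avg60, Avg300 float64 // percentage of time stalled over each window
	Total                uint64  // cumulative stall time in microseconds
}

// parsePSI parses the contents of a pressure file such as cpu.pressure.
// Unknown keys are ignored so newer kernel fields do not break parsing.
func parsePSI(content string) (map[string]PSILine, error) {
	out := make(map[string]PSILine)
	for _, line := range strings.Split(strings.TrimSpace(content), "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		var p PSILine
		for _, kv := range fields[1:] {
			key, val, ok := strings.Cut(kv, "=")
			if !ok {
				return nil, fmt.Errorf("malformed field %q", kv)
			}
			var err error
			switch key {
			case "avg10":
				p.Avg10, err = strconv.ParseFloat(val, 64)
			case "avg60":
				p.Avg60, err = strconv.ParseFloat(val, 64)
			case "avg300":
				p.Avg300, err = strconv.ParseFloat(val, 64)
			case "total":
				p.Total, err = strconv.ParseUint(val, 10, 64)
			}
			if err != nil {
				return nil, fmt.Errorf("parsing %q: %w", kv, err)
			}
		}
		out[fields[0]] = p
	}
	return out, nil
}

func main() {
	// Sample content in the kernel's documented PSI format.
	sample := "some avg10=0.50 avg60=0.25 avg300=0.10 total=123456\n" +
		"full avg10=0.00 avg60=0.00 avg300=0.00 total=789\n"
	psi, err := parsePSI(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("some: %+v\nfull: %+v\n", psi["some"], psi["full"])
}
```

In a real collector the string would come from reading the job's cgroup directory rather than a literal.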

Commits on Jan 1, 2024

  1. feat: Rework slurm collector

    * Merge GPU jobID map with slurm collector. This is a more logical organization than having a separate collector
    
    * Don't report swap and PSI metrics by default. They can be enabled using a CLI flag
    
    * Add a hidden flag to force cgroups version for testing
    
    * Refactoring of certain receivers to make them cleaner
    
    * Add more unit tests to cover more scenarios
    
    * Add test fixtures to be able to unit test
    
    * Add more e2e test scenarios
    
    Signed-off-by: Mahendra Paipuri <[email protected]>
    mahendrapaipuri committed Jan 1, 2024
    3cc5c83
  2. docs: Update README and systemd files

    Signed-off-by: Mahendra Paipuri <[email protected]>
    mahendrapaipuri committed Jan 1, 2024
    684fa4f
  3. chore: Use int indices for devs map

    * Map iteration order is unspecified in Go and not reproducible
    
    * To ensure we always have the same behaviour, we use int as the map index and iterate over a range
    
    * This is done to avoid unit test failures, as the order in the gpuOrdinals slice is important in cmp
    
    Signed-off-by: Mahendra Paipuri <[email protected]>
    mahendrapaipuri committed Jan 1, 2024
    7e5ab16
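
The reasoning in the last commit can be sketched as follows. Ranging over a Go map visits keys in an unspecified order, so a slice built that way may differ between runs. If the map is keyed by consecutive ints, looping over the index range gives a stable order instead. This is a minimal sketch, not the PR's code: the function name `ordinalsInOrder`, the string values, and the assumption that keys run from 0 to len-1 are all hypothetical.

```go
package main

import "fmt"

// ordinalsInOrder returns the map values ordered by their int key.
// Assumes keys are the contiguous range 0..len(devs)-1 (an assumption
// made for this illustration); missing keys are simply skipped.
func ordinalsInOrder(devs map[int]string) []string {
	ordinals := make([]string, 0, len(devs))
	// Index loop instead of `for k, v := range devs`, whose order is not defined.
	for i := 0; i < len(devs); i++ {
		if d, ok := devs[i]; ok {
			ordinals = append(ordinals, d)
		}
	}
	return ordinals
}

func main() {
	devs := map[int]string{2: "GPU-c", 0: "GPU-a", 1: "GPU-b"}
	fmt.Println(ordinalsInOrder(devs))
}
```

If the keys were not contiguous, collecting them into a slice and sorting with `sort.Ints` before the lookup would give the same determinism.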