From caf5ec9bddb5e6fa4da77e3619a30f6789922316 Mon Sep 17 00:00:00 2001
From: Mahendra Paipuri
Date: Sun, 27 Oct 2024 10:05:44 +0100
Subject: [PATCH] refactor: Major refactor to improve performance of exporter

* Instead of each (sub-)collector discovering the necessary resources
  (cgroups, procs, etc.) individually, centralise the operation in the
  cgroup sub-collector.

* The cgroup sub-collector discovers all the active cgroups, their children
  and the processes contained within the cgroups, and passes this information
  to the downstream sub-collectors. This ensures that the same information
  can be reused across different sub-collectors without having to collect it
  again.

* An IMPORTANT thing we learned is that reading files in /proc is very
  expensive, as it involves the kernel taking a lot of spin locks.
  Previously, we fetched the relevant processes for the perf collector by
  traversing the /proc file system, reading the cgroup of each process and
  building this info. This turned out to be very expensive, so we now collect
  all the info from cgroups once, using the cgroup sub-collector, and pass it
  to the downstream components.

* Remove support for getting GPU ordinals from a file created by the SLURM
  prolog. The exporter needs very few privileges now, so there is no point in
  supporting this functionality. It is easier for operators to get the
  ordinals from environment variables, as it involves less configuration.

* Correct Prometheus and Grafana config docs.

* Remove the SLURM prolog/epilog config files and the systemd service file
  that will no longer be supported.

Signed-off-by: Mahendra Paipuri
---
 etc/slurm/README.md                           |  16 --
 etc/slurm/epilog.d/gpujobmap.sh               |  11 -
 etc/slurm/prolog.d/gpujobmap.sh               |  15 --
 init/systemd/ceems_exporter_no_privs.service  |  28 ---
 pkg/collector/alloy_targets.go                | 107 +++++-----
 pkg/collector/alloy_targets_test.go           | 110 +++++-----
 pkg/collector/cgroup.go                       | 187 +++++++++++++++--
 pkg/collector/cgroup_test.go                  |  42 ++--
 pkg/collector/cli_test.go                     |  12 +-
 pkg/collector/ebpf.go                         |  91 ++------
 pkg/collector/ebpf_test.go                    |  35 ++--
 pkg/collector/helper.go                       | 115 +++-------
 pkg/collector/libvirt.go                      | 114 ++++------
 pkg/collector/libvirt_test.go                 |  14 +-
 pkg/collector/perf.go                         |  94 ++++-----
 pkg/collector/perf_test.go                    |  61 +++---
 pkg/collector/rdma.go                         |  38 +---
 pkg/collector/rdma_test.go                    |  65 +++---
 pkg/collector/slurm.go                        | 198 ++++--------------
 pkg/collector/slurm_test.go                   | 185 ++++++++--------
 ...test-discoverer-cgroupsv1-slurm-output.txt |   2 +-
 ...test-discoverer-cgroupsv2-slurm-output.txt |   2 +-
 ...e-test-cgroupsv2-nvidia-gpu-reordering.txt |  20 +-
 ...-test-cgroupsv2-nvidia-ipmiutil-output.txt |  20 +-
 pkg/collector/testdata/proc.ttar              |   6 +-
 pkg/collector/testdata/sys.ttar               |  75 +++----
 scripts/e2e-test.sh                           |   6 -
 website/docs/configuration/ceems-exporter.md  |  25 +--
 website/docs/configuration/grafana.md         |  24 ++-
 website/docs/configuration/prometheus.md      |  14 +-
 30 files changed, 763 insertions(+), 969 deletions(-)
 delete mode 100644 etc/slurm/README.md
 delete mode 100755 etc/slurm/epilog.d/gpujobmap.sh
 delete mode 100755 etc/slurm/prolog.d/gpujobmap.sh
 delete mode 100644 init/systemd/ceems_exporter_no_privs.service

diff --git a/etc/slurm/README.md b/etc/slurm/README.md
deleted file mode 100644
index 92782df9..00000000
--- a/etc/slurm/README.md
+++ /dev/null
@@ -1,16 +0,0 @@
-# SLURM epilog and prolog scripts
-
-CEEMS exporter needs to perform few privileged actions to collect certain information of
-compute units.
An example [systemd service file](https://github.com/mahendrapaipuri/ceems/blob/main/build/package/ceems_exporter/ceems_exporter.service) -provided in the repo shows the linux capabilities necessary for these privileged actions. - -If the operators would like to avoid privileges on CEEMS exporter and run it fully in -userland an alternative approach, in SLURM context, is to use Epilog and Prolog scripts -to write the necessary job information to a file that is readable by CEEMS exporter. -This directory provides those scripts that should be used with SLURM. - -An example [systemd service file](https://github.com/mahendrapaipuri/ceems/blob/main/init/systemd/ceems_exporter_no_privs.service) -is also provided in the repo that can be used along with these prolog and epilog scripts. - -Even with such prolog and epilog scripts, operators should grant the CEEMS exporter -process additional privileges for collectors like `ipmi_dcmi`, `ebpf`, _etc_. diff --git a/etc/slurm/epilog.d/gpujobmap.sh b/etc/slurm/epilog.d/gpujobmap.sh deleted file mode 100755 index ad01f5f2..00000000 --- a/etc/slurm/epilog.d/gpujobmap.sh +++ /dev/null @@ -1,11 +0,0 @@ -#!/bin/bash - -# Need to use this path in --collector.nvidia.gpu.job.map.path flag for ceems_exporter -DEST=/run/gpujobmap -[ -e $DEST ] || mkdir -m 755 $DEST - -# Ensure to remove the file with jobid once the job finishes -for i in ${GPU_DEVICE_ORDINAL//,/ } ${CUDA_VISIBLE_DEVICES//,/ }; do - rm -rf $DEST/$i -done -exit 0 diff --git a/etc/slurm/prolog.d/gpujobmap.sh b/etc/slurm/prolog.d/gpujobmap.sh deleted file mode 100755 index 6f5daed2..00000000 --- a/etc/slurm/prolog.d/gpujobmap.sh +++ /dev/null @@ -1,15 +0,0 @@ -#!/bin/bash - -# Need to use this path in --collector.slurm.gpu-job-map-path flag for ceems_exporter -DEST=/run/gpujobmap -[ -e $DEST ] || mkdir -m 755 $DEST - -# CUDA_VISIBLE_DEVICES in prolog will be "actual" GPU indices and once job starts -# CUDA will reset the indices to always start from 0. 
Thus inside a job, CUDA_VISIBLE_DEVICES -# will always start with 0 but during prolog script execution it can be any ordinal index -# based on how SLURM allocated the GPUs -# Ref: https://slurm.schedmd.com/prolog_epilog.html -for i in ${GPU_DEVICE_ORDINAL//,/ } ${CUDA_VISIBLE_DEVICES//,/ }; do - echo $SLURM_JOB_ID > $DEST/$i -done -exit 0 diff --git a/init/systemd/ceems_exporter_no_privs.service b/init/systemd/ceems_exporter_no_privs.service deleted file mode 100644 index e691aeb7..00000000 --- a/init/systemd/ceems_exporter_no_privs.service +++ /dev/null @@ -1,28 +0,0 @@ -[Unit] -Description=CEEMS Exporter Using Prolog and Epilog Scripts Without Additional Privileges -After=network-online.target - -[Service] -Type=simple -User=ceems -Group=ceems -ExecStart=/usr/local/bin/ceems_exporter \ - --collector.slurm \ - --collector.slurm.gpu.job.map.path="/run/gpujobmap" \ - --collector.ipmi.dcmi.cmd="sudo /usr/sbin/ipmi-dcmi --get-system-power-statistics" \ - --log.level=debug - -SyslogIdentifier=ceems_exporter -Restart=always -RestartSec=1 -StartLimitInterval=0 - -NoNewPrivileges=yes - -ProtectSystem=strict -ProtectControlGroups=true -ProtectKernelModules=true -ProtectKernelTunables=yes - -[Install] -WantedBy=multi-user.target diff --git a/pkg/collector/alloy_targets.go b/pkg/collector/alloy_targets.go index ebef0d2d..1dde5459 100644 --- a/pkg/collector/alloy_targets.go +++ b/pkg/collector/alloy_targets.go @@ -4,15 +4,14 @@ import ( "encoding/json" "fmt" "net/http" + "os" "strconv" - "strings" "time" "github.com/go-kit/log" "github.com/go-kit/log/level" "github.com/mahendrapaipuri/ceems/internal/security" "github.com/prometheus/client_golang/prometheus/promhttp" - "github.com/prometheus/procfs" ) // CLI opts. @@ -25,8 +24,14 @@ var ( "discoverer.alloy-targets.env-var", "Enable continuous profiling by Grafana Alloy only on the processes having any of these environment variables.", ).Strings() + alloySelfTarget = CEEMSExporterApp.Flag( + "discoverer.alloy-targets.self-profiler", + "Enable continuous profiling by Grafana Alloy on current process (default: false).", + ).Default("false").Bool() ) +var selfTargetID = "__internal_ceems_exporter" + const ( contentTypeHeader = "Content-Type" contentType = "application/json" @@ -38,12 +43,12 @@ const ( // Security context names. const ( - alloyTargetDiscovererCtx = "alloy_targets_discoverer" + alloyTargetFilterCtx = "alloy_targets_filter" ) // alloyTargetDiscovererSecurityCtxData contains the input/output data for // discoverer function to execute inside security context. 
-type alloyTargetDiscovererSecurityCtxData = perfDiscovererSecurityCtxData +type alloyTargetFilterSecurityCtxData = perfProcFilterSecurityCtxData type Target struct { Targets []string `json:"targets"` @@ -57,7 +62,6 @@ type alloyTargetOpts struct { type CEEMSAlloyTargetDiscoverer struct { logger log.Logger cgroupManager *cgroupManager - fs procfs.FS opts alloyTargetOpts enabled bool securityContexts map[string]*security.SecurityContext @@ -77,16 +81,8 @@ func NewAlloyTargetDiscoverer(logger log.Logger) (*CEEMSAlloyTargetDiscoverer, e targetEnvVars: *alloyTargetEnvVars, } - // Instantiate a new Proc FS - fs, err := procfs.NewFS(*procfsPath) - if err != nil { - level.Error(logger).Log("msg", "Unable to open procfs", "path", *procfsPath, "err", err) - - return nil, err - } - // Get SLURM's cgroup details - cgroupManager, err := NewCgroupManager(*cgManager) + cgroupManager, err := NewCgroupManager(*cgManager, logger) if err != nil { level.Info(logger).Log("msg", "Failed to create cgroup manager", "err", err) @@ -97,7 +93,6 @@ func NewAlloyTargetDiscoverer(logger log.Logger) (*CEEMSAlloyTargetDiscoverer, e discoverer := &CEEMSAlloyTargetDiscoverer{ logger: logger, - fs: fs, cgroupManager: cgroupManager, opts: opts, enabled: true, @@ -113,10 +108,10 @@ func NewAlloyTargetDiscoverer(logger log.Logger) (*CEEMSAlloyTargetDiscoverer, e capabilities := []string{"cap_sys_ptrace", "cap_dac_read_search"} auxCaps := setupCollectorCaps(logger, alloyTargetDiscovererSubSystem, capabilities) - discoverer.securityContexts[alloyTargetDiscovererCtx], err = security.NewSecurityContext( - alloyTargetDiscovererCtx, + discoverer.securityContexts[alloyTargetFilterCtx], err = security.NewSecurityContext( + alloyTargetFilterCtx, auxCaps, - targetDiscoverer, + filterTargets, logger, ) if err != nil { @@ -153,33 +148,35 @@ func (d *CEEMSAlloyTargetDiscoverer) discover() ([]Target, error) { return []Target{}, nil } + // Get active cgroups + cgroups, err := d.cgroupManager.discover() + if err != nil { + return nil, fmt.Errorf("failed to discover cgroups: %w", err) + } + // Read discovered cgroups into data pointer - dataPtr := &alloyTargetDiscovererSecurityCtxData{ - procfs: d.fs, - cgroupManager: d.cgroupManager, + dataPtr := &alloyTargetFilterSecurityCtxData{ + cgroups: cgroups, targetEnvVars: d.opts.targetEnvVars, + ignoreProc: d.cgroupManager.ignoreProc, } // If there is a need to read processes' environ, use security context // else execute function natively if len(d.opts.targetEnvVars) > 0 { - if securityCtx, ok := d.securityContexts[alloyTargetDiscovererCtx]; ok { + if securityCtx, ok := d.securityContexts[alloyTargetFilterCtx]; ok { if err := securityCtx.Exec(dataPtr); err != nil { return nil, err } } else { return nil, security.ErrNoSecurityCtx } - } else { - if err := targetDiscoverer(dataPtr); err != nil { - return nil, err - } } - if len(dataPtr.cgroups) > 0 { - level.Debug(d.logger).Log("msg", "Discovered targets for Grafana Alloy") - } else { + if len(dataPtr.cgroups) == 0 { level.Debug(d.logger).Log("msg", "No targets found for Grafana Alloy") + + return []Target{}, nil } // Make targets from cgrpoups @@ -187,24 +184,13 @@ func (d *CEEMSAlloyTargetDiscoverer) discover() ([]Target, error) { for _, cgroup := range dataPtr.cgroups { for _, proc := range cgroup.procs { - exe, _ := proc.Executable() - comm, _ := proc.CmdLine() - - var realUID, effecUID uint64 - if status, err := proc.NewStatus(); err == nil { - realUID = status.UIDs[0] - effecUID = status.UIDs[1] - } - + // Reading files in /proc is expensive. 
So, return minimal + // info needed for target target := Target{ Targets: []string{cgroup.id}, Labels: map[string]string{ - "__process_pid__": strconv.FormatInt(int64(proc.PID), 10), - "__process_exe": exe, - "__process_commandline": strings.Join(comm, " "), - "__process_real_uid": strconv.FormatUint(realUID, 10), - "__process_effective_uid": strconv.FormatUint(effecUID, 10), - "service_name": cgroup.id, + "__process_pid__": strconv.FormatInt(int64(proc.PID), 10), + "service_name": cgroup.uuid, }, } @@ -212,31 +198,32 @@ func (d *CEEMSAlloyTargetDiscoverer) discover() ([]Target, error) { } } + // If self profiler is enabled add current process to targets + if *alloySelfTarget { + targets = append(targets, Target{ + Targets: []string{selfTargetID}, + Labels: map[string]string{ + "__process_pid__": strconv.FormatInt(int64(os.Getpid()), 10), + "service_name": selfTargetID, + }, + }) + } + return targets, nil } -// discoverer returns a map of discovered cgroup ID to procs by looking at each process -// in proc FS. Walking through cgroup fs is not really an option here as cgroups v1 -// wont have all PIDs of cgroup if the PID controller is not turned on. -// The current implementation should work for both cgroups v1 and v2. -// This function might be executed in a security context if targetEnvVars is not -// empty. -func targetDiscoverer(data interface{}) error { +// filterTargets filters the targets based on target env vars and return filtered targets. +func filterTargets(data interface{}) error { // Assert data is of alloyTargetDiscovererSecurityCtxData - var d *alloyTargetDiscovererSecurityCtxData + var d *alloyTargetFilterSecurityCtxData var ok bool - if d, ok = data.(*alloyTargetDiscovererSecurityCtxData); !ok { + if d, ok = data.(*alloyTargetFilterSecurityCtxData); !ok { return security.ErrSecurityCtxDataAssertion } - cgroups, err := getCgroups(d.procfs, d.cgroupManager.idRegex, d.targetEnvVars, d.cgroupManager.procFilter) - if err != nil { - return err - } - - // Read cgroups proc map into d - d.cgroups = cgroups + // Read filtered cgroups into d + d.cgroups = cgroupProcFilterer(d.cgroups, d.targetEnvVars, d.ignoreProc) return nil } diff --git a/pkg/collector/alloy_targets_test.go b/pkg/collector/alloy_targets_test.go index 826cae9d..dcf3df4f 100644 --- a/pkg/collector/alloy_targets_test.go +++ b/pkg/collector/alloy_targets_test.go @@ -1,6 +1,8 @@ package collector import ( + "os" + "strconv" "testing" "github.com/go-kit/log" @@ -8,52 +10,38 @@ import ( "github.com/stretchr/testify/require" ) -var expectedTargets = []Target{ - { - Targets: []string{"1320003"}, - Labels: map[string]string{ - "__process_commandline": "/gpfslocalsup/spack_soft/gromacs/2022.2/gcc-8.4.1-kblhs7pjrcqlgv675gejjjy7n3h6wz2n/bin/gmx_mpi mdrun -ntomp 10 -v -deffnm run10 -multidir 1/ 2/ 3/ 4/", - "__process_effective_uid": "1000", - "__process_exe": "/usr/bin/vim", - "__process_pid__": "46236", - "__process_real_uid": "1000", - "service_name": "1320003", - }, - }, - { - Targets: []string{"1320003"}, - Labels: map[string]string{ - "__process_commandline": "/gpfslocalsup/spack_soft/gromacs/2022.2/gcc-8.4.1-kblhs7pjrcqlgv675gejjjy7n3h6wz2n/bin/gmx_mpi mdrun -ntomp 10 -v -deffnm run10 -multidir 1/ 2/ 3/ 4/", - "__process_effective_uid": "1000", - "__process_exe": "/usr/bin/vim", - "__process_pid__": "46235", - "__process_real_uid": "1000", - "service_name": "1320003", - }, - }, - { - Targets: []string{"4824887"}, - Labels: map[string]string{ - "__process_commandline": 
"/gpfslocalsup/spack_soft/gromacs/2022.2/gcc-8.4.1-kblhs7pjrcqlgv675gejjjy7n3h6wz2n/bin/gmx_mpi mdrun -ntomp 10 -v -deffnm run10 -multidir 1/ 2/ 3/ 4/", - "__process_effective_uid": "1000", - "__process_exe": "/usr/bin/vim", - "__process_pid__": "46281", - "__process_real_uid": "1000", - "service_name": "4824887", - }, - }, - { - Targets: []string{"4824887"}, - Labels: map[string]string{ - "__process_commandline": "/gpfslocalsup/spack_soft/gromacs/2022.2/gcc-8.4.1-kblhs7pjrcqlgv675gejjjy7n3h6wz2n/bin/gmx_mpi mdrun -ntomp 10 -v -deffnm run10 -multidir 1/ 2/ 3/ 4/", - "__process_effective_uid": "1000", - "__process_exe": "/usr/bin/vim", - "__process_pid__": "46231", - "__process_real_uid": "1000", - "service_name": "4824887", - }, - }, -} +var ( + expectedTargetsV2 = []Target{ + {Targets: []string{"1009248"}, Labels: map[string]string{"__process_pid__": "46231", "service_name": "1009248"}}, + {Targets: []string{"1009248"}, Labels: map[string]string{"__process_pid__": "46281", "service_name": "1009248"}}, + {Targets: []string{"1009248"}, Labels: map[string]string{"__process_pid__": "3346567", "service_name": "1009248"}}, + {Targets: []string{"1009248"}, Labels: map[string]string{"__process_pid__": "3346596", "service_name": "1009248"}}, + {Targets: []string{"1009248"}, Labels: map[string]string{"__process_pid__": "3346674", "service_name": "1009248"}}, + {Targets: []string{"1009249"}, Labels: map[string]string{"__process_pid__": "46235", "service_name": "1009249"}}, + {Targets: []string{"1009249"}, Labels: map[string]string{"__process_pid__": "46236", "service_name": "1009249"}}, + {Targets: []string{"1009249"}, Labels: map[string]string{"__process_pid__": "3346567", "service_name": "1009249"}}, + {Targets: []string{"1009249"}, Labels: map[string]string{"__process_pid__": "46233", "service_name": "1009249"}}, + {Targets: []string{"1009250"}, Labels: map[string]string{"__process_pid__": "26242", "service_name": "1009250"}}, + {Targets: []string{"1009250"}, Labels: map[string]string{"__process_pid__": "46233", "service_name": "1009250"}}, + {Targets: []string{"1009250"}, Labels: map[string]string{"__process_pid__": "3346567", "service_name": "1009250"}}, + {Targets: []string{"1009250"}, Labels: map[string]string{"__process_pid__": "3346596", "service_name": "1009250"}}, + {Targets: []string{"1009250"}, Labels: map[string]string{"__process_pid__": "3346674", "service_name": "1009250"}}, + } + expectedTargetsV2Filtered = []Target{ + {Targets: []string{"1009248"}, Labels: map[string]string{"__process_pid__": "46231", "service_name": "1009248"}}, + {Targets: []string{"1009248"}, Labels: map[string]string{"__process_pid__": "46281", "service_name": "1009248"}}, + {Targets: []string{"1009249"}, Labels: map[string]string{"__process_pid__": "46235", "service_name": "1009249"}}, + {Targets: []string{"1009249"}, Labels: map[string]string{"__process_pid__": "46236", "service_name": "1009249"}}, + } + expectedTargetsV1 = []Target{ + {Targets: []string{"1009248"}, Labels: map[string]string{"__process_pid__": "46231", "service_name": "1009248"}}, + {Targets: []string{"1009248"}, Labels: map[string]string{"__process_pid__": "46281", "service_name": "1009248"}}, + {Targets: []string{"1009249"}, Labels: map[string]string{"__process_pid__": "46235", "service_name": "1009249"}}, + {Targets: []string{"1009249"}, Labels: map[string]string{"__process_pid__": "46236", "service_name": "1009249"}}, + {Targets: []string{"1009250"}, Labels: map[string]string{"__process_pid__": "26242", "service_name": "1009250"}}, + 
{Targets: []string{"1009250"}, Labels: map[string]string{"__process_pid__": "46233", "service_name": "1009250"}}, + } +) func TestAlloyDiscovererSlurmCgroupsV2(t *testing.T) { _, err := CEEMSExporterApp.Parse([]string{ @@ -69,7 +57,7 @@ func TestAlloyDiscovererSlurmCgroupsV2(t *testing.T) { targets, err := discoverer.Discover() require.NoError(t, err) - assert.ElementsMatch(t, expectedTargets, targets) + assert.ElementsMatch(t, expectedTargetsV2, targets) } func TestAlloyDiscovererSlurmCgroupsV1(t *testing.T) { @@ -86,5 +74,33 @@ func TestAlloyDiscovererSlurmCgroupsV1(t *testing.T) { targets, err := discoverer.Discover() require.NoError(t, err) + assert.ElementsMatch(t, expectedTargetsV1, targets) +} + +func TestAlloyDiscovererSlurmCgroupsV2WithEnviron(t *testing.T) { + _, err := CEEMSExporterApp.Parse([]string{ + "--path.procfs", "testdata/proc", + "--path.cgroupfs", "testdata/sys/fs/cgroup", + "--discoverer.alloy-targets.resource-manager", "slurm", + "--discoverer.alloy-targets.env-var", "ENABLE_PROFILING", + "--collector.cgroups.force-version", "v2", + "--discoverer.alloy-targets.self-profiler", + }) + require.NoError(t, err) + + discoverer, err := NewAlloyTargetDiscoverer(log.NewNopLogger()) + require.NoError(t, err) + + targets, err := discoverer.Discover() + require.NoError(t, err) + + expectedTargets := append(expectedTargetsV2Filtered, Target{ + Targets: []string{selfTargetID}, + Labels: map[string]string{ + "__process_pid__": strconv.FormatInt(int64(os.Getpid()), 10), + "service_name": selfTargetID, + }, + }) + assert.ElementsMatch(t, expectedTargets, targets) } diff --git a/pkg/collector/cgroup.go b/pkg/collector/cgroup.go index 7da36e70..017488c9 100644 --- a/pkg/collector/cgroup.go +++ b/pkg/collector/cgroup.go @@ -2,9 +2,11 @@ package collector import ( "bufio" + "bytes" "context" "errors" "fmt" + "io/fs" "math" "os" "path/filepath" @@ -19,6 +21,7 @@ import ( "github.com/go-kit/log" "github.com/go-kit/log/level" "github.com/prometheus/client_golang/prometheus" + "github.com/prometheus/procfs" "github.com/prometheus/procfs/blockdevice" ) @@ -83,8 +86,38 @@ var ( ).Hidden().Enum("v1", "v2") ) +type cgroupPath struct { + abs, rel string +} + +// String implements stringer interface of the struct. +func (c *cgroupPath) String() string { + return c.abs +} + +type cgroup struct { + id string + uuid string // uuid is the identifier known to user whereas id is identifier used by resource manager internally + procs []procfs.Proc + path cgroupPath + children []cgroupPath // All the children under this root cgroup +} + +// String implements stringer interface of the struct. +func (c *cgroup) String() string { + return fmt.Sprintf( + "id: %s path: %s num_procs: %d num_children: %d", + c.id, + c.path, + len(c.procs), + len(c.children), + ) +} + // cgroupManager is the container that have cgroup information of resource manager. type cgroupManager struct { + logger log.Logger + fs procfs.FS mode cgroups.CGMode // cgroups mode: unified, legacy, hybrid root string // cgroups root slice string // Slice under which cgroups are managed eg system.slice, machine.slice @@ -93,8 +126,8 @@ type cgroupManager struct { mountPoint string // Path under which resource manager creates cgroups manager string // cgroup manager idRegex *regexp.Regexp // Regular expression to capture cgroup ID set by resource manager - pathFilter func(string) bool // Function to filter cgroup paths. 
Function must return true if cgroup path must be ignored - procFilter func(string) bool // Function to filter processes in cgroup based on cmdline. Function must return true if process must be ignored + isChild func(string) bool // Function to identify child cgroup paths. Function must return true if cgroup is a child to root cgroup + ignoreProc func(string) bool // Function to filter processes in cgroup based on cmdline. Function must return true if process must be ignored } // String implements stringer interface of the struct. @@ -142,18 +175,130 @@ func (c *cgroupManager) setMountPoint() { } } +// discover finds all the active cgroups in the given mountpoint. +func (c *cgroupManager) discover() ([]cgroup, error) { + var cgroups []cgroup + + cgroupProcs := make(map[string][]procfs.Proc) + + cgroupChildren := make(map[string][]cgroupPath) + + // Walk through all cgroups and get cgroup paths + // https://goplay.tools/snippet/coVDkIozuhg + if err := filepath.WalkDir(c.mountPoint, func(p string, info fs.DirEntry, err error) error { + if err != nil { + return err + } + + // Ignore paths that are not directories + if !info.IsDir() { + return nil + } + + // Get relative path of cgroup + rel, err := filepath.Rel(c.root, p) + if err != nil { + level.Error(c.logger).Log("msg", "Failed to resolve relative path for cgroup", "path", p, "err", err) + + return nil + } + + // Unescape UTF-8 characters in cgroup path + sanitizedPath, err := unescapeString(p) + if err != nil { + level.Error(c.logger).Log("msg", "Failed to sanitize cgroup path", "path", p, "err", err) + + return nil + } + + // Get cgroup ID which is instance ID + cgroupIDMatches := c.idRegex.FindStringSubmatch(sanitizedPath) + if len(cgroupIDMatches) <= 1 { + return nil + } + + id := strings.TrimSpace(cgroupIDMatches[1]) + if id == "" { + level.Error(c.logger).Log("msg", "Empty cgroup ID", "path", p) + + return nil + } + + // Find procs in this cgroup + if data, err := os.ReadFile(filepath.Join(p, "cgroup.procs")); err == nil { + scanner := bufio.NewScanner(bytes.NewReader(data)) + for scanner.Scan() { + if pid, err := strconv.ParseInt(scanner.Text(), 10, 0); err == nil { + if proc, err := c.fs.Proc(int(pid)); err == nil { + cgroupProcs[id] = append(cgroupProcs[id], proc) + } + } + } + } + + // Ignore child cgroups. We are only interested in root cgroup + if c.isChild(p) { + cgroupChildren[id] = append(cgroupChildren[id], cgroupPath{abs: sanitizedPath, rel: rel}) + + return nil + } + + // By default set id and uuid to same cgroup ID and if the resource + // manager has two representations, override it in corresponding + // collector. For instance, it applies only to libvirt + cgrp := cgroup{ + id: id, + uuid: id, + path: cgroupPath{abs: sanitizedPath, rel: rel}, + } + + cgroups = append(cgroups, cgrp) + cgroupChildren[id] = append(cgroupChildren[id], cgroupPath{abs: sanitizedPath, rel: rel}) + + return nil + }); err != nil { + level.Error(c.logger). + Log("msg", "Error walking cgroup subsystem", "path", c.mountPoint, "err", err) + + return nil, err + } + + // Merge cgroupProcs and cgroupChildren with cgroups slice + for icgrp := range cgroups { + if procs, ok := cgroupProcs[cgroups[icgrp].id]; ok { + cgroups[icgrp].procs = procs + } + + if children, ok := cgroupChildren[cgroups[icgrp].id]; ok { + cgroups[icgrp].children = children + } + } + + return cgroups, nil +} + // NewCgroupManager returns an instance of cgroupManager based on resource manager. 
-func NewCgroupManager(name string) (*cgroupManager, error) { +func NewCgroupManager(name string, logger log.Logger) (*cgroupManager, error) { + // Instantiate a new Proc FS + fs, err := procfs.NewFS(*procfsPath) + if err != nil { + level.Error(logger).Log("msg", "Unable to open procfs", "path", *procfsPath, "err", err) + + return nil, err + } + var manager *cgroupManager switch name { case slurm: if (*forceCgroupsVersion == "" && cgroups.Mode() == cgroups.Unified) || *forceCgroupsVersion == "v2" { manager = &cgroupManager{ - mode: cgroups.Unified, - root: *cgroupfsPath, - slice: "system.slice", - scope: "slurmstepd.scope", + logger: logger, + fs: fs, + mode: cgroups.Unified, + root: *cgroupfsPath, + slice: "system.slice", + scope: "slurmstepd.scope", } } else { var mode cgroups.CGMode @@ -164,6 +309,8 @@ func NewCgroupManager(name string) (*cgroupManager, error) { } manager = &cgroupManager{ + logger: logger, + fs: fs, mode: mode, root: *cgroupfsPath, activeController: *activeController, @@ -177,11 +324,11 @@ func NewCgroupManager(name string) (*cgroupManager, error) { // Add path regex manager.idRegex = slurmCgroupPathRegex - // Add filter functions - manager.pathFilter = func(p string) bool { + // Identify child cgroup + manager.isChild = func(p string) bool { return strings.Contains(p, "/step_") } - manager.procFilter = func(p string) bool { + manager.ignoreProc = func(p string) bool { return slurmIgnoreProcsRegex.MatchString(p) } @@ -193,9 +340,11 @@ func NewCgroupManager(name string) (*cgroupManager, error) { case libvirt: if (*forceCgroupsVersion == "" && cgroups.Mode() == cgroups.Unified) || *forceCgroupsVersion == "v2" { manager = &cgroupManager{ - mode: cgroups.Unified, - root: *cgroupfsPath, - slice: "machine.slice", + logger: logger, + fs: fs, + mode: cgroups.Unified, + root: *cgroupfsPath, + slice: "machine.slice", } } else { var mode cgroups.CGMode @@ -206,6 +355,8 @@ func NewCgroupManager(name string) (*cgroupManager, error) { } manager = &cgroupManager{ + logger: logger, + fs: fs, mode: mode, root: *cgroupfsPath, activeController: *activeController, @@ -219,11 +370,13 @@ func NewCgroupManager(name string) (*cgroupManager, error) { // Add path regex manager.idRegex = libvirtCgroupPathRegex - // Add filter functions - manager.pathFilter = func(p string) bool { - return strings.Contains(p, "/libvirt") + // Identify child cgroup + // In cgroups v1, all the child cgroups like emulator, vcpu* are flat whereas + // in v2 they are all inside libvirt child + manager.isChild = func(p string) bool { + return strings.Contains(p, "/libvirt") || strings.Contains(p, "/emulator") || strings.Contains(p, "/vcpu") } - manager.procFilter = func(p string) bool { + manager.ignoreProc = func(p string) bool { return false } diff --git a/pkg/collector/cgroup_test.go b/pkg/collector/cgroup_test.go index f435f7a7..a4b71dbc 100644 --- a/pkg/collector/cgroup_test.go +++ b/pkg/collector/cgroup_test.go @@ -177,20 +177,28 @@ func TestNewCgroupManagerV2(t *testing.T) { require.NoError(t, err) // Slurm case - manager, err := NewCgroupManager("slurm") + manager, err := NewCgroupManager("slurm", log.NewNopLogger()) require.NoError(t, err) assert.Equal(t, "testdata/sys/fs/cgroup/system.slice/slurmstepd.scope", manager.mountPoint) - assert.NotNil(t, manager.pathFilter) - assert.NotNil(t, manager.procFilter) + assert.NotNil(t, manager.isChild) + assert.NotNil(t, manager.ignoreProc) + + cgroups, err := manager.discover() + require.NoError(t, err) + assert.Len(t, cgroups, 3) // libvirt case - manager, err = 
NewCgroupManager("libvirt") + manager, err = NewCgroupManager("libvirt", log.NewNopLogger()) require.NoError(t, err) assert.Equal(t, "testdata/sys/fs/cgroup/machine.slice", manager.mountPoint) - assert.NotNil(t, manager.pathFilter) - assert.NotNil(t, manager.procFilter) + assert.NotNil(t, manager.isChild) + assert.NotNil(t, manager.ignoreProc) + + cgroups, err = manager.discover() + require.NoError(t, err) + assert.Len(t, cgroups, 4) } func TestNewCgroupManagerV1(t *testing.T) { @@ -203,23 +211,31 @@ func TestNewCgroupManagerV1(t *testing.T) { require.NoError(t, err) // Slurm case - manager, err := NewCgroupManager("slurm") + manager, err := NewCgroupManager("slurm", log.NewNopLogger()) require.NoError(t, err) assert.Equal(t, "testdata/sys/fs/cgroup/cpuacct/slurm", manager.mountPoint) - assert.NotNil(t, manager.pathFilter) - assert.NotNil(t, manager.procFilter) + assert.NotNil(t, manager.isChild) + assert.NotNil(t, manager.ignoreProc) + + cgroups, err := manager.discover() + require.NoError(t, err) + assert.Len(t, cgroups, 3) // libvirt case - manager, err = NewCgroupManager("libvirt") + manager, err = NewCgroupManager("libvirt", log.NewNopLogger()) require.NoError(t, err) assert.Equal(t, "testdata/sys/fs/cgroup/cpuacct/machine.slice", manager.mountPoint) - assert.NotNil(t, manager.pathFilter) - assert.NotNil(t, manager.procFilter) + assert.NotNil(t, manager.isChild) + assert.NotNil(t, manager.ignoreProc) + + cgroups, err = manager.discover() + require.NoError(t, err) + assert.Len(t, cgroups, 4) // Check error for unknown resource manager - _, err = NewCgroupManager("unknown") + _, err = NewCgroupManager("unknown", log.NewNopLogger()) assert.Error(t, err) } diff --git a/pkg/collector/cli_test.go b/pkg/collector/cli_test.go index bc85c4f3..b8ec5348 100644 --- a/pkg/collector/cli_test.go +++ b/pkg/collector/cli_test.go @@ -35,16 +35,18 @@ func TestCEEMSExporterMain(t *testing.T) { // t.Setenv("PATH", absPath+":"+os.Getenv("PATH")) // Remove test related args and add a dummy arg - os.Args = append([]string{os.Args[0]}, "--web.max-requests=2", "--no-security.drop-privileges", "--collector.ipmi_dcmi.cmd", absPath) + os.Args = append([]string{os.Args[0]}, + "--web.max-requests=2", + "--no-security.drop-privileges", + "--collector.ipmi_dcmi.cmd", absPath, + "--path.procfs", "testdata/proc", + "--path.cgroupfs", "testdata/sys/fs/cgroup", + ) // Create new instance of exporter CLI app a, err := NewCEEMSExporter() require.NoError(t, err) - // Add procfs path - _, err = a.App.Parse([]string{"--path.procfs", "testdata/proc"}) - require.NoError(t, err) - // Start Main go func() { a.Main() diff --git a/pkg/collector/ebpf.go b/pkg/collector/ebpf.go index cb82fe53..fb7ea5ab 100644 --- a/pkg/collector/ebpf.go +++ b/pkg/collector/ebpf.go @@ -9,8 +9,6 @@ import ( "embed" "errors" "fmt" - "io/fs" - "path/filepath" "slices" "strings" "sync" @@ -509,11 +507,9 @@ func NewEbpfCollector(logger log.Logger, cgManager *cgroupManager) (*ebpfCollect // Update implements Collector and update job metrics. // cgroupIDUUIDMap provides a map to cgroupID to compute unit UUID. If the map is empty, it means // cgroup ID and compute unit UUID is identical. 
-func (c *ebpfCollector) Update(ch chan<- prometheus.Metric, cgroupIDUUIDMap map[string]string) error { - // Fetch all active cgroups - if err := c.discoverCgroups(cgroupIDUUIDMap); err != nil { - return fmt.Errorf("failed to discover cgroups: %w", err) - } +func (c *ebpfCollector) Update(ch chan<- prometheus.Metric, cgroups []cgroup) error { + // Update active cgroups + c.discoverCgroups(cgroups) // Fetch metrics from maps aggMetrics, err := c.readMaps() @@ -837,7 +833,7 @@ func (c *ebpfCollector) readMaps() (*aggMetrics, error) { // discoverCgroups walks through cgroup file system and discover all relevant cgroups based // on cgroupManager. -func (c *ebpfCollector) discoverCgroups(cgroupIDUUIDMap map[string]string) error { +func (c *ebpfCollector) discoverCgroups(cgroups []cgroup) { // Get currently active uuids and cgroup paths to evict older entries in caches var activeCgroupUUIDs []string @@ -846,70 +842,29 @@ func (c *ebpfCollector) discoverCgroups(cgroupIDUUIDMap map[string]string) error // Reset activeCgroups from last scrape c.activeCgroupInodes = make([]uint64, 0) - // Walk through all cgroups and get cgroup paths - if err := filepath.WalkDir(c.cgroupManager.mountPoint, func(p string, info fs.DirEntry, err error) error { - if err != nil { - return err - } - - // Ignore irrelevant cgroup paths - if !info.IsDir() { - return nil - } - - // Unescape UTF-8 characters in cgroup path - sanitizedPath, err := unescapeString(p) - if err != nil { - level.Error(c.logger).Log("msg", "Failed to sanitize cgroup path", "path", p, "err", err) - - return nil - } - - // Get cgroup ID - cgroupIDMatches := c.cgroupManager.idRegex.FindStringSubmatch(sanitizedPath) - if len(cgroupIDMatches) <= 1 { - return nil - } - - cgroupID := strings.TrimSpace(cgroupIDMatches[1]) - if cgroupID == "" { - level.Error(c.logger).Log("msg", "Empty cgroup ID", "path", p) - - return nil - } + for _, cgrp := range cgroups { + uuid := cgrp.uuid - // Get compute unit UUID from cgroup ID - var uuid string - if cgroupIDUUIDMap != nil { - uuid = cgroupIDUUIDMap[cgroupID] - } else { - uuid = cgroupID - } + for _, child := range cgrp.children { + path := child.abs - // Get inode of the cgroup path if not already present in the cache - if _, ok := c.cgroupPathIDCache[p]; !ok { - if inode, err := inode(p); err == nil { - c.cgroupPathIDCache[p] = inode - c.cgroupIDUUIDCache[inode] = uuid + // Get inode of the cgroup path if not already present in the cache + if _, ok := c.cgroupPathIDCache[path]; !ok { + if inode, err := inode(path); err == nil { + c.cgroupPathIDCache[path] = inode + c.cgroupIDUUIDCache[inode] = uuid + } } - } - if _, ok := c.cgroupIDUUIDCache[c.cgroupPathIDCache[p]]; !ok { - c.cgroupIDUUIDCache[c.cgroupPathIDCache[p]] = uuid - } - - // Populate activeCgroupUUIDs, activeCgroupInodes and activeCgroupPaths - activeCgroupPaths = append(activeCgroupPaths, p) - activeCgroupUUIDs = append(activeCgroupUUIDs, uuid) - c.activeCgroupInodes = append(c.activeCgroupInodes, c.cgroupPathIDCache[p]) - level.Debug(c.logger).Log("msg", "cgroup path", "path", p) - - return nil - }); err != nil { - level.Error(c.logger). 
- Log("msg", "Error walking cgroup subsystem", "path", c.cgroupManager.mountPoint, "err", err) + if _, ok := c.cgroupIDUUIDCache[c.cgroupPathIDCache[path]]; !ok { + c.cgroupIDUUIDCache[c.cgroupPathIDCache[path]] = uuid + } - return err + // Populate activeCgroupUUIDs, activeCgroupInodes and activeCgroupPaths + activeCgroupPaths = append(activeCgroupPaths, path) + activeCgroupUUIDs = append(activeCgroupUUIDs, uuid) + c.activeCgroupInodes = append(c.activeCgroupInodes, c.cgroupPathIDCache[path]) + } } // Evict older entries from caches @@ -924,8 +879,6 @@ func (c *ebpfCollector) discoverCgroups(cgroupIDUUIDMap map[string]string) error delete(c.cgroupPathIDCache, path) } } - - return nil } // aggStats returns aggregate VFS and network metrics by reading diff --git a/pkg/collector/ebpf_test.go b/pkg/collector/ebpf_test.go index 9107cd71..40b3d5cc 100644 --- a/pkg/collector/ebpf_test.go +++ b/pkg/collector/ebpf_test.go @@ -7,7 +7,6 @@ import ( "slices" "testing" - "github.com/containerd/cgroups/v3" "github.com/go-kit/log" "github.com/prometheus/client_golang/prometheus" "github.com/stretchr/testify/assert" @@ -169,7 +168,7 @@ func TestNewEbpfCollector(t *testing.T) { require.NoError(t, err) // cgroup manager - cgManager, err := NewCgroupManager("slurm") + cgManager, err := NewCgroupManager("slurm", log.NewNopLogger()) require.NoError(t, err) collector, err := NewEbpfCollector(log.NewNopLogger(), cgManager) @@ -197,16 +196,15 @@ func TestActiveCgroupsV2(t *testing.T) { _, err := CEEMSExporterApp.Parse( []string{ "--path.cgroupfs", "testdata/sys/fs/cgroup", + "--path.procfs", "testdata/proc", + "--collector.cgroups.force-version", "v2", }, ) require.NoError(t, err) // cgroup manager - cgManager := &cgroupManager{ - mode: cgroups.Unified, - mountPoint: "testdata/sys/fs/cgroup/system.slice/slurmstepd.scope", - idRegex: slurmCgroupPathRegex, - } + cgManager, err := NewCgroupManager("slurm", log.NewNopLogger()) + require.NoError(t, err) // ebpf opts opts := ebpfOpts{ @@ -223,10 +221,13 @@ func TestActiveCgroupsV2(t *testing.T) { cgroupPathIDCache: make(map[string]uint64), } - // Get active cgroups - err = c.discoverCgroups(nil) + // Discover cgroups + cgroups, err := cgManager.discover() require.NoError(t, err) + // Get active cgroups + c.discoverCgroups(cgroups) + assert.Len(t, c.activeCgroupInodes, 39) assert.Len(t, c.cgroupIDUUIDCache, 39) assert.Len(t, c.cgroupPathIDCache, 39) @@ -246,16 +247,15 @@ func TestActiveCgroupsV1(t *testing.T) { _, err := CEEMSExporterApp.Parse( []string{ "--path.cgroupfs", "testdata/sys/fs/cgroup", + "--path.procfs", "testdata/proc", + "--collector.cgroups.force-version", "v1", }, ) require.NoError(t, err) // cgroup manager - cgManager := &cgroupManager{ - mode: cgroups.Legacy, - mountPoint: "testdata/sys/fs/cgroup/cpuacct/slurm", - idRegex: slurmCgroupPathRegex, - } + cgManager, err := NewCgroupManager("slurm", log.NewNopLogger()) + require.NoError(t, err) // ebpf opts opts := ebpfOpts{ @@ -272,10 +272,13 @@ func TestActiveCgroupsV1(t *testing.T) { cgroupPathIDCache: make(map[string]uint64), } - // Get active cgroups - err = c.discoverCgroups(nil) + // Discover cgroups + cgroups, err := cgManager.discover() require.NoError(t, err) + // Get active cgroups + c.discoverCgroups(cgroups) + assert.Len(t, c.activeCgroupInodes, 6) assert.Len(t, c.cgroupIDUUIDCache, 6) assert.Len(t, c.cgroupPathIDCache, 6) diff --git a/pkg/collector/helper.go b/pkg/collector/helper.go index 6f17526f..56e80ce1 100644 --- a/pkg/collector/helper.go +++ b/pkg/collector/helper.go @@ -6,7 +6,6 @@ 
import ( "os" "path/filepath" "regexp" - "slices" "strconv" "strings" "syscall" @@ -19,11 +18,6 @@ var ( reParens = regexp.MustCompile(`\((.*)\)`) ) -type cgroup struct { - id string - procs []procfs.Proc -} - // SanitizeMetricName sanitize the given metric name by replacing invalid characters by underscores. // // OpenMetrics and the Prometheus exposition format require the metric name @@ -39,63 +33,33 @@ func SanitizeMetricName(metricName string) string { return metricNameRegex.ReplaceAllString(metricName, "_") } -// getCgroups returns a slice of active cgroups and processes contained in each cgroup. -func getCgroups(fs procfs.FS, idRegex *regexp.Regexp, targetEnvVars []string, procFilter func(string) bool) ([]cgroup, error) { - // Get all active procs - allProcs, err := fs.AllProcs() - if err != nil { - return nil, err - } - - // If no idRegex provided, return empty - if idRegex == nil { - return nil, errors.New("cgroup IDs cannot be retrieved due to empty regex") +// cgroupProcFilterer returns a slice of filtered cgroups based on the presence of targetEnvVars +// in the processes of each cgroup. +func cgroupProcFilterer(cgroups []cgroup, targetEnvVars []string, procFilter func(string) bool) []cgroup { + // If targetEnvVars is empty return + if len(targetEnvVars) == 0 { + return cgroups } - cgroupsMap := make(map[string][]procfs.Proc) + var filteredCgroups []cgroup - var cgroupIDs []string - - for _, proc := range allProcs { - // Get cgroup ID from regex - var cgroupID string - - cgrps, err := proc.Cgroups() - if err != nil || len(cgrps) == 0 { - continue - } + for _, cgrp := range cgroups { + var filteredProcs []procfs.Proc - for _, cgrp := range cgrps { - // If cgroup path is root, skip - if cgrp.Path == "/" { - continue - } - - // Unescape UTF-8 characters in cgroup path - sanitizedPath, err := unescapeString(cgrp.Path) - if err != nil { - continue - } + for _, proc := range cgrp.procs { + // Ignore processes where command line matches the regex + if procFilter != nil { + procCmdLine, err := proc.CmdLine() + if err != nil || len(procCmdLine) == 0 { + continue + } - cgroupIDMatches := idRegex.FindStringSubmatch(sanitizedPath) - if len(cgroupIDMatches) <= 1 { - continue + // Ignore process if matches found + if procFilter(strings.Join(procCmdLine, " ")) { + continue + } } - cgroupID = cgroupIDMatches[1] - - break - } - - // If no cgroupID found, ignore - if cgroupID == "" { - continue - } - - // If targetEnvVars is not empty check if this env vars is present for the process - // We dont check for the value of env var. 
Presence of env var is enough to - // trigger the profiling of that process - if len(targetEnvVars) > 0 { environ, err := proc.Environ() if err != nil { continue @@ -104,46 +68,27 @@ func getCgroups(fs procfs.FS, idRegex *regexp.Regexp, targetEnvVars []string, pr for _, env := range environ { for _, targetEnvVar := range targetEnvVars { if strings.HasPrefix(env, targetEnvVar) { - goto check_process + goto add_proc } } } - // If target env var(s) is not found, return + // If we didnt find any target env vars, continue to next process continue - } - check_process: - // Ignore processes where command line matches the regex - if procFilter != nil { - procCmdLine, err := proc.CmdLine() - if err != nil || len(procCmdLine) == 0 { - continue - } - - // Ignore process if matches found - if procFilter(strings.Join(procCmdLine, " ")) { - continue - } + add_proc: + filteredProcs = append(filteredProcs, proc) } - cgroupsMap[cgroupID] = append(cgroupsMap[cgroupID], proc) - cgroupIDs = append(cgroupIDs, cgroupID) - } - - // Sort cgroupIDs and make slice of cgProcs - cgroups := make([]cgroup, len(cgroupsMap)) - - slices.Sort(cgroupIDs) - - for icgroup, cgroupID := range slices.Compact(cgroupIDs) { - cgroups[icgroup] = cgroup{ - id: cgroupID, - procs: cgroupsMap[cgroupID], + // If there is atleast one process that is filtered, replace procs field + // in cgroup to filteredProcs and append to filteredCgroups + if len(filteredProcs) > 0 { + cgrp.procs = filteredProcs + filteredCgroups = append(filteredCgroups, cgrp) } } - return cgroups, nil + return filteredCgroups } // fileExists checks if given file exists or not. diff --git a/pkg/collector/libvirt.go b/pkg/collector/libvirt.go index 68365791..b2dc01cf 100644 --- a/pkg/collector/libvirt.go +++ b/pkg/collector/libvirt.go @@ -7,7 +7,6 @@ import ( "context" "encoding/xml" "fmt" - "io/fs" "os" "path/filepath" "slices" @@ -107,9 +106,9 @@ type instanceProps struct { } type libvirtMetrics struct { - cgMetrics []cgMetric - instanceProps []instanceProps - instanceIDUUIDMap map[string]string + cgMetrics []cgMetric + instanceProps []instanceProps + cgroups []cgroup } type libvirtCollector struct { @@ -136,8 +135,8 @@ func init() { // NewLibvirtCollector returns a new libvirt collector exposing a summary of cgroups. func NewLibvirtCollector(logger log.Logger) (Collector, error) { - // Get SLURM's cgroup details - cgroupManager, err := NewCgroupManager("libvirt") + // Get libvirt's cgroup details + cgroupManager, err := NewCgroupManager("libvirt", logger) if err != nil { level.Info(logger).Log("msg", "Failed to create cgroup manager", "err", err) @@ -278,10 +277,9 @@ func NewLibvirtCollector(logger log.Logger) (Collector, error) { // Update implements Collector and update instance metrics. 
func (c *libvirtCollector) Update(ch chan<- prometheus.Metric) error { - // Discover all active cgroups - metrics, err := c.discoverCgroups() + metrics, err := c.instanceMetrics() if err != nil { - return fmt.Errorf("%w: %w", ErrNoData, err) + return err } // Start a wait group @@ -309,7 +307,7 @@ func (c *libvirtCollector) Update(ch chan<- prometheus.Metric) error { defer wg.Done() // Update perf metrics - if err := c.perfCollector.Update(ch, metrics.instanceIDUUIDMap); err != nil { + if err := c.perfCollector.Update(ch, metrics.cgroups); err != nil { level.Error(c.logger).Log("msg", "Failed to update perf stats", "err", err) } }() @@ -322,7 +320,7 @@ func (c *libvirtCollector) Update(ch chan<- prometheus.Metric) error { defer wg.Done() // Update ebpf metrics - if err := c.ebpfCollector.Update(ch, metrics.instanceIDUUIDMap); err != nil { + if err := c.ebpfCollector.Update(ch, metrics.cgroups); err != nil { level.Error(c.logger).Log("msg", "Failed to update IO and/or network stats", "err", err) } }() @@ -335,7 +333,7 @@ func (c *libvirtCollector) Update(ch chan<- prometheus.Metric) error { defer wg.Done() // Update RDMA metrics - if err := c.rdmaCollector.Update(ch, metrics.instanceIDUUIDMap); err != nil { + if err := c.rdmaCollector.Update(ch, metrics.cgroups); err != nil { level.Error(c.logger).Log("msg", "Failed to update RDMA stats", "err", err) } }() @@ -442,8 +440,8 @@ func (c *libvirtCollector) updateGPUOrdinals(ch chan<- prometheus.Metric, instan } } -// discoverCgroups finds active cgroup paths and returns initialised metric structs. -func (c *libvirtCollector) discoverCgroups() (libvirtMetrics, error) { +// instanceProperties finds properties for each cgroup and returns initialised metric structs. +func (c *libvirtCollector) instanceProperties(cgroups []cgroup) libvirtMetrics { // Get currently active instances and set them in activeInstanceIDs state variable var activeInstanceIDs []string @@ -451,8 +449,6 @@ func (c *libvirtCollector) discoverCgroups() (libvirtMetrics, error) { var cgMetrics []cgMetric - instanceIDUUIDMap := make(map[string]string) - // It is possible from Openstack to resize instances by changing flavour. 
It means // it is possible to add GPUs to non-GPU instances, so we need to invalidate // instancePropsCache once in a while to ensure we capture any changes in instance @@ -462,73 +458,25 @@ func (c *libvirtCollector) discoverCgroups() (libvirtMetrics, error) { c.instancePropslastUpdateTime = time.Now() } - // Walk through all cgroups and get cgroup paths - // https://goplay.tools/snippet/coVDkIozuhg - if err := filepath.WalkDir(c.cgroupManager.mountPoint, func(p string, info fs.DirEntry, err error) error { - if err != nil { - return err - } - - // Ignore inner cgroups of instances - if !info.IsDir() || c.cgroupManager.pathFilter(p) { - return nil - } - - // Get relative path of cgroup - rel, err := filepath.Rel(c.cgroupManager.root, p) - if err != nil { - level.Error(c.logger).Log("msg", "Failed to resolve relative path for cgroup", "path", p, "err", err) - - return nil - } - - // Unescape UTF-8 characters in cgroup path - sanitizedPath, err := unescapeString(p) - if err != nil { - level.Error(c.logger).Log("msg", "Failed to sanitize cgroup path", "path", p, "err", err) - - return nil - } - - // Get cgroup ID which is instance ID - cgroupIDMatches := c.cgroupManager.idRegex.FindStringSubmatch(sanitizedPath) - if len(cgroupIDMatches) <= 1 { - return nil - } - - instanceID := strings.TrimSpace(cgroupIDMatches[1]) - if instanceID == "" { - level.Error(c.logger).Log("msg", "Empty instance ID", "path", p) - - return nil - } - - // Check if we already passed through this instance - if slices.Contains(activeInstanceIDs, instanceID) { - return nil - } + for icgrp := range cgroups { + instanceID := cgroups[icgrp].id // Get instance details if iProps, ok := c.instancePropsCache[instanceID]; !ok { - c.instancePropsCache[instanceID] = c.instanceProperties(instanceID) + c.instancePropsCache[instanceID] = c.getInstanceProperties(instanceID) instnProps = append(instnProps, c.instancePropsCache[instanceID]) - instanceIDUUIDMap[instanceID] = c.instancePropsCache[instanceID].uuid + cgroups[icgrp].uuid = c.instancePropsCache[instanceID].uuid } else { instnProps = append(instnProps, iProps) - instanceIDUUIDMap[instanceID] = iProps.uuid + cgroups[icgrp].uuid = iProps.uuid } - activeInstanceIDs = append(activeInstanceIDs, instanceID) - cgMetrics = append(cgMetrics, cgMetric{uuid: instanceIDUUIDMap[instanceID], path: "/" + rel}) - - level.Debug(c.logger).Log("msg", "cgroup path", "path", p) - - return nil - }); err != nil { - level.Error(c.logger). - Log("msg", "Error walking cgroup subsystem", "path", c.cgroupManager.mountPoint, "err", err) + // Check if we already passed through this instance + if !slices.Contains(activeInstanceIDs, instanceID) { + activeInstanceIDs = append(activeInstanceIDs, instanceID) + } - return libvirtMetrics{}, err + cgMetrics = append(cgMetrics, cgMetric{uuid: cgroups[icgrp].uuid, path: "/" + cgroups[icgrp].path.rel}) } // Remove terminated instances from instancePropsCache @@ -538,11 +486,11 @@ func (c *libvirtCollector) discoverCgroups() (libvirtMetrics, error) { } } - return libvirtMetrics{cgMetrics: cgMetrics, instanceProps: instnProps, instanceIDUUIDMap: instanceIDUUIDMap}, nil + return libvirtMetrics{cgMetrics: cgMetrics, instanceProps: instnProps, cgroups: cgroups} } -// instanceProperties returns instance properties parsed from XML file. -func (c *libvirtCollector) instanceProperties(instanceID string) instanceProps { +// getInstanceProperties returns instance properties parsed from XML file. 
+func (c *libvirtCollector) getInstanceProperties(instanceID string) instanceProps { // If vGPU is activated on atleast one GPU, update mdevs if c.vGPUActivated { if updatedGPUDevs, err := updateGPUMdevs(c.gpuDevs); err == nil { @@ -577,6 +525,18 @@ func (c *libvirtCollector) instanceProperties(instanceID string) instanceProps { return dataPtr.instanceProps } +// instanceMetrics returns initialised instance metrics structs. +func (c *libvirtCollector) instanceMetrics() (libvirtMetrics, error) { + // Get active cgroups + cgroups, err := c.cgroupManager.discover() + if err != nil { + return libvirtMetrics{}, fmt.Errorf("failed to discover cgroups: %w", err) + } + + // Get all instance properties and initialise metric structs + return c.instanceProperties(cgroups), nil +} + // readLibvirtXMLFile reads the libvirt's XML file inside a security context. func readLibvirtXMLFile(data interface{}) error { // Assert data diff --git a/pkg/collector/libvirt_test.go b/pkg/collector/libvirt_test.go index efe1b98a..5e6bea8f 100644 --- a/pkg/collector/libvirt_test.go +++ b/pkg/collector/libvirt_test.go @@ -71,10 +71,11 @@ func TestLibvirtInstanceProps(t *testing.T) { // cgroup Manager cgManager := &cgroupManager{ + logger: log.NewNopLogger(), mode: cgroups.Unified, mountPoint: "testdata/sys/fs/cgroup/machine.slice", idRegex: libvirtCgroupPathRegex, - pathFilter: func(p string) bool { + isChild: func(p string) bool { return strings.Contains(p, "/libvirt") }, } @@ -114,7 +115,7 @@ func TestLibvirtInstanceProps(t *testing.T) { {uuid: "4de89c5b-50d7-4d30-a630-14e135380fe8", gpuOrdinals: []string(nil)}, } - metrics, err := c.discoverCgroups() + metrics, err := c.instanceMetrics() require.NoError(t, err) assert.EqualValues(t, expectedProps, metrics.instanceProps) @@ -122,7 +123,7 @@ func TestLibvirtInstanceProps(t *testing.T) { // Sleep for 0.5 seconds to ensure we invalidate cache time.Sleep(500 * time.Millisecond) - _, err = c.discoverCgroups() + _, err = c.instanceMetrics() require.NoError(t, err) // Now check if lastUpdateTime is less than 0.5 se @@ -151,11 +152,12 @@ func TestInstancePropsCaching(t *testing.T) { // cgroup Manager cgManager := &cgroupManager{ + logger: log.NewNopLogger(), mode: cgroups.Unified, root: cgroupsPath, mountPoint: cgroupsPath + "/cpuacct/machine.slice", idRegex: libvirtCgroupPathRegex, - pathFilter: func(p string) bool { + isChild: func(p string) bool { return strings.Contains(p, "/libvirt") }, } @@ -225,7 +227,7 @@ func TestInstancePropsCaching(t *testing.T) { } // Now call get metrics which should populate instancePropsCache - _, err = c.discoverCgroups() + _, err = c.instanceMetrics() require.NoError(t, err) // Check if instancePropsCache has 20 instances and GPU ordinals are correct @@ -252,7 +254,7 @@ func TestInstancePropsCaching(t *testing.T) { } // Now call again get metrics which should populate instancePropsCache - _, err = c.discoverCgroups() + _, err = c.instanceMetrics() require.NoError(t, err) // Check if instancePropsCache has only 15 instances and GPU ordinals are empty diff --git a/pkg/collector/perf.go b/pkg/collector/perf.go index 1739b5eb..65e801cb 100644 --- a/pkg/collector/perf.go +++ b/pkg/collector/perf.go @@ -90,18 +90,17 @@ var ( // Security context names. 
const ( - perfDiscovererCtx = "perf_discoverer" + perfProcFilterCtx = "perf_proc_filter" perfOpenProfilersCtx = "perf_open_profilers" perfCloseProfilersCtx = "perf_close_profilers" ) -// perfDiscovererSecurityCtxData contains the input/output data for -// discoverer function to execute inside security context. -type perfDiscovererSecurityCtxData struct { - procfs procfs.FS - cgroupManager *cgroupManager +// perfProcFilterSecurityCtxData contains the input/output data for +// filterProc function to execute inside security context. +type perfProcFilterSecurityCtxData struct { targetEnvVars []string cgroups []cgroup + ignoreProc func(string) bool } // perfProfilerSecurityCtxData contains the input/output data for @@ -520,14 +519,14 @@ func NewPerfCollector(logger log.Logger, cgManager *cgroupManager) (*perfCollect capabilities = []string{"cap_sys_ptrace", "cap_dac_read_search"} auxCaps := setupCollectorCaps(logger, perfCollectorSubsystem, capabilities) - collector.securityContexts[perfDiscovererCtx], err = security.NewSecurityContext( - perfDiscovererCtx, + collector.securityContexts[perfProcFilterCtx], err = security.NewSecurityContext( + perfProcFilterCtx, auxCaps, - discoverer, + filterPerfProcs, logger, ) if err != nil { - level.Error(logger).Log("msg", "Failed to create a security context for perf discoverer", "err", err) + level.Error(logger).Log("msg", "Failed to create a security context for perf process filter", "err", err) return nil, err } @@ -539,11 +538,15 @@ func NewPerfCollector(logger log.Logger, cgManager *cgroupManager) (*perfCollect // Update implements the Collector interface and will collect metrics per compute unit. // cgroupIDUUIDMap provides a map to cgroupID to compute unit UUID. If the map is empty, it means // cgroup ID and compute unit UUID is identical. -func (c *perfCollector) Update(ch chan<- prometheus.Metric, cgroupIDUUIDMap map[string]string) error { - // Discover new processes - cgroups, err := c.discoverProcess() - if err != nil { - return fmt.Errorf("failed to discover processes: %w", err) +func (c *perfCollector) Update(ch chan<- prometheus.Metric, cgroups []cgroup) error { + var err error + + // Filter processes in cgroups based on target env vars + if len(c.opts.targetEnvVars) > 0 { + cgroups, err = c.filterProcs(cgroups) + if err != nil { + return fmt.Errorf("failed to discover processes: %w", err) + } } // Start new profilers for new processes @@ -566,12 +569,7 @@ func (c *perfCollector) Update(ch chan<- prometheus.Metric, cgroupIDUUIDMap map[ // Update metrics in go routines for each cgroup for _, cgroup := range cgroups { - var uuid string - if cgroupIDUUIDMap != nil { - uuid = cgroupIDUUIDMap[cgroup.id] - } else { - uuid = cgroup.id - } + uuid := cgroup.uuid go func(u string, ps []procfs.Proc) { defer wg.Done() @@ -826,30 +824,23 @@ func (c *perfCollector) updateCacheCounters(cgroupID string, procs []procfs.Proc return errs } -// discoverProcess returns a map of cgroup ID to procs. Depending on presence -// of targetEnvVars, this may be executed in a security context. -func (c *perfCollector) discoverProcess() ([]cgroup, error) { - // Read discovered cgroups into data pointer - dataPtr := &perfDiscovererSecurityCtxData{ - procfs: c.fs, - cgroupManager: c.cgroupManager, +// filterProcs filters the processes that need to be profiled by looking at the +// presence of targetEnvVars. 
+func (c *perfCollector) filterProcs(cgroups []cgroup) ([]cgroup, error) { + // Setup data pointer + dataPtr := &perfProcFilterSecurityCtxData{ + cgroups: cgroups, targetEnvVars: c.opts.targetEnvVars, + ignoreProc: c.cgroupManager.ignoreProc, } - // If there is a need to read processes' environ, use security context - // else execute function natively - if len(c.opts.targetEnvVars) > 0 { - if securityCtx, ok := c.securityContexts[perfDiscovererCtx]; ok { - if err := securityCtx.Exec(dataPtr); err != nil { - return nil, err - } - } else { - return nil, security.ErrNoSecurityCtx - } - } else { - if err := discoverer(dataPtr); err != nil { + // Use security context as reading procs env vars is a privileged action + if securityCtx, ok := c.securityContexts[perfProcFilterCtx]; ok { + if err := securityCtx.Exec(dataPtr); err != nil { return nil, err } + } else { + return nil, security.ErrNoSecurityCtx } if len(dataPtr.cgroups) > 0 { @@ -1121,28 +1112,19 @@ func closeCacheProfiler(profiler *perf.CacheProfiler) error { return nil } -// discoverer returns a map of discovered cgroup ID to procs by looking at each process -// in proc FS. Walking through cgroup fs is not really an option here as cgroups v1 -// wont have all PIDs of cgroup if the PID controller is not turned on. -// The current implementation should work for both cgroups v1 and v2. -// This function might be executed in a security context if targetEnvVars is not -// empty. -func discoverer(data interface{}) error { +// filterPerfProcs filters the processes of each cgroup inside data pointer based on +// presence of target env vars. +func filterPerfProcs(data interface{}) error { // Assert data is of perfSecurityCtxData - var d *perfDiscovererSecurityCtxData + var d *perfProcFilterSecurityCtxData var ok bool - if d, ok = data.(*perfDiscovererSecurityCtxData); !ok { + if d, ok = data.(*perfProcFilterSecurityCtxData); !ok { return security.ErrSecurityCtxDataAssertion } - cgroups, err := getCgroups(d.procfs, d.cgroupManager.idRegex, d.targetEnvVars, d.cgroupManager.procFilter) - if err != nil { - return err - } - - // Read cgroups proc map into d - d.cgroups = cgroups + // Read filtered cgroups into d + d.cgroups = cgroupProcFilterer(d.cgroups, d.targetEnvVars, d.ignoreProc) return nil } diff --git a/pkg/collector/perf_test.go b/pkg/collector/perf_test.go index b369b49c..a74e405b 100644 --- a/pkg/collector/perf_test.go +++ b/pkg/collector/perf_test.go @@ -6,6 +6,7 @@ package collector import ( "context" "os" + "slices" "testing" "github.com/containerd/cgroups/v3" @@ -31,7 +32,7 @@ func TestPerfCollector(t *testing.T) { mode: cgroups.Unified, mountPoint: "testdata/sys/fs/cgroup/system.slice/slurmstepd.scope", idRegex: slurmCgroupPathRegex, - procFilter: func(p string) bool { + ignoreProc: func(p string) bool { return slurmIgnoreProcsRegex.MatchString(p) }, } @@ -58,16 +59,16 @@ func TestPerfCollector(t *testing.T) { } func TestDiscoverProcess(t *testing.T) { - var err error + _, err := CEEMSExporterApp.Parse([]string{ + "--path.procfs", "testdata/proc", + "--path.cgroupfs", "testdata/sys/fs/cgroup", + "--collector.cgroups.force-version", "v2", + }) + require.NoError(t, err) // cgroup manager - cgManager := &cgroupManager{ - mode: cgroups.Unified, - idRegex: slurmCgroupPathRegex, - procFilter: func(p string) bool { - return slurmIgnoreProcsRegex.MatchString(p) - }, - } + cgManager, err := NewCgroupManager("slurm", log.NewNopLogger()) + require.NoError(t, err) // perf opts opts := perfOpts{ @@ -85,10 +86,10 @@ func TestDiscoverProcess(t 
*testing.T) { } // Create dummy security context - collector.securityContexts[perfDiscovererCtx], err = security.NewSecurityContext( - perfDiscovererCtx, + collector.securityContexts[perfProcFilterCtx], err = security.NewSecurityContext( + perfProcFilterCtx, nil, - discoverer, + filterPerfProcs, collector.logger, ) require.NoError(t, err) @@ -96,15 +97,19 @@ func TestDiscoverProcess(t *testing.T) { collector.fs, err = procfs.NewFS("testdata/proc") require.NoError(t, err) - // Discover processes - cgroups, err := collector.discoverProcess() + // Discover cgroups + cgroups, err := cgManager.discover() + require.NoError(t, err) + + // Filter processes + cgroups, err = collector.filterProcs(cgroups) require.NoError(t, err) // expected - expectedCgroupIDs := []string{"1320003", "4824887"} + expectedCgroupIDs := []string{"1009248", "1009249"} expectedCgroupProcs := map[string][]int{ - "4824887": {46231, 46281}, - "1320003": {46235, 46236}, + "1009248": {46231, 46281}, + "1009249": {46235, 46236}, } // Get cgroup IDs @@ -113,20 +118,19 @@ func TestDiscoverProcess(t *testing.T) { cgroupProcs := make(map[string][]int) for _, cgroup := range cgroups { - cgroupIDs = append(cgroupIDs, cgroup.id) + if !slices.Contains(cgroupIDs, cgroup.id) { + cgroupIDs = append(cgroupIDs, cgroup.id) + } - var pids []int for _, proc := range cgroup.procs { - pids = append(pids, proc.PID) + cgroupProcs[cgroup.id] = append(cgroupProcs[cgroup.id], proc.PID) } - - cgroupProcs[cgroup.id] = pids } assert.ElementsMatch(t, cgroupIDs, expectedCgroupIDs) for _, cgroupID := range cgroupIDs { - assert.ElementsMatch(t, cgroupProcs[cgroupID], expectedCgroupProcs[cgroupID]) + assert.ElementsMatch(t, cgroupProcs[cgroupID], expectedCgroupProcs[cgroupID], "cgroup %s", cgroupID) } } @@ -137,20 +141,17 @@ func TestNewProfilers(t *testing.T) { _, err = CEEMSExporterApp.Parse([]string{ "--path.procfs", "testdata/proc", + "--path.cgroupfs", "testdata/sys/fs/cgroup", "--collector.perf.hardware-events", "--collector.perf.software-events", "--collector.perf.hardware-cache-events", + "--collector.cgroups.force-version", "v1", }) require.NoError(t, err) // cgroup manager - cgManager := &cgroupManager{ - mode: cgroups.Legacy, - idRegex: slurmCgroupPathRegex, - procFilter: func(p string) bool { - return slurmIgnoreProcsRegex.MatchString(p) - }, - } + cgManager, err := NewCgroupManager("slurm", log.NewNopLogger()) + require.NoError(t, err) collector, err := NewPerfCollector(log.NewNopLogger(), cgManager) require.NoError(t, err) diff --git a/pkg/collector/rdma.go b/pkg/collector/rdma.go index dfc4adad..37b44819 100644 --- a/pkg/collector/rdma.go +++ b/pkg/collector/rdma.go @@ -228,9 +228,7 @@ func NewRDMACollector(logger log.Logger, cgManager *cgroupManager) (*rdmaCollect } // Update implements Collector and exposes RDMA related metrics. -// cgroupIDUUIDMap provides a map to cgroupID to compute unit UUID. If the map is empty, it means -// cgroup ID and compute unit UUID is identical. -func (c *rdmaCollector) Update(ch chan<- prometheus.Metric, cgroupIDUUIDMap map[string]string) error { +func (c *rdmaCollector) Update(ch chan<- prometheus.Metric, cgroups []cgroup) error { if !c.isAvailable { return ErrNoData } @@ -240,7 +238,7 @@ func (c *rdmaCollector) Update(ch chan<- prometheus.Metric, cgroupIDUUIDMap map[ level.Error(c.logger).Log("msg", "Failed to enable Per-PID QP stats", "err", err) } - return c.update(ch, cgroupIDUUIDMap) + return c.update(ch, cgroups) } // Stop releases system resources used by the collector. 
@@ -300,14 +298,9 @@ func (c *rdmaCollector) perPIDCounters(enable bool) error { } // update fetches different RDMA stats. -func (c *rdmaCollector) update(ch chan<- prometheus.Metric, cgroupIDUUIDMap map[string]string) error { - // First get cgroups and their associated procs - procCgroup, err := c.procCgroups(cgroupIDUUIDMap) - if err != nil { - level.Error(c.logger).Log("msg", "Failed to fetch active cgroups", "err", err) - - return ErrNoData - } +func (c *rdmaCollector) update(ch chan<- prometheus.Metric, cgroups []cgroup) error { + // Make invert mapping of cgroups + procCgroup := c.procCgroupMapper(cgroups) // Initialise a wait group wg := sync.WaitGroup{} @@ -413,26 +406,13 @@ func (c *rdmaCollector) update(ch chan<- prometheus.Metric, cgroupIDUUIDMap map[ return nil } -// procCgroups returns cgroup ID of all relevant processes. -func (c *rdmaCollector) procCgroups(cgroupIDUUIDMap map[string]string) (map[string]string, error) { - // First get cgroups and their associated procs - cgroups, err := getCgroups(c.procfs, c.cgroupManager.idRegex, nil, c.cgroupManager.procFilter) - if err != nil { - level.Error(c.logger).Log("msg", "Failed to fetch active cgroups", "err", err) - - return nil, err - } - +// procCgroupMapper returns a map of process ID to compute unit UUID for all relevant processes. +func (c *rdmaCollector) procCgroupMapper(cgroups []cgroup) map[string]string { // Make invert mapping of cgroups procCgroup := make(map[string]string) for _, cgroup := range cgroups { - var uuid string - if cgroupIDUUIDMap != nil { - uuid = cgroupIDUUIDMap[cgroup.id] - } else { - uuid = cgroup.id - } + uuid := cgroup.uuid for _, proc := range cgroup.procs { p := strconv.FormatInt(int64(proc.PID), 10) @@ -440,7 +420,7 @@ func (c *rdmaCollector) procCgroups(cgroupIDUUIDMap map[string]string) (map[stri } } - return procCgroup, nil + return procCgroup } // devMR returns Memory Regions (MRs) stats of all active cgroups. 
diff --git a/pkg/collector/rdma_test.go b/pkg/collector/rdma_test.go index 58bc165d..97fd2de9 100644 --- a/pkg/collector/rdma_test.go +++ b/pkg/collector/rdma_test.go @@ -27,10 +27,11 @@ func TestRDMACollector(t *testing.T) { // cgroup manager cgManager := &cgroupManager{ + logger: log.NewNopLogger(), mode: cgroups.Unified, mountPoint: "testdata/sys/fs/cgroup/system.slice/slurmstepd.scope", idRegex: slurmCgroupPathRegex, - procFilter: func(p string) bool { + ignoreProc: func(p string) bool { return slurmIgnoreProcsRegex.MatchString(p) }, } @@ -59,17 +60,14 @@ func TestRDMACollector(t *testing.T) { func TestDevMR(t *testing.T) { _, err := CEEMSExporterApp.Parse([]string{ "--path.procfs", "testdata/proc", + "--path.cgroupfs", "testdata/sys/fs/cgroup", + "--collector.cgroups.force-version", "v2", }) require.NoError(t, err) // cgroup manager - cgManager := &cgroupManager{ - mode: cgroups.Unified, - idRegex: slurmCgroupPathRegex, - procFilter: func(p string) bool { - return slurmIgnoreProcsRegex.MatchString(p) - }, - } + cgManager, err := NewCgroupManager("slurm", log.NewNopLogger()) + require.NoError(t, err) // Instantiate a new Proc FS procfs, err := procfs.NewFS(*procfsPath) @@ -82,13 +80,15 @@ func TestDevMR(t *testing.T) { cgroupManager: cgManager, } - // Get cgroup IDs - procCgroup, err := c.procCgroups(nil) + // Get cgroups + cgroups, err := cgManager.discover() require.NoError(t, err) + procCgroup := c.procCgroupMapper(cgroups) + expectedMRs := map[string]*mr{ - "1320003": {2, 4194304, "mlx5_0"}, - "4824887": {2, 4194304, "mlx5_0"}, + "1009248": {2, 4194304, "mlx5_0"}, + "1009249": {2, 4194304, "mlx5_0"}, } // Get MR stats @@ -100,17 +100,14 @@ func TestDevMR(t *testing.T) { func TestDevCQ(t *testing.T) { _, err := CEEMSExporterApp.Parse([]string{ "--path.procfs", "testdata/proc", + "--path.cgroupfs", "testdata/sys/fs/cgroup", + "--collector.cgroups.force-version", "v1", }) require.NoError(t, err) // cgroup manager - cgManager := &cgroupManager{ - mode: cgroups.Unified, - idRegex: slurmCgroupPathRegex, - procFilter: func(p string) bool { - return slurmIgnoreProcsRegex.MatchString(p) - }, - } + cgManager, err := NewCgroupManager("slurm", log.NewNopLogger()) + require.NoError(t, err) // Instantiate a new Proc FS procfs, err := procfs.NewFS(*procfsPath) @@ -124,12 +121,14 @@ func TestDevCQ(t *testing.T) { } // Get cgroup IDs - procCgroup, err := c.procCgroups(nil) + cgroups, err := cgManager.discover() require.NoError(t, err) + procCgroup := c.procCgroupMapper(cgroups) + expectedCQs := map[string]*cq{ - "1320003": {2, 8190, "mlx5_0"}, - "4824887": {2, 8190, "mlx5_0"}, + "1009248": {2, 8190, "mlx5_0"}, + "1009249": {2, 8190, "mlx5_0"}, } // Get MR stats @@ -141,17 +140,14 @@ func TestDevCQ(t *testing.T) { func TestLinkQP(t *testing.T) { _, err := CEEMSExporterApp.Parse([]string{ "--path.procfs", "testdata/proc", + "--path.cgroupfs", "testdata/sys/fs/cgroup", + "--collector.cgroups.force-version", "v1", }) require.NoError(t, err) // cgroup manager - cgManager := &cgroupManager{ - mode: cgroups.Unified, - idRegex: slurmCgroupPathRegex, - procFilter: func(p string) bool { - return slurmIgnoreProcsRegex.MatchString(p) - }, - } + cgManager, err := NewCgroupManager("slurm", log.NewNopLogger()) + require.NoError(t, err) // Instantiate a new Proc FS procfs, err := procfs.NewFS(*procfsPath) @@ -167,12 +163,14 @@ func TestLinkQP(t *testing.T) { } // Get cgroup IDs - procCgroup, err := c.procCgroups(nil) + cgroups, err := cgManager.discover() require.NoError(t, err) + procCgroup := 
c.procCgroupMapper(cgroups) + expected := map[string]*qp{ - "1320003": {16, "mlx5_0", "1", map[string]uint64{"rx_read_requests": 0, "rx_write_requests": 41988882}}, - "4824887": {16, "mlx5_0", "1", map[string]uint64{"rx_write_requests": 0, "rx_read_requests": 0}}, + "1009249": {16, "mlx5_0", "1", map[string]uint64{"rx_read_requests": 0, "rx_write_requests": 41988882}}, + "1009248": {16, "mlx5_0", "1", map[string]uint64{"rx_write_requests": 0, "rx_read_requests": 0}}, } // Get MR stats @@ -189,9 +187,10 @@ func TestLinkCountersSysWide(t *testing.T) { // cgroup manager cgManager := &cgroupManager{ + logger: log.NewNopLogger(), mode: cgroups.Unified, idRegex: slurmCgroupPathRegex, - procFilter: func(p string) bool { + ignoreProc: func(p string) bool { return slurmIgnoreProcsRegex.MatchString(p) }, } diff --git a/pkg/collector/slurm.go b/pkg/collector/slurm.go index e38887ca..250e4d25 100644 --- a/pkg/collector/slurm.go +++ b/pkg/collector/slurm.go @@ -7,9 +7,6 @@ import ( "context" "errors" "fmt" - "io/fs" - "os" - "path/filepath" "slices" "strconv" "strings" @@ -52,10 +49,6 @@ var ( `GPU order mapping between SLURM and NVIDIA SMI/ROCm SMI tools. It should be of format : [.] delimited by ",".`, ).Default("").PlaceHolder("0:1,1:0.3,2:0.4,3:0.5,4:0.6").String() - slurmGPUStatPath = CEEMSExporterApp.Flag( - "collector.slurm.gpu-job-map-path", - "Path to directory that maps GPU ordinals to job IDs.", - ).Default("").String() ) // Security context names. @@ -66,7 +59,7 @@ const ( // slurmReadProcSecurityCtxData contains the input/output data for // reading processes inside a security context. type slurmReadProcSecurityCtxData struct { - procfs procfs.FS + procs []procfs.Proc uuid string gpuOrdinals []string } @@ -85,6 +78,7 @@ func (p *jobProps) emptyGPUOrdinals() bool { type slurmMetrics struct { cgMetrics []cgMetric jobProps []jobProps + cgroups []cgroup } type slurmCollector struct { @@ -121,7 +115,7 @@ func NewSlurmCollector(logger log.Logger) (Collector, error) { } // Get SLURM's cgroup details - cgroupManager, err := NewCgroupManager("slurm") + cgroupManager, err := NewCgroupManager("slurm", logger) if err != nil { level.Info(logger).Log("msg", "Failed to create cgroup manager", "err", err) @@ -205,7 +199,7 @@ func NewSlurmCollector(logger log.Logger) (Collector, error) { if *slurmGPUOrdering != "" { gpuDevs = reindexGPUs(*slurmGPUOrdering, gpuDevs) - level.Debug(logger).Log("msg", "GPU reindexed based") + level.Debug(logger).Log("msg", "GPUs reindexed") } // Instantiate a new Proc FS @@ -264,10 +258,10 @@ func NewSlurmCollector(logger log.Logger) (Collector, error) { // Update implements Collector and update job metrics. 
func (c *slurmCollector) Update(ch chan<- prometheus.Metric) error { - // Discover all active cgroups - metrics, err := c.discoverCgroups() + // Initialise job metrics + metrics, err := c.jobMetrics() if err != nil { - return fmt.Errorf("failed to discover cgroups: %w", err) + return err } // Start a wait group @@ -295,7 +289,7 @@ func (c *slurmCollector) Update(ch chan<- prometheus.Metric) error { defer wg.Done() // Update perf metrics - if err := c.perfCollector.Update(ch, nil); err != nil { + if err := c.perfCollector.Update(ch, metrics.cgroups); err != nil { level.Error(c.logger).Log("msg", "Failed to update perf stats", "err", err) } }() @@ -308,7 +302,7 @@ func (c *slurmCollector) Update(ch chan<- prometheus.Metric) error { defer wg.Done() // Update ebpf metrics - if err := c.ebpfCollector.Update(ch, nil); err != nil { + if err := c.ebpfCollector.Update(ch, metrics.cgroups); err != nil { level.Error(c.logger).Log("msg", "Failed to update IO and/or network stats", "err", err) } }() @@ -321,7 +315,7 @@ func (c *slurmCollector) Update(ch chan<- prometheus.Metric) error { defer wg.Done() // Update RDMA metrics - if err := c.rdmaCollector.Update(ch, nil); err != nil { + if err := c.rdmaCollector.Update(ch, metrics.cgroups); err != nil { level.Error(c.logger).Log("msg", "Failed to update RDMA stats", "err", err) } }() @@ -418,8 +412,8 @@ func (c *slurmCollector) updateGPUOrdinals(ch chan<- prometheus.Metric, jobProps } } -// discoverCgroups finds active cgroup paths and returns initialised metric structs. -func (c *slurmCollector) discoverCgroups() (slurmMetrics, error) { +// jobProperties finds job properties for each active cgroup and returns initialised metric structs. +func (c *slurmCollector) jobProperties(cgroups []cgroup) slurmMetrics { // Get currently active jobs and set them in activeJobs state variable var activeJobUUIDs []string @@ -429,65 +423,28 @@ func (c *slurmCollector) discoverCgroups() (slurmMetrics, error) { var gpuOrdinals []string - // Walk through all cgroups and get cgroup paths - if err := filepath.WalkDir(c.cgroupManager.mountPoint, func(p string, info fs.DirEntry, err error) error { - if err != nil { - return err - } - - // Ignore step jobs - if !info.IsDir() || c.cgroupManager.pathFilter(p) { - return nil - } - - // Get relative path of cgroup - rel, err := filepath.Rel(c.cgroupManager.root, p) - if err != nil { - level.Error(c.logger).Log("msg", "Failed to resolve relative path for cgroup", "path", p, "err", err) - - return nil - } - - // Get cgroup ID which is job ID - cgroupIDMatches := c.cgroupManager.idRegex.FindStringSubmatch(p) - if len(cgroupIDMatches) <= 1 { - return nil - } - - jobuuid := strings.TrimSpace(cgroupIDMatches[1]) - if jobuuid == "" { - level.Error(c.logger).Log("msg", "Empty job ID", "path", p) - - return nil - } - - // Check if we already passed through this job - if slices.Contains(activeJobUUIDs, jobuuid) { - return nil - } + // Iterate over all active cgroups and get job properties + for _, cgrp := range cgroups { + jobuuid := cgrp.uuid // Get GPU ordinals of the job if len(c.gpuDevs) > 0 { if jobPropsCached, ok := c.jobPropsCache[jobuuid]; !ok || (ok && jobPropsCached.emptyGPUOrdinals()) { - gpuOrdinals = c.gpuOrdinals(jobuuid) + gpuOrdinals = c.gpuOrdinals(jobuuid, cgrp.procs) c.jobPropsCache[jobuuid] = jobProps{uuid: jobuuid, gpuOrdinals: gpuOrdinals} jProps = append(jProps, c.jobPropsCache[jobuuid]) } else { - jProps = append(jProps, jobPropsCached) + jProps = append(jProps, c.jobPropsCache[jobuuid]) } } - activeJobUUIDs = 
append(activeJobUUIDs, jobuuid) - cgMetrics = append(cgMetrics, cgMetric{uuid: jobuuid, path: "/" + rel}) - - level.Debug(c.logger).Log("msg", "cgroup path", "path", p) - - return nil - }); err != nil { - level.Error(c.logger). - Log("msg", "Error walking cgroup subsystem", "path", c.cgroupManager.mountPoint, "err", err) + // Check if we already passed through this job + if !slices.Contains(activeJobUUIDs, jobuuid) { + activeJobUUIDs = append(activeJobUUIDs, jobuuid) + } - return slurmMetrics{}, err + // Add to cgroups only if it is a root cgroup + cgMetrics = append(cgMetrics, cgMetric{uuid: jobuuid, path: "/" + cgrp.path.rel}) } // Remove expired jobs from jobPropsCache @@ -497,82 +454,29 @@ func (c *slurmCollector) discoverCgroups() (slurmMetrics, error) { } } - return slurmMetrics{cgMetrics: cgMetrics, jobProps: jProps}, nil + return slurmMetrics{cgMetrics: cgMetrics, jobProps: jProps, cgroups: cgroups} } -// readGPUMapFile reads file created by prolog script to retrieve job ID of a given GPU. -func (c *slurmCollector) readGPUMapFile(index string) string { - gpuJobMapInfo := fmt.Sprintf("%s/%s", *slurmGPUStatPath, index) - - // NOTE: Look for file name with UUID as it will be more appropriate with - // MIG instances. - // If /run/gpustat/0 file is not found, check for the file with UUID as name? - var uuid string - - if _, err := os.Stat(gpuJobMapInfo); err == nil { - content, err := os.ReadFile(gpuJobMapInfo) - if err != nil { - level.Error(c.logger).Log( - "msg", "Failed to get job ID for GPU", - "index", index, "err", err, - ) - - return "" - } - - if _, err := fmt.Sscanf(string(content), "%s", &uuid); err != nil { - level.Error(c.logger).Log( - "msg", "Failed to scan job ID for GPU", - "index", index, "err", err, - ) - - return "" - } - - return uuid +// jobMetrics returns initialised metric structs. +func (c *slurmCollector) jobMetrics() (slurmMetrics, error) { + // Get active cgroups + cgroups, err := c.cgroupManager.discover() + if err != nil { + return slurmMetrics{}, fmt.Errorf("failed to discover cgroups: %w", err) } - return "" + // Get job properties and initialise metric structs + return c.jobProperties(cgroups), nil } -// gpuOrdinalsFromProlog returns GPU ordinals of jobs from prolog generated run time files by SLURM. -func (c *slurmCollector) gpuOrdinalsFromProlog(uuid string) []string { +// gpuOrdinals returns GPU ordinals bound to current job. +func (c *slurmCollector) gpuOrdinals(uuid string, procs []procfs.Proc) []string { var gpuOrdinals []string - // If there are no GPUs this loop will be skipped anyways - // NOTE: In go loop over map is not reproducible. The order is undefined and thus - // we might end up with a situation where jobGPUOrdinals will [1 2] or [2 1] if - // current Job has two GPUs. This will fail unit tests as order in Slice is important - // in Go - // - // So we use map[int]Device to have int indices for devices which we use internally - // We are not using device index as it might be a non-integer. We are not sure about - // it but just to be safe. This will have a small overhead as we need to check the - // correct integer index for each device index. We can live with it as there are - // typically 2/4/8 GPUs per node. 
- for _, dev := range c.gpuDevs { - if dev.migEnabled { - for _, mig := range dev.migInstances { - if c.readGPUMapFile(mig.globalIndex) == uuid { - gpuOrdinals = append(gpuOrdinals, mig.globalIndex) - } - } - } else { - if c.readGPUMapFile(dev.globalIndex) == uuid { - gpuOrdinals = append(gpuOrdinals, dev.globalIndex) - } - } - } - - return gpuOrdinals -} - -// gpuOrdinalsFromEnviron returns GPU ordinals of jobs by reading environment variables of jobs. -func (c *slurmCollector) gpuOrdinalsFromEnviron(uuid string) []string { // Read env vars in a security context that raises necessary capabilities dataPtr := &slurmReadProcSecurityCtxData{ - procfs: c.procFS, - uuid: uuid, + procs: procs, + uuid: uuid, } if securityCtx, ok := c.securityContexts[slurmReadProcCtx]; ok { @@ -591,24 +495,8 @@ func (c *slurmCollector) gpuOrdinalsFromEnviron(uuid string) []string { return nil } - return dataPtr.gpuOrdinals -} - -// gpuOrdinals returns GPU ordinals bound to current job. -func (c *slurmCollector) gpuOrdinals(uuid string) []string { - var gpuOrdinals []string - - // First try to read files that might be created by SLURM prolog scripts - gpuOrdinals = c.gpuOrdinalsFromProlog(uuid) - - // If we fail to get necessary job properties, try to get these properties - // by looking into environment variables - if len(gpuOrdinals) == 0 { - gpuOrdinals = c.gpuOrdinalsFromEnviron(uuid) - } - // Emit warning when there are GPUs but no job to GPU map found - if len(gpuOrdinals) == 0 { + if len(dataPtr.gpuOrdinals) == 0 { level.Warn(c.logger). Log("msg", "Failed to get GPU ordinals for job", "jobid", uuid) } else { @@ -617,7 +505,7 @@ func (c *slurmCollector) gpuOrdinals(uuid string) []string { ) } - return gpuOrdinals + return dataPtr.gpuOrdinals } // readProcEnvirons reads the environment variables of processes and returns @@ -633,14 +521,6 @@ func readProcEnvirons(data interface{}) error { var gpuOrdinals []string - // Attempt to get GPU ordinals from /proc file system by looking into - // environ for the process that has same SLURM_JOB_ID - // Get all procs from current proc fs if passed pids slice is nil - allProcs, err := d.procfs.AllProcs() - if err != nil { - return fmt.Errorf("failed to read /proc: %w", err) - } - // Env var that we will search jobIDEnv := "SLURM_JOB_ID=" + d.uuid @@ -651,7 +531,7 @@ func readProcEnvirons(data interface{}) error { // WILL NOT BE scheduled on this locked thread and hence will not // have capabilities to read environment variables. So, we just do // old school loop on procs and attempt to find target env variables. 
- for _, proc := range allProcs { + for _, proc := range d.procs { // Read process environment variables // NOTE: This needs CAP_SYS_PTRACE and CAP_DAC_READ_SEARCH caps // on the current process diff --git a/pkg/collector/slurm_test.go b/pkg/collector/slurm_test.go index d3e76a77..bc85bd20 100644 --- a/pkg/collector/slurm_test.go +++ b/pkg/collector/slurm_test.go @@ -48,7 +48,6 @@ func TestNewSlurmCollector(t *testing.T) { "--path.cgroupfs", "testdata/sys/fs/cgroup", "--path.procfs", "testdata/proc", "--path.sysfs", "testdata/sys", - "--collector.slurm.gpu-job-map-path", "testdata/gpujobmap", "--collector.slurm.swap-memory-metrics", "--collector.slurm.psi-metrics", "--collector.perf.hardware-events", @@ -80,51 +79,52 @@ func TestNewSlurmCollector(t *testing.T) { require.NoError(t, err) } -func TestSlurmJobPropsWithProlog(t *testing.T) { - _, err := CEEMSExporterApp.Parse( - []string{ - "--path.cgroupfs", "testdata/sys/fs/cgroup", - "--collector.slurm.gpu-job-map-path", "testdata/gpujobmap", - "--collector.cgroups.force-version", "v2", - }, - ) - require.NoError(t, err) - - // cgroup Manager - cgManager := &cgroupManager{ - mode: cgroups.Unified, - mountPoint: "testdata/sys/fs/cgroup/system.slice/slurmstepd.scope", - idRegex: slurmCgroupPathRegex, - pathFilter: func(p string) bool { - return strings.Contains(p, "/step_") - }, - } - - c := slurmCollector{ - gpuDevs: mockGPUDevices(), - logger: log.NewNopLogger(), - cgroupManager: cgManager, - jobPropsCache: make(map[string]jobProps), - } - - expectedProps := jobProps{ - gpuOrdinals: []string{"0"}, - uuid: "1009249", - } - - metrics, err := c.discoverCgroups() - require.NoError(t, err) - - var gotProps jobProps - - for _, props := range metrics.jobProps { - if props.uuid == expectedProps.uuid { - gotProps = props - } - } - - assert.Equal(t, expectedProps, gotProps) -} +// func TestSlurmJobPropsWithProlog(t *testing.T) { +// _, err := CEEMSExporterApp.Parse( +// []string{ +// "--path.cgroupfs", "testdata/sys/fs/cgroup", +// "--collector.slurm.gpu-job-map-path", "testdata/gpujobmap", +// "--collector.cgroups.force-version", "v2", +// }, +// ) +// require.NoError(t, err) + +// // cgroup Manager +// cgManager := &cgroupManager{ +// logger: log.NewNopLogger(), +// mode: cgroups.Unified, +// mountPoint: "testdata/sys/fs/cgroup/system.slice/slurmstepd.scope", +// idRegex: slurmCgroupPathRegex, +// ignoreCgroup: func(p string) bool { +// return strings.Contains(p, "/step_") +// }, +// } + +// c := slurmCollector{ +// gpuDevs: mockGPUDevices(), +// logger: log.NewNopLogger(), +// cgroupManager: cgManager, +// jobPropsCache: make(map[string]jobProps), +// } + +// expectedProps := jobProps{ +// gpuOrdinals: []string{"0"}, +// uuid: "1009249", +// } + +// metrics, err := c.jobMetrics() +// require.NoError(t, err) + +// var gotProps jobProps + +// for _, props := range metrics.jobProps { +// if props.uuid == expectedProps.uuid { +// gotProps = props +// } +// } + +// assert.Equal(t, expectedProps, gotProps) +// } func TestSlurmJobPropsWithProcsFS(t *testing.T) { _, err := CEEMSExporterApp.Parse( @@ -139,15 +139,9 @@ func TestSlurmJobPropsWithProcsFS(t *testing.T) { procFS, err := procfs.NewFS(*procfsPath) require.NoError(t, err) - // cgroup Manager - cgManager := &cgroupManager{ - mode: cgroups.Legacy, - mountPoint: "testdata/sys/fs/cgroup/cpuacct/slurm", - idRegex: slurmCgroupPathRegex, - pathFilter: func(p string) bool { - return strings.Contains(p, "/step_") - }, - } + // cgroup manager + cgManager, err := NewCgroupManager("slurm", log.NewNopLogger()) 
+ require.NoError(t, err) c := slurmCollector{ cgroupManager: cgManager, @@ -167,23 +161,25 @@ func TestSlurmJobPropsWithProcsFS(t *testing.T) { ) require.NoError(t, err) - expectedProps := jobProps{ - uuid: "1009248", - gpuOrdinals: []string{"2", "3"}, + expectedProps := []jobProps{ + { + uuid: "1009248", + gpuOrdinals: []string{"2", "3"}, + }, + { + uuid: "1009249", + gpuOrdinals: []string{"0"}, + }, + { + uuid: "1009250", + gpuOrdinals: []string{"1"}, + }, } - metrics, err := c.discoverCgroups() + metrics, err := c.jobMetrics() require.NoError(t, err) - var gotProps jobProps - - for _, props := range metrics.jobProps { - if props.uuid == expectedProps.uuid { - gotProps = props - } - } - - assert.Equal(t, expectedProps, gotProps) + assert.Equal(t, expectedProps, metrics.jobProps) } func TestJobPropsCaching(t *testing.T) { @@ -193,57 +189,76 @@ func TestJobPropsCaching(t *testing.T) { err := os.Mkdir(cgroupsPath, 0o750) require.NoError(t, err) - gpuMapFilePath := path + "/gpu-map" - err = os.Mkdir(gpuMapFilePath, 0o750) + procFS := path + "/proc" + err = os.Mkdir(procFS, 0o750) require.NoError(t, err) - _, err = CEEMSExporterApp.Parse( - []string{ - "--path.cgroupfs", cgroupsPath, - "--collector.slurm.gpu-job-map-path", gpuMapFilePath, - }, - ) + fs, err := procfs.NewFS(procFS) require.NoError(t, err) // cgroup Manager cgManager := &cgroupManager{ + logger: log.NewNopLogger(), + fs: fs, mode: cgroups.Legacy, root: cgroupsPath, idRegex: slurmCgroupPathRegex, mountPoint: cgroupsPath + "/cpuacct/slurm", - pathFilter: func(p string) bool { + isChild: func(p string) bool { return false }, } mockGPUDevs := mockGPUDevices() c := slurmCollector{ - cgroupManager: cgManager, - logger: log.NewNopLogger(), - gpuDevs: mockGPUDevs, - jobPropsCache: make(map[string]jobProps), + cgroupManager: cgManager, + logger: log.NewNopLogger(), + gpuDevs: mockGPUDevs, + jobPropsCache: make(map[string]jobProps), + securityContexts: make(map[string]*security.SecurityContext), } + // Add dummy security context + c.securityContexts[slurmReadProcCtx], err = security.NewSecurityContext( + slurmReadProcCtx, + nil, + readProcEnvirons, + c.logger, + ) + require.NoError(t, err) + // Add cgroups for i := range 20 { dir := fmt.Sprintf("%s/cpuacct/slurm/job_%d", cgroupsPath, i) err = os.MkdirAll(dir, 0o750) require.NoError(t, err) + + err = os.WriteFile( + dir+"/cgroup.procs", + []byte(fmt.Sprintf("%d\n", i)), + 0o600, + ) + require.NoError(t, err) } // Binds GPUs to first n jobs for igpu := range mockGPUDevs { + dir := fmt.Sprintf("%s/%d", procFS, igpu) + + err = os.MkdirAll(dir, 0o750) + require.NoError(t, err) + err = os.WriteFile( - fmt.Sprintf("%s/%d", gpuMapFilePath, igpu), - []byte(strconv.FormatInt(int64(igpu), 10)), + dir+"/environ", + []byte(strings.Join([]string{fmt.Sprintf("SLURM_JOB_ID=%d", igpu), fmt.Sprintf("SLURM_JOB_GPUS=%d", igpu)}, "\000")+"\000"), 0o600, ) require.NoError(t, err) } // Now call get metrics which should populate jobPropsCache - _, err = c.discoverCgroups() + _, err = c.jobMetrics() require.NoError(t, err) // Check if jobPropsCache has 20 jobs and GPU ordinals are correct @@ -270,7 +285,7 @@ func TestJobPropsCaching(t *testing.T) { } // Now call again get metrics which should populate jobPropsCache - _, err = c.discoverCgroups() + _, err = c.jobMetrics() require.NoError(t, err) // Check if jobPropsCache has only 15 jobs and GPU ordinals are empty diff --git a/pkg/collector/testdata/output/discoverer/e2e-test-discoverer-cgroupsv1-slurm-output.txt 
b/pkg/collector/testdata/output/discoverer/e2e-test-discoverer-cgroupsv1-slurm-output.txt index d44cc4ae..0149f272 100644 --- a/pkg/collector/testdata/output/discoverer/e2e-test-discoverer-cgroupsv1-slurm-output.txt +++ b/pkg/collector/testdata/output/discoverer/e2e-test-discoverer-cgroupsv1-slurm-output.txt @@ -1 +1 @@ -[{"targets":["1320003"],"labels":{"__process_commandline":"/gpfslocalsup/spack_soft/gromacs/2022.2/gcc-8.4.1-kblhs7pjrcqlgv675gejjjy7n3h6wz2n/bin/gmx_mpi mdrun -ntomp 10 -v -deffnm run10 -multidir 1/ 2/ 3/ 4/","__process_effective_uid":"1000","__process_exe":"/usr/bin/vim","__process_pid__":"46236","__process_real_uid":"1000","service_name":"1320003"}},{"targets":["1320003"],"labels":{"__process_commandline":"/gpfslocalsup/spack_soft/gromacs/2022.2/gcc-8.4.1-kblhs7pjrcqlgv675gejjjy7n3h6wz2n/bin/gmx_mpi mdrun -ntomp 10 -v -deffnm run10 -multidir 1/ 2/ 3/ 4/","__process_effective_uid":"1000","__process_exe":"/usr/bin/vim","__process_pid__":"46235","__process_real_uid":"1000","service_name":"1320003"}},{"targets":["4824887"],"labels":{"__process_commandline":"/gpfslocalsup/spack_soft/gromacs/2022.2/gcc-8.4.1-kblhs7pjrcqlgv675gejjjy7n3h6wz2n/bin/gmx_mpi mdrun -ntomp 10 -v -deffnm run10 -multidir 1/ 2/ 3/ 4/","__process_effective_uid":"1000","__process_exe":"/usr/bin/vim","__process_pid__":"46281","__process_real_uid":"1000","service_name":"4824887"}},{"targets":["4824887"],"labels":{"__process_commandline":"/gpfslocalsup/spack_soft/gromacs/2022.2/gcc-8.4.1-kblhs7pjrcqlgv675gejjjy7n3h6wz2n/bin/gmx_mpi mdrun -ntomp 10 -v -deffnm run10 -multidir 1/ 2/ 3/ 4/","__process_effective_uid":"1000","__process_exe":"/usr/bin/vim","__process_pid__":"46231","__process_real_uid":"1000","service_name":"4824887"}}] +[{"targets":["1009248"],"labels":{"__process_pid__":"46231","service_name":"1009248"}},{"targets":["1009248"],"labels":{"__process_pid__":"46281","service_name":"1009248"}},{"targets":["1009249"],"labels":{"__process_pid__":"46235","service_name":"1009249"}},{"targets":["1009249"],"labels":{"__process_pid__":"46236","service_name":"1009249"}},{"targets":["1009250"],"labels":{"__process_pid__":"26242","service_name":"1009250"}},{"targets":["1009250"],"labels":{"__process_pid__":"46233","service_name":"1009250"}}] diff --git a/pkg/collector/testdata/output/discoverer/e2e-test-discoverer-cgroupsv2-slurm-output.txt b/pkg/collector/testdata/output/discoverer/e2e-test-discoverer-cgroupsv2-slurm-output.txt index d44cc4ae..a2932a11 100644 --- a/pkg/collector/testdata/output/discoverer/e2e-test-discoverer-cgroupsv2-slurm-output.txt +++ b/pkg/collector/testdata/output/discoverer/e2e-test-discoverer-cgroupsv2-slurm-output.txt @@ -1 +1 @@ -[{"targets":["1320003"],"labels":{"__process_commandline":"/gpfslocalsup/spack_soft/gromacs/2022.2/gcc-8.4.1-kblhs7pjrcqlgv675gejjjy7n3h6wz2n/bin/gmx_mpi mdrun -ntomp 10 -v -deffnm run10 -multidir 1/ 2/ 3/ 4/","__process_effective_uid":"1000","__process_exe":"/usr/bin/vim","__process_pid__":"46236","__process_real_uid":"1000","service_name":"1320003"}},{"targets":["1320003"],"labels":{"__process_commandline":"/gpfslocalsup/spack_soft/gromacs/2022.2/gcc-8.4.1-kblhs7pjrcqlgv675gejjjy7n3h6wz2n/bin/gmx_mpi mdrun -ntomp 10 -v -deffnm run10 -multidir 1/ 2/ 3/ 4/","__process_effective_uid":"1000","__process_exe":"/usr/bin/vim","__process_pid__":"46235","__process_real_uid":"1000","service_name":"1320003"}},{"targets":["4824887"],"labels":{"__process_commandline":"/gpfslocalsup/spack_soft/gromacs/2022.2/gcc-8.4.1-kblhs7pjrcqlgv675gejjjy7n3h6wz2n/bin/gmx_mpi mdrun 
-ntomp 10 -v -deffnm run10 -multidir 1/ 2/ 3/ 4/","__process_effective_uid":"1000","__process_exe":"/usr/bin/vim","__process_pid__":"46281","__process_real_uid":"1000","service_name":"4824887"}},{"targets":["4824887"],"labels":{"__process_commandline":"/gpfslocalsup/spack_soft/gromacs/2022.2/gcc-8.4.1-kblhs7pjrcqlgv675gejjjy7n3h6wz2n/bin/gmx_mpi mdrun -ntomp 10 -v -deffnm run10 -multidir 1/ 2/ 3/ 4/","__process_effective_uid":"1000","__process_exe":"/usr/bin/vim","__process_pid__":"46231","__process_real_uid":"1000","service_name":"4824887"}}] +[{"targets":["1009248"],"labels":{"__process_pid__":"46231","service_name":"1009248"}},{"targets":["1009248"],"labels":{"__process_pid__":"46281","service_name":"1009248"}},{"targets":["1009248"],"labels":{"__process_pid__":"3346567","service_name":"1009248"}},{"targets":["1009248"],"labels":{"__process_pid__":"3346596","service_name":"1009248"}},{"targets":["1009248"],"labels":{"__process_pid__":"3346674","service_name":"1009248"}},{"targets":["1009249"],"labels":{"__process_pid__":"46235","service_name":"1009249"}},{"targets":["1009249"],"labels":{"__process_pid__":"46236","service_name":"1009249"}},{"targets":["1009249"],"labels":{"__process_pid__":"3346567","service_name":"1009249"}},{"targets":["1009249"],"labels":{"__process_pid__":"46233","service_name":"1009249"}},{"targets":["1009250"],"labels":{"__process_pid__":"26242","service_name":"1009250"}},{"targets":["1009250"],"labels":{"__process_pid__":"46233","service_name":"1009250"}},{"targets":["1009250"],"labels":{"__process_pid__":"3346567","service_name":"1009250"}},{"targets":["1009250"],"labels":{"__process_pid__":"3346596","service_name":"1009250"}},{"targets":["1009250"],"labels":{"__process_pid__":"3346674","service_name":"1009250"}}] diff --git a/pkg/collector/testdata/output/exporter/e2e-test-cgroupsv2-nvidia-gpu-reordering.txt b/pkg/collector/testdata/output/exporter/e2e-test-cgroupsv2-nvidia-gpu-reordering.txt index e2fa85c8..dfe2af4c 100644 --- a/pkg/collector/testdata/output/exporter/e2e-test-cgroupsv2-nvidia-gpu-reordering.txt +++ b/pkg/collector/testdata/output/exporter/e2e-test-cgroupsv2-nvidia-gpu-reordering.txt @@ -108,20 +108,20 @@ ceems_rapl_package_power_limit_watts_total{hostname="",index="0",path="pkg/colle ceems_rapl_package_power_limit_watts_total{hostname="",index="1",path="pkg/collector/testdata/sys/class/powercap/intel-rapl:1"} 180 # HELP ceems_rdma_cqe_len_active Length of active CQs # TYPE ceems_rdma_cqe_len_active gauge -ceems_rdma_cqe_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1320003"} 8190 -ceems_rdma_cqe_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="4824887"} 8190 +ceems_rdma_cqe_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009248"} 8190 +ceems_rdma_cqe_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009249"} 8190 # HELP ceems_rdma_cqs_active Number of active CQs # TYPE ceems_rdma_cqs_active gauge -ceems_rdma_cqs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1320003"} 2 -ceems_rdma_cqs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="4824887"} 2 +ceems_rdma_cqs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009248"} 2 +ceems_rdma_cqs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009249"} 2 # HELP ceems_rdma_mrs_active Number of active MRs # TYPE ceems_rdma_mrs_active gauge -ceems_rdma_mrs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1320003"} 2 
-ceems_rdma_mrs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="4824887"} 2 +ceems_rdma_mrs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009248"} 2 +ceems_rdma_mrs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009249"} 2 # HELP ceems_rdma_mrs_len_active Length of active MRs # TYPE ceems_rdma_mrs_len_active gauge -ceems_rdma_mrs_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1320003"} 4.194304e+06 -ceems_rdma_mrs_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="4824887"} 4.194304e+06 +ceems_rdma_mrs_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009248"} 4.194304e+06 +ceems_rdma_mrs_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009249"} 4.194304e+06 # HELP ceems_rdma_port_data_received_bytes_total Number of data octets received on all links # TYPE ceems_rdma_port_data_received_bytes_total counter ceems_rdma_port_data_received_bytes_total{device="hfi1_0",hostname="",manager="slurm",port="1"} 1.380366808104e+12 @@ -148,8 +148,8 @@ ceems_rdma_port_packets_transmitted_total{device="mlx4_0",hostname="",manager="s ceems_rdma_port_packets_transmitted_total{device="mlx5_0",hostname="",manager="slurm",port="1"} 1.0907922116e+10 # HELP ceems_rdma_qps_active Number of active QPs # TYPE ceems_rdma_qps_active gauge -ceems_rdma_qps_active{device="mlx5_0",hostname="",manager="slurm",port="1",uuid="1320003"} 16 -ceems_rdma_qps_active{device="mlx5_0",hostname="",manager="slurm",port="1",uuid="4824887"} 16 +ceems_rdma_qps_active{device="mlx5_0",hostname="",manager="slurm",port="1",uuid="1009248"} 16 +ceems_rdma_qps_active{device="mlx5_0",hostname="",manager="slurm",port="1",uuid="1009249"} 16 # HELP ceems_rdma_state_id State of the InfiniBand port (0: no change, 1: down, 2: init, 3: armed, 4: active, 5: act defer) # TYPE ceems_rdma_state_id gauge ceems_rdma_state_id{device="hfi1_0",hostname="",manager="slurm",port="1"} 4 diff --git a/pkg/collector/testdata/output/exporter/e2e-test-cgroupsv2-nvidia-ipmiutil-output.txt b/pkg/collector/testdata/output/exporter/e2e-test-cgroupsv2-nvidia-ipmiutil-output.txt index d10c1e07..33800061 100644 --- a/pkg/collector/testdata/output/exporter/e2e-test-cgroupsv2-nvidia-ipmiutil-output.txt +++ b/pkg/collector/testdata/output/exporter/e2e-test-cgroupsv2-nvidia-ipmiutil-output.txt @@ -108,20 +108,20 @@ ceems_rapl_package_power_limit_watts_total{hostname="",index="0",path="pkg/colle ceems_rapl_package_power_limit_watts_total{hostname="",index="1",path="pkg/collector/testdata/sys/class/powercap/intel-rapl:1"} 180 # HELP ceems_rdma_cqe_len_active Length of active CQs # TYPE ceems_rdma_cqe_len_active gauge -ceems_rdma_cqe_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1320003"} 8190 -ceems_rdma_cqe_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="4824887"} 8190 +ceems_rdma_cqe_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009248"} 8190 +ceems_rdma_cqe_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009249"} 8190 # HELP ceems_rdma_cqs_active Number of active CQs # TYPE ceems_rdma_cqs_active gauge -ceems_rdma_cqs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1320003"} 2 -ceems_rdma_cqs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="4824887"} 2 +ceems_rdma_cqs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009248"} 2 
+ceems_rdma_cqs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009249"} 2 # HELP ceems_rdma_mrs_active Number of active MRs # TYPE ceems_rdma_mrs_active gauge -ceems_rdma_mrs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1320003"} 2 -ceems_rdma_mrs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="4824887"} 2 +ceems_rdma_mrs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009248"} 2 +ceems_rdma_mrs_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009249"} 2 # HELP ceems_rdma_mrs_len_active Length of active MRs # TYPE ceems_rdma_mrs_len_active gauge -ceems_rdma_mrs_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1320003"} 4.194304e+06 -ceems_rdma_mrs_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="4824887"} 4.194304e+06 +ceems_rdma_mrs_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009248"} 4.194304e+06 +ceems_rdma_mrs_len_active{device="mlx5_0",hostname="",manager="slurm",port="",uuid="1009249"} 4.194304e+06 # HELP ceems_rdma_port_data_received_bytes_total Number of data octets received on all links # TYPE ceems_rdma_port_data_received_bytes_total counter ceems_rdma_port_data_received_bytes_total{device="hfi1_0",hostname="",manager="slurm",port="1"} 1.380366808104e+12 @@ -148,8 +148,8 @@ ceems_rdma_port_packets_transmitted_total{device="mlx4_0",hostname="",manager="s ceems_rdma_port_packets_transmitted_total{device="mlx5_0",hostname="",manager="slurm",port="1"} 1.0907922116e+10 # HELP ceems_rdma_qps_active Number of active QPs # TYPE ceems_rdma_qps_active gauge -ceems_rdma_qps_active{device="mlx5_0",hostname="",manager="slurm",port="1",uuid="1320003"} 16 -ceems_rdma_qps_active{device="mlx5_0",hostname="",manager="slurm",port="1",uuid="4824887"} 16 +ceems_rdma_qps_active{device="mlx5_0",hostname="",manager="slurm",port="1",uuid="1009248"} 16 +ceems_rdma_qps_active{device="mlx5_0",hostname="",manager="slurm",port="1",uuid="1009249"} 16 # HELP ceems_rdma_state_id State of the InfiniBand port (0: no change, 1: down, 2: init, 3: armed, 4: active, 5: act defer) # TYPE ceems_rdma_state_id gauge ceems_rdma_state_id{device="hfi1_0",hostname="",manager="slurm",port="1"} 4 diff --git a/pkg/collector/testdata/proc.ttar b/pkg/collector/testdata/proc.ttar index afa31d32..5bd7ae93 100644 --- a/pkg/collector/testdata/proc.ttar +++ b/pkg/collector/testdata/proc.ttar @@ -1559,7 +1559,7 @@ SymlinkTo: /usr/bin # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: proc/26242/environ Lines: 1 -PATH=/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/binNULLBYTEHOSTNAME=cd24e11f73a5NULLBYTETERM=xtermNULLBYTEGOLANG_VERSION=1.12.5NULLBYTEGOPATH=/goNULLBYTEHOME=/rootNULLBYTESLURM_JOB_UID=1000NULLBYTESLURM_JOB_ID=11000NULLBYTESLURM_JOB_ACCOUNT=testaccNULLBYTESLURM_JOB_NODELIST=compute-[0-2]NULLBYTESLURM_STEP_GPUS=1NULLBYTE +PATH=/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/binNULLBYTEHOSTNAME=cd24e11f73a5NULLBYTETERM=xtermNULLBYTEGOLANG_VERSION=1.12.5NULLBYTEGOPATH=/goNULLBYTEHOME=/rootNULLBYTESLURM_JOB_UID=1000NULLBYTESLURM_JOB_ID=1009250NULLBYTESLURM_JOB_ACCOUNT=testaccNULLBYTESLURM_JOB_NODELIST=compute-[0-2]NULLBYTESLURM_STEP_GPUS=1NULLBYTE Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: proc/26242/exe @@ -7159,7 +7159,7 @@ SymlinkTo: /usr/bin # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
- Path: proc/46235/environ Lines: 1 -PATH=/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/binNULLBYTEHOSTNAME=cd24e11f73a5NULLBYTETERM=xtermNULLBYTEGOLANG_VERSION=1.12.5NULLBYTEGOPATH=/goNULLBYTEHOME=/rootNULLBYTEENABLE_PROFILING=1NULLBYTE +PATH=/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/binNULLBYTEHOSTNAME=cd24e11f73a5NULLBYTETERM=xtermNULLBYTEGOLANG_VERSION=1.12.5NULLBYTEGOPATH=/goNULLBYTEHOME=/rootNULLBYTEENABLE_PROFILING=1NULLBYTESLURM_STEP_GPUS=0NULLBYTESLURM_JOB_ID=1009249NULLBYTE Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: proc/46235/exe @@ -8550,7 +8550,7 @@ SymlinkTo: /usr/bin # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: proc/46281/environ Lines: 1 -PATH=/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/binNULLBYTEHOSTNAME=cd24e11f73a5NULLBYTETERM=xtermNULLBYTEGOLANG_VERSION=1.12.5NULLBYTEGOPATH=/goNULLBYTEHOME=/rootNULLBYTEENABLE_PROFILING=1NULLBYTE +PATH=/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/binNULLBYTEHOSTNAME=cd24e11f73a5NULLBYTETERM=xtermNULLBYTEGOLANG_VERSION=1.12.5NULLBYTEGOPATH=/goNULLBYTEHOME=/rootNULLBYTEENABLE_PROFILING=1NULLBYTESLURM_JOB_GPUS=2,3NULLBYTESLURM_JOB_ID=1009248NULLBYTE Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: proc/46281/exe diff --git a/pkg/collector/testdata/sys.ttar b/pkg/collector/testdata/sys.ttar index e0a407e7..6c95ffee 100644 --- a/pkg/collector/testdata/sys.ttar +++ b/pkg/collector/testdata/sys.ttar @@ -3134,12 +3134,9 @@ Lines: 1 Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/cpuacct/slurm/uid_1000/job_1009248/step_0/cgroup.procs -Lines: 5 -9544 -9562 -9563 -9616 -9870 +Lines: 2 +46231 +46281 Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/cpuacct/slurm/uid_1000/job_1009248/step_0/cpu.cfs_period_us @@ -3450,12 +3447,9 @@ Lines: 1 Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/cpuacct/slurm/uid_1000/job_1009249/step_0/cgroup.procs -Lines: 5 -9544 -9562 -9563 -9616 -9870 +Lines: 2 +46235 +46236 Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/cpuacct/slurm/uid_1000/job_1009249/step_0/cpu.cfs_period_us @@ -3766,12 +3760,9 @@ Lines: 1 Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/cpuacct/slurm/uid_1000/job_1009250/step_0/cgroup.procs -Lines: 5 -9544 -9562 -9563 -9616 -9870 +Lines: 2 +26242 +46233 Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/cpuacct/slurm/uid_1000/job_1009250/step_0/cpu.cfs_period_us @@ -8245,12 +8236,9 @@ Lines: 1 Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/memory/slurm/uid_1000/job_1009248/step_0/cgroup.procs -Lines: 5 -9544 -9562 -9563 -9616 -9870 +Lines: 2 +46231 +46281 Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/memory/slurm/uid_1000/job_1009248/step_0/memory.failcnt @@ -8633,12 +8621,9 @@ Lines: 1 Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: 
sys/fs/cgroup/memory/slurm/uid_1000/job_1009249/step_0/cgroup.procs -Lines: 5 -9544 -9562 -9563 -9616 -9870 +Lines: 2 +46235 +46236 Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/memory/slurm/uid_1000/job_1009249/step_0/memory.failcnt @@ -9021,12 +9006,9 @@ Lines: 1 Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/memory/slurm/uid_1000/job_1009250/step_0/cgroup.procs -Lines: 5 -9544 -9562 -9563 -9616 -9870 +Lines: 2 +26242 +46233 Mode: 644 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/memory/slurm/uid_1000/job_1009250/step_0/memory.failcnt @@ -10200,7 +10182,7 @@ Mode: 640 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/system.slice/slurmstepd.scope/job_1009248/step_3/slurm/cgroup.procs Lines: 1 -3401149 +46282 Mode: 640 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/system.slice/slurmstepd.scope/job_1009248/step_3/slurm/cgroup.stat @@ -10723,8 +10705,8 @@ Mode: 640 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/system.slice/slurmstepd.scope/job_1009248/step_3/user/task_0/cgroup.procs Lines: 2 -3401154 -3401163 +46231 +46281 Mode: 640 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/system.slice/slurmstepd.scope/job_1009248/step_3/user/task_0/cgroup.stat @@ -14119,8 +14101,8 @@ Mode: 640 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/system.slice/slurmstepd.scope/job_1009249/step_3/user/task_0/cgroup.procs Lines: 2 -3401154 -3401163 +46235 +46236 Mode: 640 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/system.slice/slurmstepd.scope/job_1009249/step_3/user/task_0/cgroup.stat @@ -15161,11 +15143,8 @@ max Mode: 640 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/system.slice/slurmstepd.scope/job_1009249/step_batch/user/task_0/cgroup.procs -Lines: 4 -3346596 -3346674 -3401141 -3401142 +Lines: 1 +46233 Mode: 640 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/system.slice/slurmstepd.scope/job_1009249/step_batch/user/task_0/cgroup.stat @@ -17511,8 +17490,8 @@ Mode: 640 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/system.slice/slurmstepd.scope/job_1009250/step_3/user/task_0/cgroup.procs Lines: 2 -3401154 -3401163 +26242 +46233 Mode: 640 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Path: sys/fs/cgroup/system.slice/slurmstepd.scope/job_1009250/step_3/user/task_0/cgroup.stat diff --git a/scripts/e2e-test.sh b/scripts/e2e-test.sh index 41646bd7..132212a4 100755 --- a/scripts/e2e-test.sh +++ b/scripts/e2e-test.sh @@ -354,7 +354,6 @@ then --collector.slurm \ --collector.gpu.type="nvidia" \ --collector.gpu.nvidia-smi-path="pkg/collector/testdata/nvidia-smi" \ - --collector.slurm.gpu-job-map-path="pkg/collector/testdata/gpujobmap" \ --collector.ipmi_dcmi.test-mode \ --collector.ipmi_dcmi.cmd="pkg/collector/testdata/ipmi/freeipmi/ipmi-dcmi" \ --collector.empty-hostname-label \ @@ -373,7 +372,6 @@ then --collector.slurm \ --collector.gpu.type="nvidia" \ --collector.gpu.nvidia-smi-path="pkg/collector/testdata/nvidia-smi" \ - 
--collector.slurm.gpu-job-map-path="pkg/collector/testdata/gpujobmap" \ --collector.ipmi_dcmi.cmd="pkg/collector/testdata/ipmi/freeipmi/ipmi-dcmi" \ --collector.ipmi_dcmi.test-mode \ --collector.empty-hostname-label \ @@ -391,7 +389,6 @@ then --collector.slurm \ --collector.gpu.type="nvidia" \ --collector.gpu.nvidia-smi-path="pkg/collector/testdata/nvidia-smi" \ - --collector.slurm.gpu-job-map-path="pkg/collector/testdata/gpujobmap" \ --collector.rdma.stats \ --collector.rdma.cmd="pkg/collector/testdata/rdma" \ --collector.empty-hostname-label \ @@ -411,7 +408,6 @@ then --collector.slurm.gpu-order-map="0:0,1:1,2:4,3:5,4:2.1,5:2.5,6:2.13,7:3.1,8:3.5,9:3.13,10:6,11:7" \ --collector.gpu.type="nvidia" \ --collector.gpu.nvidia-smi-path="pkg/collector/testdata/nvidia-smi" \ - --collector.slurm.gpu-job-map-path="pkg/collector/testdata/gpujobmap" \ --collector.rdma.stats \ --collector.rdma.cmd="pkg/collector/testdata/rdma" \ --collector.empty-hostname-label \ @@ -430,7 +426,6 @@ then --collector.slurm \ --collector.gpu.type="amd" \ --collector.gpu.rocm-smi-path="pkg/collector/testdata/rocm-smi" \ - --collector.slurm.gpu-job-map-path="pkg/collector/testdata/gpujobmap" \ --collector.empty-hostname-label \ --collector.ipmi_dcmi.test-mode \ --web.listen-address "127.0.0.1:${port}" \ @@ -478,7 +473,6 @@ then --collector.slurm \ --collector.gpu.type="amd" \ --collector.gpu.rocm-smi-path="pkg/collector/testdata/rocm-smi" \ - --collector.slurm.gpu-job-map-path="pkg/collector/testdata/gpujobmap" \ --collector.slurm.swap.memory.metrics \ --collector.slurm.psi.metrics \ --collector.ipmi.dcmi.cmd="pkg/collector/testdata/ipmi/capmc/capmc" \ diff --git a/website/docs/configuration/ceems-exporter.md b/website/docs/configuration/ceems-exporter.md index cb61173b..c3d138c8 100644 --- a/website/docs/configuration/ceems-exporter.md +++ b/website/docs/configuration/ceems-exporter.md @@ -30,21 +30,14 @@ from CLI arguments as briefed below: Although fetching metrics from cgroups do not need any additional privileges, getting GPU ordinal to job ID needs extra privileges. This is due to the fact that this -information is not readily available in cgroups. Currently, the exporter supports two different -ways to get the GPU ordinals to job ID map. - -- Reading environment variables `SLURM_STEP_GPUS` and/or `SLURM_JOB_GPUS` of job from -`/proc` file system which contains GPU ordinal numbers of job. -- Use prolog and epilog scripts to get the GPU to job ID map. Example prolog script -is provided in the [repo](https://github.com/mahendrapaipuri/ceems/tree/main/etc/slurm). - -We recommend to use the first approach as it requires minimum configuration to maintain -for the operators. The downside is that the CEEMS exporter process will need some -privileges to be able to read the environment variables in `/proc` file system. The -privileges can be set in different ways and it is discussed in [Security](./security.md) -section. - -On the other hand, if the operators do not wish to add any privileges to exporter +information is not readily available in cgroups. Currently, the exporter gets this +information by reading the environment variables `SLURM_STEP_GPUS` and/or `SLURM_JOB_GPUS` +of the job from the `/proc` file system, which contain the GPU ordinal numbers of the job. +The CEEMS exporter process needs some privileges to be able to read the environment variables +in the `/proc` file system. The privileges can be set in different ways, as discussed in the +[Security](./security.md) section. 
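For illustration, the sketch below shows how such an environment scan can work in Go. It is a minimal sketch, not the exporter's actual implementation: the helper name `environGPUOrdinals` is assumed, and it presumes the job ID and a PID taken from the job's `cgroup.procs` are already known.

```go
// environGPUOrdinals is an illustrative sketch (not the exporter's code):
// it returns the GPU ordinals of a SLURM job by scanning one of the job's
// processes in /proc. Reading another user's environ typically needs
// CAP_SYS_PTRACE and CAP_DAC_READ_SEARCH (or root).
package main

import (
	"fmt"
	"os"
	"slices"
	"strings"
)

func environGPUOrdinals(pid int, jobID string) []string {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/environ", pid))
	if err != nil {
		return nil
	}

	// Entries in environ are NUL separated.
	env := strings.Split(string(data), "\x00")
	if !slices.Contains(env, "SLURM_JOB_ID="+jobID) {
		return nil // process does not belong to this job
	}

	for _, kv := range env {
		for _, prefix := range []string{"SLURM_STEP_GPUS=", "SLURM_JOB_GPUS="} {
			if v, ok := strings.CutPrefix(kv, prefix); ok {
				return strings.Split(v, ",") // e.g. "2,3" -> ["2", "3"]
			}
		}
	}

	return nil
}

func main() {
	// Example invocation using the current process; in practice the PID
	// would come from the job's cgroup.procs file.
	fmt.Println(environGPUOrdinals(os.Getpid(), os.Getenv("SLURM_JOB_ID")))
}
```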
+ + When compute nodes uses a mix of full physical GPUs and MIG instances (NVIDIA), the ordering of GPUs by SLURM is undefined and it can depend on how the compute nodes are diff --git a/website/docs/configuration/grafana.md b/website/docs/configuration/grafana.md index 96294143..51337e35 100644 --- a/website/docs/configuration/grafana.md +++ b/website/docs/configuration/grafana.md @@ -81,35 +81,41 @@ datasources: - name: SLURM-TSDB type: prometheus access: proxy - # Notice the path parameter `slurm-0` at the end. - # IT IS IMPORTANT TO HAVE IT - url: http://ceems-lb:9030/slurm-0 + url: http://ceems-lb:9030 basicAuth: true basicAuthUser: + # Notice we are setting the custom header X-Ceems-Cluster-Id to `slurm-0`. + # IT IS IMPORTANT TO HAVE IT + jsonData: + httpHeaderName1: X-Ceems-Cluster-Id secureJsonData: basicAuthPassword: + httpHeaderValue1: slurm-0 - name: OS-TSDB type: prometheus access: proxy - # Notice the path parameter `os-0` at the end. - # IT IS IMPORTANT TO HAVE IT - url: http://ceems-lb:9030/os-0 + url: http://ceems-lb:9030 basicAuth: true basicAuthUser: + # Notice we are setting the custom header X-Ceems-Cluster-Id to `os-0`. + # IT IS IMPORTANT TO HAVE IT + jsonData: + httpHeaderName1: X-Ceems-Cluster-Id secureJsonData: basicAuthPassword: + httpHeaderValue1: os-0 ``` -Internally, CEEMS LB will strip the path parameter and forwards the request -to the correct backends group based on the provided path parameter. This ensures +Internally, CEEMS LB checks the header `X-Ceems-Cluster-Id` and forwards the request +to the correct backends group based on the provided cluster ID. This ensures that we can use a single instance of CEEMS LB to load balance across multiple clusters. :::important[IMPORTANT] Even if there is only one cluster and one TSDB instance for that cluster, we need -to configure the datasource on Grafana as explained above if we wish to use +to configure the datasource on Grafana with the custom header as explained above if we wish to use CEEMS LB. This is the only way for the CEEMS LB to know which cluster to target. ::: diff --git a/website/docs/configuration/prometheus.md b/website/docs/configuration/prometheus.md index 7db1ee75..66fb28af 100644 --- a/website/docs/configuration/prometheus.md +++ b/website/docs/configuration/prometheus.md @@ -16,11 +16,9 @@ contains NVIDIA GPUs: scrape_configs: - job_name: "gpu-node-group" metric_relabel_configs: - - source_labels: [UUID] - regex: (.*) + - source_labels: [UUID,GPU_I_ID] + separator: '/' target_label: gpuuuid - replacement: $1 - action: replace - regex: UUID action: labeldrop - regex: modelName @@ -29,7 +27,7 @@ scrape_configs: - targets: ["http://gpu-0:9400", "http://gpu-1:9400", ...] ``` -The `metric_relabel_configs` is replacing the label `UUID` which is -the UUID of GPU with `gpuuuid` which is compatible with CEEMS -exporter. Moreover the config also drops unused `UUID` and `modelName` -labels to reduce storage and cardinality. +The `metric_relabel_configs` section merges the labels `UUID` and `GPU_I_ID`, which are +the UUID and MIG instance ID of the GPU, respectively, and sets the result as `gpuuuid`, +which is compatible with the CEEMS exporter. Moreover, the config also drops the unused +`UUID` and `modelName` labels to reduce storage and cardinality.
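For example (the values here are purely illustrative), a scraped GPU series carrying `UUID="GPU-abcd1234"` and `GPU_I_ID="3"` would end up with `gpuuuid="GPU-abcd1234/3"`, since Prometheus' default `replace` action joins the `source_labels` values with the configured `separator` before writing the result to `target_label`.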