Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mem per node setting in ReFrame configuration template #405

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions reframe_config_bot.py.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,11 @@ site_configuration = {
'options': ['--mem={size}'],
}
],
'extras': {
# Make sure to round down, otherwise a job might ask for more mem than is available
# per node
'mem_per_node': __MEM_PER_NODE__,
},
'max_jobs': 1
}
]
Expand Down
10 changes: 9 additions & 1 deletion test_suite.sh
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,7 @@ export RFM_PREFIX=$PWD/reframe_runs
echo "Configured reframe with the following environment variables:"
env | grep "RFM_"

# Inject correct CPU properties into the ReFrame config file
# Inject correct CPU/memory properties into the ReFrame config file
cpuinfo=$(lscpu)
if [[ "${cpuinfo}" =~ CPU\(s\):[^0-9]*([0-9]+) ]]; then
cpu_count=${BASH_REMATCH[1]}
Expand Down Expand Up @@ -165,11 +165,19 @@ if [[ "${cpuinfo}" =~ (Core\(s\) per socket:[^0-9]*([0-9]+)) ]]; then
else
fatal_error "Failed to get the number of cores per socket for the current test hardware with lscpu."
fi
cgroup_mem_bytes=$(cat /sys/fs/cgroup/memory/slurm/uid_${UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes)
if [[ $? -eq 0 ]]; then
# Convert to MiB
cgroup_mem_mib=$((cgroup_mem_bytes/(1024*1024)))
else
fatal_error "Failed to get the memory limit in bytes from the current cgroup"
fi
cp ${RFM_CONFIG_FILE_TEMPLATE} ${RFM_CONFIG_FILES}
sed -i "s/__NUM_CPUS__/${cpu_count}/g" $RFM_CONFIG_FILES
sed -i "s/__NUM_SOCKETS__/${socket_count}/g" $RFM_CONFIG_FILES
sed -i "s/__NUM_CPUS_PER_CORE__/${threads_per_core}/g" $RFM_CONFIG_FILES
sed -i "s/__NUM_CPUS_PER_SOCKET__/${cores_per_socket}/g" $RFM_CONFIG_FILES
sed -i "s/__MEM_PER_NODE__/${cgroup_mem_mib}/g" $RFM_CONFIG_FILES

# Workaround for https://github.com/EESSI/software-layer/pull/467#issuecomment-1973341966
export PSM3_DEVICES='self,shm' # this is enough, since we only run single node for now
Expand Down