Bring CUDA compat library support in line with #212 #235

Closed
wants to merge 57 commits into from
57 commits
a86c614
Add CUDA support to software_layer
ocaisa Dec 16, 2022
6c41b26
singularity install does not seem to install mksquashfs
ocaisa Dec 16, 2022
7d53b03
Trigger script test
ocaisa Dec 16, 2022
58357b9
Revert
ocaisa Dec 16, 2022
4b6654d
Use the right package name for squash-fs
ocaisa Dec 16, 2022
33ce584
Tidy up hooks
ocaisa Dec 16, 2022
f1cd893
Force creation of links
ocaisa Dec 16, 2022
06a9eaf
Install host_injections CUDA
ocaisa Dec 19, 2022
b4e80a1
Move comments to the right place
ocaisa Dec 19, 2022
2c86973
Reimplement `mkdir -p` reporting where permissions break down
ocaisa Feb 14, 2023
3909080
Merge branch 'main' into p7zip
ocaisa Feb 24, 2023
85c805c
Merge branch 'ocaisa-patch-2' into p7zip
ocaisa Feb 24, 2023
9590047
Be more agressive on catching errors
ocaisa Feb 24, 2023
1357f76
`${extra_args}` is actually multiple args not a single string
ocaisa Feb 27, 2023
8096c54
Update EESSI-pilot-install-software.sh
ocaisa Feb 27, 2023
ec31edf
Catching echo exit code instead of actual code
ocaisa Mar 1, 2023
0e99db5
Give a full path to the CUDA host injections script
ocaisa Mar 1, 2023
cd11792
Add checks for some whitelist entries for CUDA
ocaisa Mar 1, 2023
f514f81
Fix failing eb installation
ocaisa Mar 1, 2023
be326a1
Make sure we check space in the right places
ocaisa Mar 1, 2023
87c17a3
Merge branch 'main' of github.com:eessi/software-layer into p7zip
ocaisa Mar 2, 2023
a9cc56c
Bring GPU support in line with #212
ocaisa Mar 2, 2023
103f5fa
Simply wrap `mkdir -p` for better error reporting
ocaisa Mar 3, 2023
f02e5f6
Merge branch 'p7zip' of github.com:ocaisa/software-layer into p7zip
ocaisa Mar 3, 2023
793ba29
Simply wrap `mkdir -p` for better error reporting
ocaisa Mar 3, 2023
c0a1247
Make CUDA version a variable
ocaisa Mar 3, 2023
5e82923
Use TOPDIR, be more descriptive
ocaisa Mar 3, 2023
8384b25
Add missing argument
ocaisa Mar 3, 2023
7bb4a0b
Merge branch 'p7zip' into cuda_compat
ocaisa Mar 3, 2023
44de61c
Reuse utils.sh
ocaisa Mar 3, 2023
98fe2a7
Improve error messages in new bash function
ocaisa Mar 3, 2023
bbe7df2
Stick with return_code
ocaisa Mar 3, 2023
95dc245
Use realpath to be consistent with other scripts
ocaisa Mar 3, 2023
a1270f2
Wrong realpath flag
ocaisa Mar 3, 2023
aba486d
Wrong realpath flag
ocaisa Mar 3, 2023
6d17e5a
Merge branch 'p7zip' into cuda_compat
ocaisa Mar 3, 2023
d2d1fc3
Fix typo
ocaisa Mar 3, 2023
b03445e
Merge branch 'p7zip' into cuda_compat
ocaisa Mar 3, 2023
562e94b
Always add the rebuild option if we get to the point where we actuall…
ocaisa Mar 3, 2023
b4ae5f0
Expose CUDA_TEMP_DIR
ocaisa Mar 3, 2023
0d87101
Merge branch 'p7zip' into cuda_compat
ocaisa Mar 3, 2023
e7728d7
Update test script
ocaisa Mar 3, 2023
c122241
Polish a lot of the compat scripts
ocaisa Mar 3, 2023
e91423f
Make scripts executable
ocaisa Mar 3, 2023
1868021
Fix shebangs
ocaisa Mar 3, 2023
92492d3
Fix path to utils
ocaisa Mar 3, 2023
1e145a4
Fix path to init scripts.
ocaisa Mar 3, 2023
0c2b6c0
Don't worry too much about args to float_greater_than
ocaisa Mar 3, 2023
080f6be
Switch loop to use for
ocaisa Mar 3, 2023
3f41838
Test before believing that compat libs work
ocaisa Mar 3, 2023
b10f1e1
Tweaks
ocaisa Mar 3, 2023
31c5549
Add check for prefix shell
ocaisa Mar 3, 2023
a736122
Add check for prefix shell
ocaisa Mar 3, 2023
1ae6d44
Tweak docs
ocaisa Mar 3, 2023
3445196
Make scripts pass shellcheck
ocaisa Mar 4, 2023
a8ce967
Only use curl, also be careful about deliberate splitting
ocaisa Mar 5, 2023
9c2d267
Allow splitting when required
ocaisa Mar 5, 2023
114 changes: 82 additions & 32 deletions EESSI-pilot-install-software.sh
@@ -35,7 +35,7 @@ while [[ $# -gt 0 ]]; do
export https_proxy="$2"
shift 2
;;
-*|--*)
-*)
Review comment (PR author): The second pattern is redundant; `-*` already matches every argument that `--*` would.
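
To illustrate the comment above, here is a minimal sketch of how a `case` glob treats single-dash and double-dash arguments. The function name `classify_arg` is made up for this example and is not part of the script under review.

```bash
# Hypothetical helper (not in the PR): report which case branch an argument hits.
classify_arg() {
    case "$1" in
        -*) echo "option" ;;        # any word starting with '-', including '--long'
        *)  echo "positional" ;;    # everything else
    esac
}

classify_arg -v          # prints: option
classify_arg --verbose   # prints: option
classify_arg file.txt    # prints: positional
```

Because `--verbose` also starts with `-`, a separate `--*` alternative can never match anything that `-*` has not already caught.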

echo "Error: Unknown option: $1" >&2
exit 1
;;
@@ -48,12 +48,12 @@ done

set -- "${POSITIONAL_ARGS[@]}"

TOPDIR=$(dirname $(realpath $0))
TOPDIR=$(dirname "$(realpath "$0")")

source $TOPDIR/scripts/utils.sh
source "$TOPDIR"/scripts/utils.sh

# honor $TMPDIR if it is already defined, use /tmp otherwise
if [ -z $TMPDIR ]; then
if [ -z "$TMPDIR" ]; then
export WORKDIR=/tmp/$USER
else
export WORKDIR=$TMPDIR/$USER
@@ -63,20 +63,16 @@ TMPDIR=$(mktemp -d)

echo ">> Setting up environment..."

source $TOPDIR/init/minimal_eessi_env
source "$TOPDIR"/init/minimal_eessi_env

if [ -d $EESSI_CVMFS_REPO ]; then
if [ -d "$EESSI_CVMFS_REPO" ]; then
echo_green "$EESSI_CVMFS_REPO available, OK!"
else
fatal_error "$EESSI_CVMFS_REPO is not available!"
fi

# make sure we're in Prefix environment by checking $SHELL
if [[ ${SHELL} = ${EPREFIX}/bin/bash ]]; then
echo_green ">> It looks like we're in a Gentoo Prefix environment, good!"
else
fatal_error "Not running in Gentoo Prefix environment, run '${EPREFIX}/startprefix' first!"
fi
check_in_prefix_shell

# avoid that pyc files for EasyBuild are stored in EasyBuild installation directory
export PYTHONPYCACHEPREFIX=$TMPDIR/pycache
@@ -92,8 +88,10 @@ if [[ "$EASYBUILD_OPTARCH" == "GENERIC" ]]; then
fi

echo ">> Determining software subdirectory to use for current build host..."
if [ -z $EESSI_SOFTWARE_SUBDIR_OVERRIDE ]; then
export EESSI_SOFTWARE_SUBDIR_OVERRIDE=$(python3 $TOPDIR/eessi_software_subdir.py $DETECTION_PARAMETERS)
if [ -z "$EESSI_SOFTWARE_SUBDIR_OVERRIDE" ]; then
# shellcheck disable=SC2086
EESSI_SOFTWARE_SUBDIR_OVERRIDE=$(python3 "$TOPDIR"/eessi_software_subdir.py $DETECTION_PARAMETERS)
export EESSI_SOFTWARE_SUBDIR_OVERRIDE
echo ">> Determined \$EESSI_SOFTWARE_SUBDIR_OVERRIDE via 'eessi_software_subdir.py $DETECTION_PARAMETERS' script"
else
echo ">> Picking up pre-defined \$EESSI_SOFTWARE_SUBDIR_OVERRIDE: ${EESSI_SOFTWARE_SUBDIR_OVERRIDE}"
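
Note on this hunk: splitting `export VAR=$(...)` into an assignment followed by `export` is what ShellCheck's SC2155 recommends, presumably the motivation here. A minimal sketch of the difference, using a made-up failing command `detect_subdir` as a stand-in for the Python detection script:

```bash
# Hypothetical stand-in for a detection command that fails with status 3.
detect_subdir() { return 3; }

# Combined form: $? reports the status of `export` itself, which succeeds.
export COMBINED=$(detect_subdir)
combined_rc=$?

# Split form (as in the hunk above): $? reports the command substitution's status.
SPLIT=$(detect_subdir)
split_rc=$?
export SPLIT

echo "combined: $combined_rc, split: $split_rc"   # prints: combined: 0, split: 3
```

With the split form, a failure of the detection command stays visible to any later exit-code check instead of being masked by `export`.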
@@ -102,7 +100,7 @@ fi
# Set all the EESSI environment variables (respecting $EESSI_SOFTWARE_SUBDIR_OVERRIDE)
# $EESSI_SILENT - don't print any messages
# $EESSI_BASIC_ENV - give a basic set of environment variables
EESSI_SILENT=1 EESSI_BASIC_ENV=1 source $TOPDIR/init/eessi_environment_variables
EESSI_SILENT=1 EESSI_BASIC_ENV=1 source "$TOPDIR"/init/eessi_environment_variables

if [[ -z ${EESSI_SOFTWARE_SUBDIR} ]]; then
fatal_error "Failed to determine software subdirectory?!"
@@ -113,24 +111,25 @@ else
fi

echo ">> Initializing Lmod..."
source $EPREFIX/usr/share/Lmod/init/bash
source "$EPREFIX"/usr/share/Lmod/init/bash
ml_version_out=$TMPDIR/ml.out
ml --version &> $ml_version_out
ml --version &> "$ml_version_out"
# shellcheck disable=SC2181
if [[ $? -eq 0 ]]; then
echo_green ">> Found Lmod ${LMOD_VERSION}"
else
fatal_error "Failed to initialize Lmod?! (see output in ${ml_version_out})"
fi

echo ">> Configuring EasyBuild..."
source $TOPDIR/configure_easybuild
source "$TOPDIR"/configure_easybuild

echo ">> Setting up \$MODULEPATH..."
# make sure no modules are loaded
module --force purge
# ignore current $MODULEPATH entirely
module unuse $MODULEPATH
module use $EASYBUILD_INSTALLPATH/modules/all
module unuse "$MODULEPATH"
module use "$EASYBUILD_INSTALLPATH"/modules/all
if [[ -z ${MODULEPATH} ]]; then
fatal_error "Failed to set up \$MODULEPATH?!"
else
@@ -141,7 +140,8 @@ REQ_EB_VERSION='4.5.0'

echo ">> Checking for EasyBuild module..."
ml_av_easybuild_out=$TMPDIR/ml_av_easybuild.out
module avail 2>&1 | grep -i easybuild/${REQ_EB_VERSION} &> ${ml_av_easybuild_out}
module avail 2>&1 | grep -i easybuild/${REQ_EB_VERSION} &> "${ml_av_easybuild_out}"
# shellcheck disable=SC2181
if [[ $? -eq 0 ]]; then
echo_green ">> EasyBuild module found!"
else
@@ -150,19 +150,20 @@ else
EB_TMPDIR=${TMPDIR}/ebtmp
echo ">> Temporary installation (in ${EB_TMPDIR})..."
pip_install_out=${TMPDIR}/pip_install.out
pip3 install --prefix $EB_TMPDIR easybuild &> ${pip_install_out}
pip3 install --prefix "$EB_TMPDIR" easybuild &> "${pip_install_out}"

# keep track of original $PATH and $PYTHONPATH values, so we can restore them
ORIG_PATH=$PATH
ORIG_PYTHONPATH=$PYTHONPATH

echo ">> Final installation in ${EASYBUILD_INSTALLPATH}..."
export PATH=${EB_TMPDIR}/bin:$PATH
export PYTHONPATH=$(ls -d ${EB_TMPDIR}/lib/python*/site-packages):$PYTHONPATH
PYTHONPATH=$(ls -d "${EB_TMPDIR}"/lib/python*/site-packages):$PYTHONPATH
export PYTHONPATH
eb_install_out=${TMPDIR}/eb_install.out
ok_msg="Latest EasyBuild release installed, let's go!"
fail_msg="Installing latest EasyBuild release failed, that's not good... (output: ${eb_install_out})"
eb --install-latest-eb-release &> ${eb_install_out}
eb --install-latest-eb-release &> "${eb_install_out}"
check_exit_code $? "${ok_msg}" "${fail_msg}"

# restore origin $PATH and $PYTHONPATH values
@@ -173,11 +174,11 @@ else
if [[ $? -eq 0 ]]; then
ok_msg="EasyBuild v${REQ_EB_VERSION} installed, alright!"
fail_msg="Installing EasyBuild v${REQ_EB_VERSION}, yikes! (output: ${eb_install_out})"
eb EasyBuild-${REQ_EB_VERSION}.eb >> ${eb_install_out} 2>&1
eb EasyBuild-${REQ_EB_VERSION}.eb >> "${eb_install_out}" 2>&1
check_exit_code $? "${ok_msg}" "${fail_msg}"
fi

module avail easybuild/${REQ_EB_VERSION} &> ${ml_av_easybuild_out}
module avail easybuild/${REQ_EB_VERSION} &> "${ml_av_easybuild_out}"
if [[ $? -eq 0 ]]; then
echo_green ">> EasyBuild module installed!"
else
@@ -188,7 +189,8 @@ fi
echo ">> Loading EasyBuild module..."
module load EasyBuild/$REQ_EB_VERSION
eb_show_system_info_out=${TMPDIR}/eb_show_system_info.out
$EB --show-system-info > ${eb_show_system_info_out}
$EB --show-system-info > "${eb_show_system_info_out}"
# shellcheck disable=SC2181
if [[ $? -eq 0 ]]; then
echo_green ">> EasyBuild seems to be working!"
$EB --version | grep "${REQ_EB_VERSION}"
@@ -200,7 +202,7 @@ if [[ $? -eq 0 ]]; then
fi
$EB --show-config
else
cat ${eb_show_system_info_out}
cat "${eb_show_system_info_out}"
fatal_error "EasyBuild not working?!"
fi

@@ -241,6 +243,7 @@ if [[ $GENERIC -eq 1 ]]; then
else
openblas_include_easyblocks_from_pr=''
fi
# shellcheck disable=SC2086
$EB $openblas_include_easyblocks_from_pr OpenBLAS-0.3.9-GCC-9.3.0.eb --robot
check_exit_code $? "${ok_msg}" "${fail_msg}"

@@ -414,6 +417,7 @@ $EB CMake-3.20.1-GCCcore-10.3.0.eb --robot --include-easyblocks-from-pr 2248
$EB --from-pr 14584 Rust-1.52.1-GCCcore-10.3.0.eb --robot
# use OpenBLAS easyconfig from https://github.com/easybuilders/easybuild-easyconfigs/pull/15885
# which includes a patch to fix installation on POWER
# shellcheck disable=SC2086
$EB $openblas_include_easyblocks_from_pr --from-pr 15885 OpenBLAS-0.3.15-GCC-10.3.0.eb --robot
# ignore failing FlexiBLAS tests when building on POWER;
# some tests are failing due to a segmentation fault due to "invalid memory reference",
@@ -429,18 +433,64 @@ fi
$EB SciPy-bundle-2021.05-foss-2021a.eb --robot
check_exit_code $? "${ok_msg}" "${fail_msg}"

# CUDA support

cuda_version="11.3.1"

# Need recent version of EasyBuild
echo ">> Installing EasyBuild 4.7.0..."
ok_msg="EasyBuild v4.7.0 installed"
fail_msg="EasyBuild v4.7.0 failed to install"
$EB --from-pr 17065 --include-easyblocks-from-pr 2893 --try-amend=use_pip=1
check_exit_code $? "${ok_msg}" "${fail_msg}"

LMOD_IGNORE_CACHE=1 module swap EasyBuild/4.7.0
check_exit_code $? "Swapped to EasyBuild/4.7.0" "Couldn't swap to EasyBuild/4.7.0"

# install p7zip (to be able to unpack RPMs)
p7zip_ec="p7zip-17.04-GCCcore-10.3.0.eb"
echo ">> Installing $p7zip_ec..."
ok_msg="$p7zip_ec installed, off to a good (?) start!"
fail_msg="Failed to install $p7zip_ec, woopsie..."
$EB $p7zip_ec --robot
check_exit_code $? "${ok_msg}" "${fail_msg}"

# install CUDA (uses eb_hooks.py to only install runtime)
cuda_ec="CUDA-${cuda_version}.eb"
echo ">> Installing $cuda_ec..."
ok_msg="$cuda_ec installed, off to a good (?) start!"
fail_msg="Failed to install $cuda_ec, woopsie..."
$EB $cuda_ec --robot
check_exit_code $? "${ok_msg}" "${fail_msg}"

# Add the host_injections CUDA so we can actually build CUDA apps
# (which unbreaks the symlinks from the runtime installation)
echo ">> Re-installing CUDA $cuda_version under host_injections (to un-break symlinks in EESSI installation)..."
# set the messages first, so that $? below reflects the install script and not an assignment
ok_msg="CUDA $cuda_version (re)installed under host_injections!"
fail_msg="Failed to install CUDA $cuda_version under host_injections, woopsie..."
"${TOPDIR}"/gpu_support/cuda_utils/install_cuda_host_injections.sh ${cuda_version}
check_exit_code $? "${ok_msg}" "${fail_msg}"

# install CUDA samples (requires EESSI support for CUDA)
cuda_samples_ec="CUDA-Samples-11.3-GCC-10.3.0-CUDA-11.3.1.eb"
echo ">> Installing $cuda_samples_ec..."
ok_msg="$cuda_samples_ec installed, off to a good (?) start!"
fail_msg="Failed to install $cuda_samples_ec, woopsie..."
$EB $cuda_samples_ec --robot --from-pr=16914
check_exit_code $? "${ok_msg}" "${fail_msg}"

### add packages here

echo ">> Creating/updating Lmod cache..."
export LMOD_RC="${EASYBUILD_INSTALLPATH}/.lmod/lmodrc.lua"
if [ ! -f $LMOD_RC ]; then
python3 $TOPDIR/create_lmodrc.py ${EASYBUILD_INSTALLPATH}
if [ ! -f "$LMOD_RC" ]; then
python3 "$TOPDIR"/create_lmodrc.py "${EASYBUILD_INSTALLPATH}"
check_exit_code $? "$LMOD_RC created" "Failed to create $LMOD_RC"
fi

$TOPDIR/update_lmod_cache.sh ${EPREFIX} ${EASYBUILD_INSTALLPATH}
"$TOPDIR"/update_lmod_cache.sh "${EPREFIX}" "${EASYBUILD_INSTALLPATH}"

$TOPDIR/check_missing_installations.sh
"$TOPDIR"/check_missing_installations.sh

echo ">> Cleaning up ${TMPDIR}..."
rm -r ${TMPDIR}
rm -r "${TMPDIR}"
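
Note on the `check_exit_code $?` pattern used throughout this script: `$?` always holds the status of the most recent command, and a plain variable assignment counts as one. The sketch below shows why the ordering matters. The `check_exit_code` defined here is a simplified stand-in written for this example (the real helper is defined outside this diff), and `failing_step` is hypothetical.

```bash
# Simplified, hypothetical stand-in for the script's check_exit_code helper.
check_exit_code() {
    if [ "$1" -eq 0 ]; then echo "OK: $2"; else echo "FAIL: $3"; fi
}

# Hypothetical build step that always fails.
failing_step() { return 1; }

# Messages set first: $? still belongs to failing_step.
ok_msg="step succeeded"
fail_msg="step failed"
failing_step
check_exit_code $? "$ok_msg" "$fail_msg"   # prints: FAIL: step failed

# Messages set in between: $? now belongs to the last assignment, which is 0.
failing_step
ok_msg="step succeeded"
fail_msg="step failed"
check_exit_code $? "$ok_msg" "$fail_msg"   # prints: OK: step succeeded
```

Keeping the command and its `check_exit_code $?` call on adjacent lines, or saving `$?` into a variable immediately, avoids silently reporting success for a failed step.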