Most of this is a one-time operation, done by the person setting up HeLmod. In order to join an existing HeLmod environment, this section is all that's needed.
The HeLmod system is designed to work on a CentOS 6 cluster. If you're setting this up for an organization other than Harvard FAS Research Computing (@fasrc on github), create a new canonical HeLmod master remote and adjust urls below accordingly.
As root, pick the central location on network storage for all cluster software, and clone the HeLmod repo there:
git clone [email protected]:/fasrc/helmod.git
cd helmod
Aside from initial configuration, this clone only needs to pull updates, so changing to an https remote later is fine.
This top-level directory will contain everything relevant to the HeLmod software management system, but actual rpm files, app installations, and other build outputs are .gitignore'd. Thus, make sure this is backed up regularly.
Edit the configuration:
$EDITOR setup.sh
and set FASRCSW_PROD
to the absolute path of this HeLmod clone you just made.
Push these changes back to the remote.
Source the setup.sh:
source ./setup.sh
This makes available some scripts such as fasrcsw-rpm
and fasrcsw-rpmbuild-*
which are very thin wrappers around the normal programs and just add some default options.
Use the rpm one to initialize the rpm database used exclusively for HeLmod:
fasrcsw-rpm --initdb
This repo clone is the one and only $FASRCSW_PROD
.
Each app contributor should clone the HeLmod repo in some personal location. The scripts in the repo need to be able to be read by root, so the location should not be root squashed (or, if root squashed, must be world-readable). Accumulation of sources and build output can easily reach many GBs, so make sure there is room to grow, too.
git clone [email protected]:/fasrc/helmod.git
cd helmod
These clones will need to regularly push updates back to the remote.
Customize setup.sh
if necessary.
In particular, make sure FASRCSW_PROD
points to the location of the production repo above.
For better performance, you might want to bind the rpmbuild BUILD directory to local scratch storage.
Assuming someone else has done all the initial setup, you can proceed to the HOWTO.
These repo clones are know as $FASRCSW_DEV
(one for each contributor).
The HeLmod system uses lmod.
FASRC uses the version posted on sourceforge; in particular, these instructions are matched to lmod 5.4.1.
Lmod installation will write files outside of the given --prefix
, specifically to the root-owned location /usr/share/zsh/site-functions
.
Rather than running make install
as root, the following temporarily changes file attributes to allow your regular account's group to write there.
The same approach is taken to just let it write to places within $FASRCSW_PROD
.
Thus, this writes to production locations!
If you're upgrading, the lmod make install
will automatically update to the lmod -> X.Y.Z
symbolic link!
(A better procedure for a more controlled upgraded needs to be implemented.)
lmod requires lua
5.1 or 5.2, plus lua-filesystem
, lua-posix
, and lua-devel
(at least as of lmod version 5.2).
Source a configured HeLmod clone, or at least define FASRCSW_PROD
to point to the parent of the production apps
directory.
Temporarily allow the writes that lmod wants to do (assuming your primary group is a trusted admin group):
sudo chgrp $(id -gn) /usr/share/zsh/site-functions
sudo chmod g+w /usr/share/zsh/site-functions
for f in _ml _module; do
if [ -f /usr/share/zsh/site-functions/"$f" ]; then
sudo chgrp $(id -gn) /usr/share/zsh/site-functions/"$f"
sudo chmod g+w /usr/share/zsh/site-functions/"$f"
fi
done
Also, temporarily allow writes to the production apps space:
sudo chgrp $(id -gn) "$FASRCSW_PROD"/apps
sudo chmod g+w "$FASRCSW_PROD"/apps
if [ -d "$FASRCSW_PROD"/apps/lmod/ ]; then
sudo chgrp $(id -gn) "$FASRCSW_PROD"/apps/lmod
sudo chmod g+w "$FASRCSW_PROD"/apps/lmod
fi
Stop your deployment system from replacing these files before their updates are captured:
/usr/share/zsh/site-functions/_ml
/usr/share/zsh/site-functions/_module
Get the source code (stash a copy in $FASRCSW_PROD/rpmbuild/SOURCES/
for good measure), configure it to use the various locations within HeLmod, and build it.
./configure --prefix="$FASRCSW_PROD"/apps --with-module-root-path="$FASRCSW_PROD"/modulefiles --with-spiderCacheDir="$FASRCSW_PROD"/moduledata/cacheDir --with-updateSystemFn="$FASRCSW_PROD"/moduledata/system.txt
make pre-install
make install
By whatever means your cluster uses (e.g. puppet), distribute lmod's configuration files to all cluster nodes:
FROM lmod BUILD ON ALL HOSTS
--------------------------------------------------------------------
/usr/share/zsh/site-functions/_ml (same thing)
/usr/share/zsh/site-functions/_module (same thing)
"$FASRCSW_PROD"/apps/lmod/lmod/init/profile /etc/profile.d/lmod.sh
"$FASRCSW_PROD"/apps/lmod/lmod/init/cshrc /etc/profile.d/lmod.csh
We also comment out the following lines in lmod.sh
and lmod.csh
respectively (where ...
is $FASRCSW_PROD
), to un-clutter the module namespace:
#export MODULEPATH=$(.../apps/lmod/lmod/libexec/addto --append MODULEPATH .../apps/lmod/lmod/modulefiles/Core)
#setenv MODULEPATH `.../apps/lmod/lmod/libexec/addto --append MODULEPATH .../apps/lmod/lmod/modulefiles/Core`
And we remove MODULEPATH $MODULEPATH_ROOT/$LMOD_sys
from the line above that, too.
Set the zsh directory back to the way it was:
sudo chgrp root /usr/share/zsh/site-functions
sudo chmod g-w /usr/share/zsh/site-functions
for f in _ml _module; do
if [ -f /usr/share/zsh/site-functions/"$f" ]; then
sudo chgrp root /usr/share/zsh/site-functions/"$f"
sudo chmod g-w /usr/share/zsh/site-functions/"$f"
fi
done
and likewise for the lmod installation location:
sudo chgrp root "$FASRCSW_PROD"/apps
sudo chmod g-w "$FASRCSW_PROD"/apps
if [ -d "$FASRCSW_PROD"/apps/lmod/ ]; then
sudo chgrp root "$FASRCSW_PROD"/apps/lmod
sudo chmod g-w "$FASRCSW_PROD"/apps/lmod
fi
You may also want to change the ownership of the newly installed lmod version directory and symlink to root, e.g. with something like:
VERSION=5.4.1
sudo chown -R root:root "$FASRCSW_PROD"/apps/{lmod,$VERSION}
lmod features optional caching of the modulefile hierarchy to make spider and avail faster; HeLmod enables this. Updates of the cache happen during the HeLmod app build process, therefore no cron job or other automatic update mechanism is necessary.
HeLmod includes a hook for logging module load events to a MySQL database.
To use this, create a MySQL database using $FASRCSW_PROD/modulehook/modulestats.sql, and create a file $FASRCSW_PROD/modulehook/.my.cnf.modulelogger
with the host, database name, and credentials for connecting to it.
Finally, tell lmod to use this hook by adding the following to lmod.sh
and lmod.csh
, respectively, filling in the value of FASRCSW_PROD:
export FASRCSW_PROD=...FIXME...
export LMOD_PACKAGE_PATH="$FASRCSW_PROD"/modulehook
setenv FASRCSW_PROD ...FIXME...
setenv LMOD_PACKAGE_PATH "$FASRCSW_PROD"/modulehook
This is the internal lmod/lua logging, which is after resolution of default versions and includes all modules loaded as dependencies.
To go to the extreme and also log what users are asking for on the command line, you can run this patch for init/bash
and init/csh
(it assumes FASRCSW_PROD
is set in the base lmod.sh
and lmod.csh
setup scripts as done above):
sudo patch --directory="$FASRCSW_PROD"/apps/lmod/lmod -p1 < "$FASRCSW_PROD"/misc/modulelogger.patch
(The patch was made against lmod 5.2 but still works in 5.4.1 since there are only minor offsets.)
You'll have to build the standard compiler and MPI apps against which all the other apps are built.
Spec files are provided in rpmbuild/SPECS
, but many of these require hacks that are not yet documented.
Once you've settled upon the sets of compiler and MPI implementations against which to build software by default, set the FASRCSW_COMPS
and FASRCSW_MPIS
arrays in setup.sh
and push these back to the master remote.