We have started building software modules using a new process, and you need to know about a few key changes that will affect you:
- The new modules will be first in the list, and will be used by default for ambiguous `module load` commands
- Most modules will work just like before, with some exceptions for things like Python, R, and Perl that each have their own libraries or packages
- R, Python, and Perl packages have their own module files now, and need to be loaded explicitly (e.g. `module load r-rmpi` or `module load py-numpy`)
- Module naming or capitalization may be different
- Modules are now organized in a hierarchy to present a cleaner list and prevent conflicts. You may need to use `module spider` to find the right module and get instructions on how to load it (other modules may need to be loaded first)
- If you want to use the old module, ask for it explicitly, e.g. `module load bamtools/2.4` instead of `module load bamtools`
For more detailed information, keep reading.
Why we changed from RISA to Spack:
As our RISA library grew to hundreds of software titles, building packages by hand while keeping up with new software versions, new OS versions, etc. was becoming cumbersome, and changing underlying dependencies introduced fragility. Our new process uses a tool called Spack, which uses Python scripts to create a flexible and reproducible process for installing software.
For most software, you will continue to use the modules just as you have been:
$ module load beast2
That's all! No compiling! No finding obscure dependencies! We've already done all that. All you have to do is load a module and start issuing commands.
For some programs with special dependencies (such as older or proprietary compilers, or MPI), you'll need to load one of those modules first to 'unlock' that part of the module tree.
For example, if I try to load exabayes:
$ module load exabayes
Lmod has detected the following error: These module(s) exist but cannot be loaded as
Try: "module spider exabayes" to see how to load the module(s)
That message helpfully tells me exactly what to do, so I try the spider command:
$ module spider exabayes
You will need to load all module(s) on any one of the lines below before the "exabayes/1.5-q27tiva"
module is available to load.
ExaBayes is a software package for Bayesian tree inference. It is
particularly suitable for large-scale analyses on computer clusters.
Let's load those modules, and try again:
$ module load gcc/4.8.5-fmvasaj openmpi/3.0.0-e6twho6
$ module load exabayes
No error messages; that seemed to work. Let's check the list of currently loaded modules:
$ module list
Currently Loaded Modules:
1) gmp/6.1.2-je7dag2 7) zlib/1.2.11-vhzh5cf 13) rdma-core/13-g3xqgzm
2) mpfr/3.1.5-kh3nrg3 8) libxml2/2.9.4-7x5vwih 14) libuuid/1.0.3-n4dikzc
3) mpc/1.0.3-ecwsgat 9) hwloc/1.11.8-ptr2u6t 15) psm/2017-04-28-qxuq6tw
4) gcc/4.8.5-fmvasaj 10) numactl/2.0.11-cxikw3m 16) libfabric/1.5.3-ywb6yjf
5) libpciaccess/0.13.5-qxhgslw 11) opa-psm2/10.3-37-ltvvrgd 17) openmpi/3.0.0-e6twho6
6) xz/5.2.3-yxhjznh 12) libnl/3.3.0-dojc5dq 18) exabayes/1.5-q27tiva
Now we can run exabayes as usual.
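In a batch script, the same sequence can be wrapped so that the job stops as soon as a prerequisite fails to load. The following is a minimal sketch, not a site-provided tool: the function name `load_in_order` is made up for illustration, and it assumes that the `module` command is available in the shell running the script and that `module load` returns a non-zero status when a module cannot be loaded.

```shell
# load_in_order: hypothetical helper, not part of Lmod or Spack.
# Loads each module in the order given and stops at the first failure,
# so a missing compiler or MPI module is reported before exabayes is tried.
load_in_order() {
    for mod in "$@"; do
        if ! module load "$mod"; then
            echo "could not load $mod; try: module spider $mod" >&2
            return 1
        fi
    done
}
```

For the example above, the call would be `load_in_order gcc/4.8.5-fmvasaj openmpi/3.0.0-e6twho6 exabayes`.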
For more information on the topic of module hierarchies, Lmod has a good explanation here: http://lmod.readthedocs.io/en/latest/080_hierarchy.html
Packages of packages:
For software that provides its own packages or libraries (such as Perl, Python, and R), the RISA model installed those sub-packages into the application tree of the parent package, where they were hidden from users. As a user, you had to load the python module and then check what it provided. This also left no flexibility for installing multiple versions of a sub-package under the same parent program (for example, we couldn't have two versions of numpy under the python module).
Going forward, each of these sub-packages will be its own module. This means more `module load` statements for you, but less ambiguity about which sub-packages you're using, more control, and a more reproducible environment.
If your program depends on several independent sub-packages or libraries, each one needs to be loaded separately. If a sub-package depends on another, however, those dependencies are loaded automatically. Take py-pandas for example:
$ module load py-pandas
$ module list
Currently Loaded Modules:
1) bzip2/1.0.6-v5xhjvn 5) readline/7.0-pf63r2a
2) ncurses/6.0-4v63qrr 6) sqlite/3.21.0-ude2ads
3) zlib/1.2.11-lafowlw 7) python/2.7.14-i3qxhgc
4) openssl/1.0.2n-mplvsup 8) openblas/0.2.20-lyrpuwt
Loading the py-pandas module has loaded all of the dependencies for pandas. If my python program also needs py-pillow, I would need to load that as well.
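A script can confirm that everything it needs is loaded before it starts Python. This is a minimal sketch that assumes Lmod's terse listing (`module -t list`), which prints one `name/version` entry per line on standard error; `is_loaded` is a made-up helper name, not an Lmod command.

```shell
# is_loaded: hypothetical helper.
# Succeeds if a module with exactly this name (ignoring the /version-hash
# suffix) is currently loaded. The name is used as a basic regular
# expression, so a "." in it would match any character.
is_loaded() {
    # Lmod writes its listing to stderr, so merge it into stdout first.
    module -t list 2>&1 | grep -q "^$1/"
}
```

After `module load py-pandas py-pillow`, a job script could guard its main step with `is_loaded py-pandas && is_loaded py-pillow` and exit early with a clear message if either check fails.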
Installing your own sub-packages and libraries:
Maybe you're working with R, and need some R packages that you don't see modules for. You have a couple options:
- Email email@example.com and request that the package be installed
- Try to install it yourself in your home directory using CRAN, pip, CPAN, etc.
Requesting an install from ResearchIT is always fine, but especially appropriate if there are a lot of dependencies for the package you need, or if you think it will be widely used by other users.
If you're in a hurry, or just want to experiment with a package quickly, you can try to install it on your own. Python, Perl, and R all let you define your own install directory, one that doesn't require administrative rights to write to.
Install Python packages using pip:
#macs2 depends on numpy, so I'm also loading the py-numpy module
$ module load py-pip py-numpy
$ pip install --user macs2
Using cached MACS2-2.1.1.20160309.tar.gz
Requirement already satisfied: numpy>=1.6 in /opt/rit/spack-app/linux-rhel7-x86_64/gcc-7.2.0/py-numpy-1.13.3-ycd4tsbrrejnwlff7n6hek3ppqft27ns/lib/python2.7/site-packages (from macs2)
Installing collected packages: macs2
Running setup.py install for macs2 ... done
Successfully installed macs2-2.1.1.20160309
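`pip install --user` places command-line tools (here, the `macs2` executable) in the `bin` directory of the Python user base, which is `~/.local` on Linux unless `PYTHONUSERBASE` is set. If the shell cannot find a newly installed command, that directory may be missing from `PATH`. A minimal sketch of a fix that could go in `~/.bashrc`; the function name is illustrative and not something pip provides:

```shell
# add_user_bin_to_path: illustrative helper, not provided by pip.
# Prepends the pip --user script directory to PATH unless it is already there.
add_user_bin_to_path() {
    user_bin="${PYTHONUSERBASE:-$HOME/.local}/bin"
    case ":$PATH:" in
        *":$user_bin:"*) ;;                       # already present, nothing to do
        *) PATH="$user_bin:$PATH"; export PATH ;;
    esac
}

add_user_bin_to_path
```

The `case` test makes the function safe to run more than once, so repeated logins or nested shells do not keep growing `PATH`.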
Install R packages using install.packages (output shown here for RColorBrewer, installed from an interactive R session):
Installing package into ‘/home/yournetid/R/x86_64-pc-linux-gnu-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://mirror.las.iastate.edu/CRAN/src/contrib/RColorBrewer_1.1-2.tar.gz'
Content type 'application/x-gzip' length 11532 bytes (11 KB)
downloaded 11 KB
* installing *source* package ‘RColorBrewer’ ...
** package ‘RColorBrewer’ successfully unpacked and MD5 sums checked
** preparing package for lazy loading
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (RColorBrewer)
The downloaded source packages are in
In a script, R can't prompt you interactively, so you need to specify both where to install the package and which CRAN mirror to use (here, our local one):
install.packages("RColorBrewer", lib="/home/baber/Rlibs", repos="https://mirror.las.iastate.edu/CRAN")
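R only searches a custom library directory such as `/home/baber/Rlibs` if it is told to, for example through the `R_LIBS_USER` environment variable, and it ignores library directories that do not exist. A minimal sketch that prepares the directory before an R script runs; the helper name `use_r_library` is an assumption for illustration:

```shell
# use_r_library: illustrative helper, not part of R.
# Creates the library directory if needed and points R_LIBS_USER at it, so
# that install.packages(lib=...) and later library() calls use the same place.
use_r_library() {
    mkdir -p "$1" || return 1
    R_LIBS_USER="$1"
    export R_LIBS_USER
}
```

Calling `use_r_library /home/baber/Rlibs` in a job script before running the R script keeps the install location and the search path in agreement.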
Install Perl packages using cpanm:
$ module load perl
$ cpanm --local-lib /home/baber/perl-lib --mirror https://mirror.las.iastate.edu/CPAN AppleII::LibA2
--> Working on AppleII::LibA2
Fetching https://mirror.las.iastate.edu/CPAN/authors/id/C/CJ/CJM/AppleII-LibA2-0.... ... OK
==> Found dependencies: Module::Build
--> Working on Module::Build
Fetching https://mirror.las.iastate.edu/CPAN/authors/id/L/LE/LEONT/Module-Build-0... ... OK
Configuring Module-Build-0.4224 ... OK
Building and testing Module-Build-0.4224 ... OK
Successfully installed Module-Build-0.4224
Configuring AppleII-LibA2-0.201 ... OK
Building and testing AppleII-LibA2-0.201 ... OK
Successfully installed AppleII-LibA2-0.201
2 distributions installed
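Modules installed with `--local-lib` land under `lib/perl5` inside the chosen directory, and any scripts they ship land under `bin`. Perl does not look there on its own, so programs that use the module need `PERL5LIB` set. A minimal sketch, with an illustrative helper name that is not part of cpanm:

```shell
# use_perl_local_lib: illustrative helper, not part of cpanm.
# Exposes a cpanm --local-lib directory to Perl (PERL5LIB) and to the
# shell (PATH), keeping any value PERL5LIB already had.
use_perl_local_lib() {
    PERL5LIB="$1/lib/perl5${PERL5LIB:+:$PERL5LIB}"
    PATH="$1/bin:$PATH"
    export PERL5LIB PATH
}
```

With the directory from the example above, the call is `use_perl_local_lib /home/baber/perl-lib`, placed after `module load perl` in a job script or `~/.bashrc`.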
Reproducibility & self-managed installs:
Reproducibility with Spack package specs:
Beyond providing a more stable and easier to administer software environment, Spack can also help ensure the software environment used for your research is documented and reproducible.
When you issue a `module avail` command, you'll notice an extra 7 characters at the end of each of the module names:
------------------------- /opt/rit/spack-modules/lmod/linux-rhel7-x86_64/Core --------------------------
albert/4.0a_opt4-jqtjblk openmpi/3.0.0-3r57wrr (D)
The extra characters are a truncated SHA-1 hash of the package specification (spec), which takes into account all of the variants and dependencies used to install the package.
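A full module name therefore has three parts: package name, version, and hash. When recording or comparing module names in a script, it can help to separate them. A small sketch using shell parameter expansion, assuming the hash is always the text after the last hyphen; the function name is made up for illustration:

```shell
# split_module_name: illustrative helper.
# Prints "name version hash" for a full module name such as
# exabayes/1.5-q27tiva. Assumes the hash follows the last hyphen.
split_module_name() {
    full="$1"
    name="${full%%/*}"       # text before the first slash
    rest="${full#*/}"        # version-hash
    hash="${rest##*-}"       # text after the last hyphen
    version="${rest%-*}"     # everything before that hyphen
    echo "$name $version $hash"
}

split_module_name exabayes/1.5-q27tiva      # prints: exabayes 1.5 q27tiva
split_module_name psm/2017-04-28-qxuq6tw    # prints: psm 2017-04-28 qxuq6tw
```

Splitting on the last hyphen rather than the first keeps versions that contain hyphens themselves, such as `2017-04-28`, intact.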
You can find the full details behind each hash in the package 'spec' files located in our GitHub account: https://github.com/ResearchIT/isu-spack. You can include these spec files in your lab notebook, manuscript, or supplementary information to document exactly which versions of software were used for your research. A spec file can also be used to redeploy the software on another server, or at another institution.
For more information or help with the use of spec files, please contact firstname.lastname@example.org
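Alongside the spec files, a job can record exactly which modules were loaded when it ran. A minimal sketch, assuming Lmod's terse listing (`module -t list`, which is written to standard error); the helper name and the output file name are illustrative:

```shell
# save_module_list: illustrative helper.
# Writes the currently loaded modules, one name/version-hash per line and
# sorted for easy comparison, to the file given as the first argument.
save_module_list() {
    module -t list 2>&1 | sort > "$1"
}
```

Calling `save_module_list modules-used.txt` right after the `module load` lines of a job script leaves a small text file that can be stored with the results and compared between runs.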
Self-Managed Spack Installs:
Besides being the tool used to produce the modules for the Research Computing environment across the University, Spack can also be used in your own home directory if you want to install different versions or variants of packages. Getting started in your home directory is straightforward, and Spack provides good introductory documentation.
git clone https://github.com/spack/spack.git
# Make the spack command available in this shell session
. spack/share/spack/setup-env.sh
spack install hdf5
While you can get started quickly this way, it's also easy for your Spack directory to become large and messy. Keep an eye on your disk usage, and uninstall packages that you're done using or testing if you start running low.
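One way to keep track is a small check that can be run by hand or from a login script. This sketch uses only `du`; the function name and the threshold are assumptions, and the path should be wherever Spack was cloned:

```shell
# spack_dir_too_big: illustrative helper.
# Prints the measured size of the directory, then succeeds (returns 0)
# only when it uses more than the given number of kilobytes.
spack_dir_too_big() {
    used_kb=$(du -sk "$1" | awk '{print $1}')
    echo "$1 uses ${used_kb} KB"
    [ "$used_kb" -gt "$2" ]
}
```

For example, `spack_dir_too_big "$HOME/spack" 20000000 && echo "time to uninstall unused packages"` prints the reminder once the clone and its installs pass roughly 20 GB.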