Available Software

Note: all this software is available, already compiled, by mounting RISA.

Title Description RISA Versions Condo Versions
abyss

ABySS is a parallel, paired-end sequence assembler designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes. Version 1.9.0 introduces a new paired de Bruijn graph mode for assembly.

1.9.0 1.9.0
albert

Albert is an interactive program to assist the specialist in the study of nonassociative algebra.

4.0a 4.0a
allpaths-lg

ALLPATHS-LG is our original short read assembler and it works on both small and large (mammalian size) genomes. To use it, you should first generate ~100 base Illumina reads from two libraries: one from ~180 bp fragments, and one from ~3000 bp fragments, both at about 45x coverage. Sequence from longer fragments will enable longer-range continuity.

52488
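As a rough guide to how much sequence the ALLPATHS-LG recommendation implies, coverage is commonly taken as (number of reads x read length) / genome size, so the read count for a target coverage follows directly. The sketch below assumes that definition; the function name reads_needed and the 3 Gbp genome size are illustrative and do not come from the ALLPATHS-LG documentation.

```shell
# reads_needed COVERAGE GENOME_SIZE_BP READ_LENGTH_BP
# Prints the approximate number of reads needed for the requested coverage.
# Hypothetical helper for illustration; uses integer arithmetic only.
reads_needed() {
    echo $(( $1 * $2 / $3 ))
}

# Hypothetical 3 Gbp genome, 45x coverage, 100-base reads (per library):
reads_needed 45 3000000000 100
```

The same figure applies to each of the two libraries, since both are sequenced to about 45x.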
amber

'Amber' refers to two things: a set of molecular mechanical force fields for the simulation of biomolecules (which are in the public domain, and are used in a variety of simulation programs); and a package of molecular simulation programs which includes source code and demos.

14 14
angsd

Angsd is a program for analysing NGS data. The software can handle a number of different input types from mapped reads to imputed genotype probabilities. Most methods take genotype uncertainty into account instead of basing the analysis on called genotypes. This is especially useful for low and medium depth data.

0.902 0.902
ascp

The ascp (Aspera secure copy) executable is a command-line FASP transfer program.

3.5.4.102989
atlas

The ATLAS (Automatically Tuned Linear Algebra Software) project is an ongoing research effort focusing on applying empirical techniques in order to provide portable performance. At present, it provides C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK

3.10.2, 3.10.3 3.10.2
augustus

AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences

3.2.1 3.2.1
autodock_vina

AutoDock Vina is an open-source program for doing molecular docking. It was designed and implemented by Dr. Oleg Trott in the Molecular Graphics Lab at The Scripps Research Institute.

1.1.2
autotools 20160701 20160701
bam2wig

Conversion of a BAM alignment to wiggle and bigwig coverage files, with flexible reporting options. Bam2wig is a Perl program, so it requires Perl to run.

1.2
bambam

Bambam is a tool used to facilitate NGS analysis. This utility depends on bamtools, samtools, bcftools, and lib/htslib; these modules are loaded for you when you run the command module load bambam/1.3.

1.3
bamtools

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

2.4.0 2.4.0
bbmap

BBMap: Short read aligner for DNA and RNA-seq data. Capable of handling arbitrarily large genomes with millions of scaffolds. Handles Illumina, PacBio, 454, and other reads; very high sensitivity and tolerant of errors and numerous large indels.

35.82 35.82
bcftools

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF.

1.2 1.2
beagle

Beagle performs genotype calling, genotype phasing, imputation of ungenotyped markers, and identity-by-descent segment detection.

2.1.2 2.1.2
beast

BEAST, Bayesian Evolutionary Analysis Sampling Trees, is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. The program is orientated towards (strict and relaxed) molecular clock analyses. It can be used as a method of constructing phylogenies, but it is also intended for testing evolutionary hypotheses without conditioning on a single tree topology.

1.8.2, 2.3.0, 2.4.3 2.3.1, 2.4.2
bedtools2

Collectively, the bedtools utilities are a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF.

2.24.0 2.25.0
bioawk

BWK awk modified for biological data

1.0 1..0
biopieces

The Biopieces are a collection of bioinformatics tools that can be pieced together in a very easy and flexible manner to perform both simple and complex tasks.

20151004
blast

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences

2.2.21
blat

http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/blat/

2016.08.23 2016.08.23
boost

Boost is a set of libraries for the C++ programming language that provide support for tasks and structures such as linear algebra, pseudorandom number generation, multithreading, image processing, regular expressions, and unit testing. It contains over eighty individual libraries.

1.58.0 1.59.0, 1.60.0
bowtie

Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).

1.1.2, 2.2.6 1.1.2, 2.2.6
bucky

BUCKy is a free program to combine molecular data from multiple loci. BUCKy estimates the dominant history of sampled individuals, and how much of the genome supports each relationship, using Bayesian concordance analysis.

1.4.4
busco

BUSCO (Benchmarking Universal Single-Copy Orthologs) assesses genome assembly and annotation completeness.

1.1, 1.1b1
butter

butter: Bowtie UTilizing iTerative placEment of Repetitive small rnas. A wrapper for bowtie to produce small RNA-seq alignments where multimapped small RNAs tend to be placed near regions of confidently high density.

0.3.3
bwa

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100 bp, while the other two are for longer sequences ranging from 70 bp to 1 Mbp.

0.7.12 0.7.13
bzip2

Bzip2 is a freely available, patent free, high-quality data compressor.

1.0.6
cairo

Cairo is a 2D graphics library with support for multiple output devices.

1.14.6
cap3

CAP3 is a DNA sequence assembly program.

20150210
ccp4

The CCP4 Suite: Programs for Protein Crystallography

7.0
cdbfasta

Fast indexing and retrieval of fasta records from flat file databases

2013-04-23 2013-04-23
cdd

The Conserved Domain Database is a resource for the annotation of functional units in proteins. Its collection of domain models includes a set curated by NCBI, which utilizes 3D structure to provide insights into sequence/structure/function relationships.

2016-01-28
chlorop

chlorop predicts the presence of chloroplast transit peptides in protein sequences, and the location of cTP cleavage site in each input sequence.

1.1
clustalw

Clustal W is a general purpose multiple alignment program for DNA or proteins.

2.1 2.1
cmake

CMake is an open-source, cross-platform family of tools designed to build, test and package software.

3.5.0-rc2
community

Louvain method for finding communities in large networks.

2016-01-15
corset

Corset is a command-line software program to go from a de novo transcriptome assembly to gene-level counts. Our software takes a set of reads that have been multi-mapped to the transcriptome (where multiple alignments per read were reported) and hierarchically clusters the transcripts based on the proportion of shared reads and expression patterns. It will report the clusters and gene-level counts for each sample, which are easily tested for differential expression with count based tools such as edgeR and DESeq.

1.04
csdp

CSDP is a library of routines that implements a predictor corrector variant of the semidefinite programming algorithm of Helmberg, Rendl, Vanderbei, and Wolkowicz

6.1.1
cufflinks

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.

2.2.1 2.2.1
curl

Curl is a computer software project providing a library and command-line tool for transferring data using various protocols.

7.45.0 7.47.1
damageproto

Provides the protocol headers used by the X Damage extension.

1.2.1
dialign-tx

DIALIGN-T is a reimplementation of the multiple-alignment program DIALIGN. Due to several algorithmic improvements, it produces significantly better alignments on locally and globally related sequence sets than previous versions of DIALIGN. However, like the original implementation of the program, DIALIGN-T uses a straightforward greedy approach to assemble multiple alignments from local pairwise sequence similarities. Such greedy approaches may be vulnerable to spurious random similarities and can therefore lead to suboptimal results. DIALIGN-TX is a substantial improvement of DIALIGN-T that combines the previous greedy algorithm with a progressive alignment approach.

1.0.2
discovar

DISCOVAR is a variant caller and small genome assembler. The heart of DISCOVAR is a de novo genome assembler, one that is accurate enough to produce assemblies that can be used for variant calling given a reference sequence. DISCOVAR can also generate de novo assemblies for small genomes, but consider using DISCOVAR de novo instead which can assemble genomes up to mammalian size.

52488
discovardenovo

DISCOVAR de novo is a large (and small) de novo genome assembler. It quickly generates highly accurate and complete assemblies using the same single library data as used by DISCOVAR. It currently doesn’t support variant calling – for that, please use DISCOVAR instead.

52488
dmd 2.068.1
eigen

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms

3.2.6
eman2

EMAN2 is the successor to EMAN1, and EMAN2.1 eliminated the unpopular BDB system from EMAN2, in favor of flat files. EMAN is a broadly based greyscale scientific image processing suite with a primary focus on processing data from transmission electron microscopes.

2.12
emboss

EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community.

6.5.7 6.5.7
erne

ERNE 2 (Extended Randomized Numerical alignEr) is a short string alignment package whose goal is to provide an all-inclusive set of tools to handle short (NGS-like) reads. ERNE 2 (a.k.a. bw-erne) uses the Burrows-Wheeler Transform (BWT) to reduce memory requirements while preserving its speed and accuracy. ERNE 2 comprises ERNE-MAP (core alignment tool/algorithm), ERNE-BS5 (bisulfite treated reads aligner), ERNE-FILTER (quality trimming and contamination filtering), and parallel versions of the aligners (ERNE-PMAP and ERNE-PBS5).

2.1
exonerate

Exonerate is a generic tool for pairwise sequence comparison.

2.2.0 2.2.0
express

eXpress is a streaming tool for quantifying the abundances of a set of target sequences from sampled subsequences. Example applications include transcript-level RNA-Seq quantification, allele-specific/haplotype expression analysis (from RNA-Seq), transcription factor binding quantification in ChIP-Seq, and analysis of metagenomic data.

1.5.1 1.5.1
falcon

Falcon: a set of tools for fast aligning long reads for consensus and assembly.

1.7.5
fasta

A package of programs for similarity searching, searching with short fragments, and finding non-overlapping local alignments.

36.3.8
fastphase

Software for haplotype reconstruction, and estimating missing genotypes from population data

1.4.8
fastqc

FastQC is an application which takes a FastQ file and runs a series of tests on it to generate a comprehensive QC report. This will tell you if there is anything unusual about your sequence.

0.11.3 0.11.4
faststructure

fastStructure is a fast algorithm for inferring population structure from large SNP genotype data. It is based on a variational Bayesian framework for posterior inference and is written in Python 2.x.

1.0
fastx-toolkit

The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.

0.0.14
fastx_toolkit

The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.

0.0.14
fftw

Fftw is a software library for computing discrete Fourier transforms (DFTs)

3.3.4 3.3.4_system
fixesproto

Provides the protocol Xfixes uses

5.0
fpc

fpc is the Free Pascal Compiler.

3.0.0 3.0.0
freebayes

FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.

1.0.2
freeglut 3.0.0
freetype

Freetype is a freely available software library to render fonts

2.6.3 2.6.3
gamess

GAMESS is a program for ab initio molecular quantum chemistry.

2014-12-05
gatk

The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyze high-throughput sequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

3.4-46, 3.5, 3.6 3.4-46, 3.5, 3.6
gaussian

Gaussian is a computer program for computational chemistry

g09
gblocks

Gblocks is a computer program written in ANSI C language that eliminates poorly aligned positions and divergent regions of an alignment of DNA or protein sequences

0.91b 0.91b
gcc

GCC is the GNU Compiler Collection. It includes compilers for C, C++, Fortran, and other languages.

6.1.0, 6.2.0 4.9.3, 6.1.0, 6.2.0
gd

GD is an open source code library for the dynamic creation of images by programmers. GD is written in C, and 'wrappers' are available for Perl, PHP and other languages. GD creates PNG, JPEG, GIF, WebP, XPM, BMP images, among other formats. GD is commonly used to generate charts, graphics, thumbnails, and most anything else, on the fly.

2.2.2 2.1.1
gdal

GDAL is a translator library for raster and vector geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation.

2.0.1 2.0.1
genemark

Gene prediction in bacteria, archaea, metagenomes, and metatranscriptomes.

4.32
gengetopt 2.22.6 2.22.6
genometools

genometools is a free collection of bioinformatics tools (in the realm of genome informatics) combined into a single binary named gt.

1.5.9
geos

GEOS (Geometry Engine - Open Source) is a C++ port of the Java Topology Suite (JTS)

3.5.0
glew 1.13.0
glproto

Provides the GL header files for Mesa.

1.4.14
gmp 6.1.1 6.0.0, 6.1.1
gnuplot

Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms. The source code is copyrighted but freely distributed (i.e., you don't have to pay for it).

5.0.4 5.0.4
gromacs

GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.

5.1.0, 5.1.1, 5.1.2 4.5.6-openmpi, 4.6.7, 4.6.7-openmpi, 5.0.5, 5.1.2
gsl

The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. It is free software under the GNU General Public License.

1.16, 2.1 1.16
gsnap

GMAP: a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. GSNAP: computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index.

20160404, 20150723, 20151120, 20160816 20150723, 20151231, 20160816
gtextutils 0.7
guidance

GUIDANCE: Accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters.

2.01 2.01
hapcut

HapCUT is a max-cut based algorithm for haplotype assembly that uses the mix of sequenced fragments from the two chromosomes of an individual. It can be applied to sequence data generated from next-generation sequencing platforms. HapCUT takes as input the aligned SAM/BAM files for an individual diploid genome and the list of variants, and outputs the phased haplotype blocks that can be assembled from the sequence reads.

0.7
hisat2

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome).

2.0.4 2.0.4
hmmer

HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).

3.1b2 3.1b2
hstlib 1.2.9
htslib 1.2.1, 1.3
ice 1.0.9
idba

IDBA is a practical iterative de Bruijn graph de novo assembler for sequence assembly in bioinformatics.

1.1.1 1.1.1
imagemagick

ImageMagick is a software suite to create, edit, compose, or convert bitmap images.

6.9.2-8 6.9.3.-0
infernal

Infernal (INFERence of RNA ALignment) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.

1.1.1
inputproto

Provides the protocol headers used by the X Input extension.

2.3.1
interproscan

InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites

5.15-54.0
isl 0.16.1
jags

JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation.

4.2.0 4.2.0
java

Java is a programming language and computing platform first released by Sun Microsystems in 1995. It is a general-purpose computer programming language that is concurrent, class-based, and object-oriented. Java is fast, secure, and reliable.

1.7.0_55, 1.8.0_51 1.7.0_79, 1.8.0_60
jemalloc

jemalloc is a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support.

3.6.0
julia

julia with default GCC OPENBLAS_TARGET_ARCH=NEHALEM

0.3.11, 0.3.11_LAS_gcc
kaks-calculator

KaKs_Calculator adopts model selection and model averaging to calculate nonsynonymous (Ka) and synonymous (Ks) substitution rates, attempting to include as many features as needed for accurately capturing evolutionary information in protein-coding sequences. In addition, several existing methods for calculating Ka and Ks are also incorporated into KaKs_Calculator.

1.2
kallisto

kallisto is a program for quantifying abundances of transcripts from RNA-Seq data.

0.42.5 0.42.5
kbproto

This extension makes it possible to clearly and explicitly specify most aspects of keyboard behavior on a per-key basis. The X Keyboard Extension essentially replaces the core protocol definition of a keyboard.

1.0.7
kmergenie

KmerGenie estimates the best k-mer length for genome de novo assembly. Given a set of reads, KmerGenie first computes the k-mer abundance histogram for many values of k. Then, for each value of k, it predicts the number of distinct genomic k-mers in the dataset, and returns the k-mer length which maximizes this number. Experiments show that KmerGenie's choices lead to assemblies that are close to the best possible over all k-mer lengths.

1.6982
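The final selection step in the KmerGenie description above can be sketched as follows. This only illustrates choosing the k with the largest predicted count; it is not KmerGenie's implementation, and the function name best_k, the two-column input (k, predicted number of distinct genomic k-mers), and the sample values are all assumptions made for the example.

```shell
# best_k: read lines of "k predicted_count" on stdin and print the k whose
# predicted count is largest. Hypothetical helper, not part of KmerGenie.
# On ties, the first k encountered is kept.
best_k() {
    awk 'NR == 1 || $2 + 0 > max { max = $2 + 0; best = $1 } END { print best }'
}

# Made-up values, for illustration only:
printf '21 1200000\n31 1500000\n41 1350000\n' | best_k
```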
lammps

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.

20160514-openmpi
lapack

Ostensibly this is included with ATLAS, but if you just want LAPACK here you go.

3.6.0
libtool

GNU libtool is a generic library support script. Libtool hides the complexity of using shared libraries behind a consistent, portable interface.

2.4.6
macs

MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, is publicly available open source, and can be used for ChIP-Seq with or without control samples.

2.1.0 2.1.0
mafft

Multiple alignment program for amino acid or nucleotide sequences.

7.245 7.294
magpie

Magpie is a "wrapper" for using the Apache Spark software. A template submission script is located at /shared/hpc/magpie/magpie.sbatch-srun-spark.

Copy this script into your project's working directory:

[las_thoma15@condo2017 myworkdir]$ cp /shared/hpc/magpie/magpie.sbatch-srun-spark .

And then set your WORKDIR variable (necessary for making sure log files are written to the correct location):

[las_thoma15@condo2017 myworkdir]$ export WORKDIR=$PWD

Edit the magpie.sbatch-srun-spark file. Change the Slurm variables (node count, project name, etc.) accordingly. Other values that may need to be changed include MAGPIE_JOB_TYPE and SPARK_MODE. Variables that have been commented out may be used if your job requires them.

Next, load the magpie module:

[las_thoma15@condo2017 myworkdir]$ module load magpie

Finally, submit the Slurm job:

[las_thoma15@condo2017 myworkdir]$ sbatch magpie.sbatch-srun-spark

1.81
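The Magpie edit step above can also be scripted. The sketch below assumes the template sets its variables with lines of the form export NAME="value" and that GNU sed is available; the helper name set_export and the value spark are placeholders, so check the comments in the template for the settings that are valid in this Magpie version.

```shell
# set_export FILE NAME VALUE
# Rewrites an existing 'export NAME=...' line in FILE in place.
# Hypothetical helper; assumes GNU sed (-i) and that VALUE contains
# no '|' or '&' characters, which are special in this sed expression.
set_export() {
    file=$1
    name=$2
    value=$3
    sed -i "s|^export ${name}=.*|export ${name}=\"${value}\"|" "$file"
}

# Example use on your copy of the template (value is a placeholder):
# set_export magpie.sbatch-srun-spark MAGPIE_JOB_TYPE spark
```

Lines that are commented out in the template are left untouched, because the pattern only matches lines that begin with export.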
maker-p

Sequencing diverse plant species of evolutionary, agricultural, and medicinal interest is becoming routine for even small groups - genome annotation and analysis is much less so. The MAKER-P pipeline is designed to make the annotation of novel plant genomes tractable for small groups with limited bioinformatics experience and resources, and faster and more transparent for large groups with more experience and resources.

2.31.8 2.31.8
maq

Maq is software that builds mapping assemblies from short reads generated by next-generation sequencing machines.

0.7.1
mariadb 10.1.9
mark

Program MARK, developed and maintained by Gary White (Colorado State University) is the most flexible, widely used application currently available for parameter estimation using data from marked individuals.

20160623
masurca

MaSuRCA is whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches. MaSuRCA can assemble data sets containing only short reads from Illumina sequencing or a mixture of short reads and long reads (Sanger, 454).

3.1.0, 3.1.3
mathematica

Mathematica has defined the state of the art in technical computing—and provided the principal computation environment for millions of innovators, educators, students, and others around the world.

10.4
maven

Apache Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information.

3.3.3
mcl

The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.

14-173
meme

Motif-based sequence analysis tools

4.11.1 4.11.1
meraculous

Meraculous is a whole genome assembler for Next Generation Sequencing data geared for large genomes.

2.0.5, 2.2.2
mesa

Mesa is an OpenGL implementation.

11.0.7
mesa-glu

This library provides the Mesa implementation of GLU, the OpenGL Utility Library.

9.0.0
migrate

Migrate estimates effective population sizes and past migration rates between n populations, assuming a migration matrix model with asymmetric migration rates and different subpopulation sizes.

3.6.11, 3.6.11-openmpi
migrate-n

Migrate estimates population parameters, effective population sizes and migration rates of n populations, using genetic data. It uses a coalescent theory approach taking into account history of mutations and uncertainty of the genealogy.

3.6.11
mira

MIRA is a multi-pass DNA sequence data assembler/mapper for whole genome and EST/RNASeq projects.

4.0.2
mirdeep

miRDeep2 is a completely overhauled tool which discovers microRNA genes by analyzing sequenced RNAs. The tool reports known and hundreds of novel microRNAs with high accuracy in seven species representing the major animal clades.

2.0.0.7
mirdp

miRDeep-P has two main purposes. First, miRDeep-P can be used to identify miRNA genes in plant species, even for those without detailed annotation. Second, miRDeep-P is designed to assign expression status to individual miRNA genes, which is critical because many miRNAs in plants belong to paralogous families with multiple members encoding identical or near-identical miRNAs.

1.3
mlhka

MLHKA: a maximum likelihood ratio test of natural selection, using polymorphism and divergence data.

2
mpc 1.0.3 1.0.3
mpfr 3.1.4 2.4.2, 3.1.4
mpiblast

ResearchIT has provided NCBI files at /ptmp/LAS/BioDatabase/mpiBLASTdb/NCBI. More info here: https://researchit.las.iastate.edu/bio-databases. mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. By efficiently utilizing distributed computational resources through database fragmentation, query segmentation, intelligent scheduling, and parallel I/O, mpiBLAST improves NCBI BLAST performance by several orders of magnitude while scaling to hundreds of processors.

1.6.0
mrbayes

MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.

3.2.5 3.2.5, 3.2.5_beagle
msmc

This software implements MSMC, a method to infer population size and gene flow from multiple genome sequences (Schiffels and Durbin, 2014, Nature Genetics). In short, msmc can infer the scaled population size of a single population as a function of time and the timing and nature of population separations between two populations from multiple phased haplotypes. When only two haplotypes are given, MSMC is similar to PSMC, and we call it PSMC' because of subtle differences in the method and the underlying model, which allows PSMC' to infer more accurately the recombination rate.

20150506 20140814
muscle

MUSCLE stands for MUltiple Sequence Comparison by Log-Expectation. MUSCLE is claimed to achieve both better average accuracy and better speed than ClustalW2 or T-Coffee, depending on the chosen options.

3.8.1551 3.8.31
namd

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems.

2015-11-17, 2.11, 2.11-ibverbs
ncbi-blast

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.

2.2.31+, 2.4.0+ 2.2.21, 2.4.0+
ncbi-rmblastn

RMBlast is a RepeatMasker compatible version of the standard NCBI BLAST suite. The primary difference between this distribution and the NCBI distribution is the addition of a new program 'rmblastn' for use with RepeatMasker and RepeatModeler.

2.2.28 2.2.28
ncl

The NCAR Command Language (NCL), a product of the Computational & Information Systems Laboratory at the National Center for Atmospheric Research (NCAR) and sponsored by the National Science Foundation, is a free interpreted language designed specifically for scientific data processing and visualization.

6.3.0
netcdf

NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

4.3.3.1 4.3.3.1
ngs-sdk

NGS is a new, domain-specific API for accessing reads, alignments and pileups produced from Next Generation Sequencing. The API itself is independent from any particular back-end implementation, and supports use of multiple back-ends simultaneously.

1.2.2
openblas

A BLAS implementation

0.2.19
openmpi

The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available.

1.10.0, 2.0.1 1.6.5, 1.8.6, 1.10.3, 2.0.1
orthomcl

OrthoMCL is an algorithm and a set of tools that can be used for identification of orthologous genes within a set of genomes. To use orthomcl, first you need to run the setup file.

2.0.9
pagan

PAGAN is a general-purpose method for the alignment of sequence graphs.

20160711 20150723
pagit

With the advent of next generation sequencing a lot of effort was put into developing software for mapping or aligning short reads and performing genome assembly. For genome assembly the problem of generating a draft assembly (i.e. a set of unordered contigs) has now been very well addressed - but for users who need high quality assemblies for their analyses there are still unresolved issues: this is where PAGIT is used.

1.64
paml

PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood.

4.8
pandaseq

PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.

2.8
parallel

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables.

20160422 20120122, 20150822, 20150922
parsimonator

Parsimonator is a no-frills light-weight implementation for building starting trees under parsimony for RAxML-Light.

1.0.2
pbstools

Several utilities that have been developed at OSC, NICS, and elsewhere to aid in the administration and management of PBS variants (including OpenPBS, PBS Pro, and TORQUE).

3.1
pcma

PCMA is a progressive multiple sequence alignment program that combines two different alignment strategies. The example files are located in ${progdir}/examples

2004
perl

Software is compiled with threads enabled. Otherwise, this is a standard perl install.

5.22.1 5.22.0, 5.22.0_gcc, 5.22.2, 5.24.0
phast

PHAST is a freely available software package for comparative and evolutionary genomics.

1.3
phrap

phrap is a program for assembling shotgun DNA sequence data

1.090518
phylip

PHYLIP (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies (evolutionary trees). It is available free over the Internet, and written to work on as many different kinds of computer systems as possible. The source code is distributed (in C), and executables are also distributed. In particular, already-compiled executables are available for Windows (95/98/NT/2000/me/xp/Vista), Mac OS X, and Linux systems. Older executables are also available for Mac OS 8 or 9 systems. Complete documentation is available in the documentation files that come with the package.

3.696
picard

Picard is a set of Java command-line tools for manipulating high-throughput sequencing (HTS) data and formats. Picard is implemented using the HTSJDK Java library, which supports access to common file formats, such as SAM and VCF, used for high-throughput sequencing data.

1.137, 2.2.4 2.2.4
pixman

Pixman is a low-level software library for pixel manipulation.

0.34.0
plink

PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

1.07 1.07
pmap 20101125
png

This library provides support for the png image format

1.6.21 1.6.21, 1.6.25
poa

POA is Partial Order Alignment, a fast program for multiple sequence alignment in bioinformatics. Its advantages are speed, scalability, sensitivity, and the superior ability to handle branching / indels in the alignment.

2
postgresql

PostgreSQL (Postgres) is an SQL database implementation.

9.5.3
prank

PRANK: Probabilistic Alignment Kit.

150803 150803
price

PRICE (Paired-Read Iterative Contig Extension): a de novo genome assembler implemented in C++. Its name describes the strategy that it implements for genome assembly: PRICE uses paired-read information to iteratively increase the size of existing contigs.

140408
probcons

PROBCONS is a novel tool for generating multiple alignments of protein sequences. Using a combination of probabilistic modeling and consistency-based alignment techniques, PROBCONS has achieved the highest accuracies of all alignment methods to date. On the BAliBASE benchmark alignment database, alignments produced by PROBCONS show statistically significant improvement over current programs, containing an average of 7% more correctly aligned columns than those of T-Coffee, 11% more correctly aligned columns than those of CLUSTAL W, and 14% more correctly aligned columns than those of DIALIGN.

1.12, 1.12-RNA
proj

Program proj (release 3) is a standard Unix filter function which converts geographic longitude and latitude coordinates into cartesian coordinates

4.9.1, 4.9.2 4.9
pthread-stubs

Provides some macros to simplify pthreads X

0.3
pymol

PyMOL is an OpenGL based molecular visualization system

1.8.0.0
pysam

Pysam is a Python module for reading and manipulating files in the SAM/BAM format. The SAM/BAM format is a way to efficiently store large numbers of alignments (Li 2009), such as those routinely created by next-generation sequencing methods.

1
python

Software is compiled with threads enabled. Otherwise, this is a standard Python 3 install. The executable is named python3 or python3.4.

3.4.3, 3.4.3, 2.7.10, 2.7.12, 2.7.12 2.7.10_gcc, 2.7.10, 2.7.10, 2.7.12
python_generic 2.7.11
r

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. Among other things it has an effective data handling and storage facility, a suite of operators for calculations on arrays, in particular matrices, a large, coherent, integrated collection of intermediate tools for data analysis, graphical facilities for data analysis and display either directly at the computer or on hardcopy, and a well developed, simple and effective programming language (called ‘S’) which includes conditionals, loops, user defined recursive functions and input and output facilities.

3.2.3, 3.3.1 3.2.0, 3.2.3
raxml

RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees. It can also be used for post-analysis of sets of phylogenetic trees, analysis of alignments, and evolutionary placement of short reads.

8.2.2 8.2.0, 8.2.9
ray

Ray is parallel software that computes de novo genome assemblies from next-generation sequencing data.

2.3.1
rdptools

An open source command-line tool suite for performing a complete workflow of analysis tasks of NGS data. Includes the core modules from the RDP (Classifier, Clustering, SequenceMatch, ProbeMatch, InitialProcessing, FrameBot, ReadSeq) and all their dependencies.

2.0.2
repeatexplorer

RepeatExplorer is a computational pipeline for discovery and characterization of repetitive sequences in eukaryotic genomes.

2015-12-30
repeatmasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked.

4.0.5 4.0.6
revbayes

Bayesian phylogenetic inference using probabilistic graphical models and an interpreted language.

1.0.0-beta0, 1.0.0-beta2
rsem

RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface and supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and RSPD estimation.

1.2.22, 1.2.31 1.2.26
ruby

A dynamic, open source programming language with a focus on simplicity and productivity.

2.3.0 2.3.0
sage 7.2
samtools

Samtools is a set of utilities that manipulate alignments in the BAM format. It imports from and exports to the SAM (Sequence Alignment/Map) format, does sorting, merging and indexing, and allows reads in any region to be retrieved swiftly.

0.1.19, 1.2, 1.3, 1.3.1 1.2, 1.3.1
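
To make the SAM (Sequence Alignment/Map) text format a little more concrete, here is a minimal sketch that splits one alignment line into the eleven mandatory tab-separated columns defined by the SAM specification. It does not use samtools or pysam; the function name `parse_sam_line` and the sample read are made up for illustration.

```python
# The eleven mandatory SAM columns, in order.
SAM_FIELDS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual")
# Columns that hold integers rather than strings.
INT_FIELDS = {"flag", "pos", "mapq", "pnext", "tlen"}


def parse_sam_line(line):
    """Parse one SAM alignment line (not a header line) into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 columns")
    record = {
        name: (int(value) if name in INT_FIELDS else value)
        for name, value in zip(SAM_FIELDS, cols)
    }
    # Anything after column 11 is an optional TAG:TYPE:VALUE field.
    record["tags"] = cols[len(SAM_FIELDS):]
    return record


if __name__ == "__main__":
    example = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
    print(parse_sam_line(example))
```

For real data, samtools or pysam should be preferred, since they also handle headers and the binary BAM encoding.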
shortstack

ShortStack is a tool developed to process and analyze smallRNA-seq data with respect to a reference genome, and output a comprehensive and informative annotation of all discovered small RNA genes. ShortStack discovers small RNA 'clusters' de novo, based on user-set thresholds, and annotates clusters with respect to small RNA size, orientation, and repetitiveness. ShortStack also discovers and annotates MIRNA genes. In addition, ShortStack includes a robust method to detect genes producing small RNAs in a phased manner. It outputs a descriptive table of all results, useful genome browser tracks, and detailed text-based alignments of all MIRNAs. It can also be used to quantify a set of input loci with genomic coordinates determined a priori by the user.

3.0, 3.4
sickle

Sickle is a tool that uses sliding windows along with quality and length thresholds to determine when quality is sufficiently low to trim the 3'-end of reads and also determines when the quality is sufficiently high to trim the 5'-end of reads.

1.33 1.33
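
The sliding-window idea behind sickle can be sketched roughly as follows. This is a simplified illustration, not Sickle's actual algorithm: the function name `sliding_window_trim`, the default window size, and the thresholds are all assumptions made for this example.

```python
def sliding_window_trim(quals, window=4, threshold=20, min_length=5):
    """Return (start, end) of the region to keep, or None to discard the read.

    quals is a list of per-base Phred quality scores.
    """
    n = len(quals)
    if n < window:
        return None
    # Mean quality of every window of `window` consecutive bases.
    means = [sum(quals[i:i + window]) / window for i in range(n - window + 1)]

    # 5' end: keep from the first window whose mean reaches the threshold.
    start = next((i for i, m in enumerate(means) if m >= threshold), None)
    if start is None:
        return None

    # 3' end: cut at the first later window whose mean drops below it.
    end = n
    for i in range(start + 1, len(means)):
        if means[i] < threshold:
            end = i
            break

    # Length threshold: discard reads that end up too short.
    return (start, end) if end - start >= min_length else None


if __name__ == "__main__":
    scores = [2, 2, 30, 30, 30, 30, 30, 30, 30, 30, 2, 2, 2, 2]
    print(sliding_window_trim(scores))
```

The real tool exposes its own quality and length options on the command line; the values above are placeholders.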
signalp

SignalP predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive bacteria, Gram-negative bacteria, and eukaryotes.

4.1
sm 1.2.2
snap

SNAP is a fast and accurate aligner for short DNA reads. It is optimized for modern read lengths of 100 bases or higher, and takes advantage of these reads to align data quickly through a hash-based indexing scheme.

0.15.4, 1.0beta.18
snap-korf

SNAP (Semi-HMM-based Nucleic Acid Parser) is a gene prediction tool. Latest release: 11/29/2013.

2013-11-29
snpeff

SnpEff: Genetic variant annotation and effect prediction toolbox. snpEff.jar, SnpSift.jar, and the default snpEff.config are in $SNPEFF_HOME.

4.2
soap-indel

SOAPindel focuses on calling indels from next-generation paired-end sequencing data.

2.1
soap2

SOAPaligner/soap2 is a member of the SOAP (Short Oligonucleotide Analysis Package). It is an updated version of the SOAP software for short oligonucleotide alignment. The new program features super-fast and accurate alignment of huge amounts of short reads generated by the Illumina/Solexa Genome Analyzer.

2.21
soapdenovo-trans

SOAPdenovo-Trans is a de novo transcriptome assembler based on the SOAPdenovo framework, adapted to alternative splicing and different expression levels among transcripts. The assembler provides a more accurate, complete and faster way to construct full-length transcript sets.

1.04
soapdenovo2

SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way. The new version, SOAPdenovo2, has a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes.

r240 r240
soapsnp

SOAPsnp is a member of the SOAP (Short Oligonucleotide Analysis Package). Despite its name, the program is a resequencing utility that can assemble consensus sequence for the genome of a newly sequenced individual based on the alignment of the raw sequencing reads on the known reference.

1.03
spades

SPAdes - St. Petersburg genome assembler - is intended for both standard isolates and single-cell MDA bacteria assemblies.

3.7.1 3.6.0
spark

Apache Spark is an open source cluster computing framework.

1.5.2
sparsehash

The sparsehash library contains several hash-map implementations, similar in API to SGI's hash_map class, but with different performance characteristics.

2.0.3 2.0.3
spidey

Spidey is an mRNA-to-genomic alignment program.

12.0.0
sqlite

SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.

3.8.11.1, 3.14.1 3.9.1, 3.14.1
sratoolkit

SRA (Sequence Read Archive) is an NCBI-defined interchange format for NGS data. The idea is that before submitting your data to NCBI, you convert whatever format it is in (fastq, bam, etc.) to SRA format using one of the 'load' tools. Then, the data can be downloaded from NCBI by anyone and extracted in one of a number of different formats as desired (ABI csfasta/qual, fastq).

2.5.2 2.5.4-1
stringtie

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus.

1.2.4 1.2.4
structure

The program structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most of the commonly used genetic markers, including SNPs, microsatellites, RFLPs and AFLPs. Example parameter files can be found in $progdir.

2.3.4
tabix

Generic indexer for TAB-delimited genome position files

0.2.6 0.2.6
targetp

Subcellular location of proteins: mitochondrial, chloroplastic, secretory pathway, or other

1.1
tassel

TASSEL is a software package to evaluate traits associations, evolutionary patterns, and linkage disequilibrium.

5.2.14 5.0
tcl

Tcl (Tool Command Language) is a very powerful but easy to learn dynamic programming language

8.6.4, 8.6.6 8.6.4, 8.6.6
tcoffee

T-Coffee is a multiple sequence alignment program. The main characteristic of T-Coffee is that it will allow you to combine results obtained with several alignment methods. By default, T-Coffee will compare all your sequences two by two, producing a global alignment and a series of local alignments (using lalign). The program will then combine all these alignments into a multiple alignment.

11.00.8cbe486
texinfo

Texinfo is the official documentation format of the GNU project. Texinfo uses a single source file to produce output in a number of formats, both online and printed (dvi, html, info, pdf, xml, etc.).

6.0
texlive 2015
tiff

This library provides support for the tiff image format

4.0.6 3.8.2, 4.0.6
tk

Tk is the graphical user interface toolkit for Tcl (Tool Command Language), a very powerful but easy to learn dynamic programming language

8.6.4 8.6.4, 8.6.6
tmalign

TM-align is an algorithm for sequence-order independent protein structure comparisons.

20140601
tmhmm

Transmembrane helices in proteins

2.0
tophat

TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

2.1.0, 2.1.1 2.0.14, 2.1.0, 2.1.1
tppred

tppred2 is a software package for the prediction of mitochondrial targeting peptides from protein primary sequence

2.0
transdecoder

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

2.0.1
transposome

A toolkit for annotation of transposable element families from unassembled sequence reads

0.09.7 0.09.7
trf

A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences.

4.07b
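
The definition of a tandem repeat given for trf can be illustrated with a short sketch that finds exact adjacent copies of a short pattern. This is far simpler than Tandem Repeats Finder, which also detects approximate copies; the function name `find_exact_tandem_repeats` and the unit-length limits are assumptions for this example.

```python
import re


def find_exact_tandem_repeats(seq, min_unit=2, max_unit=6):
    """Return (start, unit, copies) for runs of two or more exact adjacent copies.

    Only exact copies of a unit of min_unit..max_unit bases are reported.
    """
    # Group 1 is the repeat unit; \1+ requires at least one more adjacent copy.
    pattern = re.compile(r"([ACGT]{%d,%d}?)\1+" % (min_unit, max_unit))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    print(find_exact_tandem_repeats("TTACGACGACGTT"))
```

Matches are non-overlapping and positions are 0-based offsets into the input sequence.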
trimmomatic

Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data. The selection of trimming steps and their associated parameters are supplied on the command line.

0.33 0.33
trinityrnaseq

Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.

2.0.6, 2.1.1 2.1.1
trnascan-se

tRNAscan-SE, snoscan and snoGPS for the detection of tRNAs, methylation-guide snoRNAs and pseudouridylation-guide snoRNAs, respectively. tRNAscan-SE is routinely applied to completed genomes, resulting in the identification of thousands of tRNA genes.

1.23, 1.3.1
varscan

Variant calling and somatic mutation/CNV detection for next-generation sequencing data

2.4.0
vcftools

VCFtools is a program package designed for working with VCF files.

0.1.14 0.1.14
velvet

Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.

1.2.10 1.2.10
viennarna

The ViennaRNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.

2.1.9, 2.2.10
vmatch

Vmatch is a versatile software tool for efficiently solving large scale sequence matching tasks

2.2.5 2.2.5
wigtobigwig

Convert ascii format wig file (in fixedStep, variableStep or bedGraph format) to binary big wig format

4
x11

This is xlib

1.6.3
xau

This library provides an implementation for the X11 authorization protocol.

1.0.8
xcb

This library provides an interface to the X Window System protocol.

1.11.1
xcb-proto

Provides the XML-XCB protocol that libxcb uses

1.11
xdamage

This is Xdamage

1.1.4
xext

Extensions to xlib

1.3.3
xextproto

X11 various extension wire protocol

7.3.0
xfixes

The X-fixes extensions

5.0.1
xi 1.7.6
xmipp

Xmipp is a suite of image processing programs, primarily aimed at single-particle 3D electron microscopy

3.1
xmu

libXmu and libXmuu are a collection of miscellaneous utilities used by the X.Org sample clients. libXmuu consists of utilities that build upon just the core libX11 libraries. libXmu depends on the Athena Widgets toolkit, pulling in libXaw and libXt.

1.1.2
xorg-macros

This package adds some autoconfig macros for xorg

1.19.0
xplor-nih

XPLOR-NIH is a structure determination program which builds on the X-PLOR program, including additional tools developed at the NIH.

2.39
xpm

XPM format pixmap library

3.5.11
xproto

Provides headers for X protocol

7.0.28
xt

LibXt provides the X Toolkit Intrinsics, an abstract widget library upon which other toolkits are based. Xt is the basis for many toolkits

1.1.5
xtrans

x lib stuff

1.3.5
zlib

The zlib compression library provides in-memory compression and decompression functions, including integrity checks of the uncompressed data.

1.28 1.2.8, 1.2.9