Setting up Conda
One of the easiest cross-platform ways to set up your environment is to use conda. Conda creates 'virtual environments' that don't interfere with the rest of your system, and comes with a comprehensive package manager. It also has a dedicated bioconda channel that makes it easy to install software for biomedical research.
Installing conda
The recommended way is to install miniconda, which is a minimal environment that lets you install the packages you want. To get it:
If you are on Mac OS X or Linux, download the appropriate installer from the miniconda download page. (For Mac OS X, we recommend using the 'bash' rather than the 'pkg' installer. Make sure to choose the Intel or Apple silicon version to match your machine.)
If you are on Windows, download the Linux 64-bit version from within your Ubuntu terminal. (This is because we will install it into the Windows Subsystem for Linux, rather than onto Windows directly.) To do this, copy the link to the installer and use wget to download it from the Ubuntu for Windows terminal, e.g.:
$ wget <paste your link here>
Note
The installer is a bash (.sh) file. On Mac OS X there are also package (.pkg) installers available. If you prefer, run one of these instead and skip to the next section.
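If you'd like to script the choice of installer, here is a minimal sketch. It assumes the download page still uses the naming scheme Miniconda3-latest-<OS>-<arch>.sh (for example Miniconda3-latest-MacOSX-arm64.sh for Apple silicon). It also assumes the URL prefix shown in the comment, so check both against the download page. The function name is made up for this example.

```shell
# Hypothetical helper: print the Miniconda installer filename for a machine.
# Assumes the naming scheme Miniconda3-latest-<OS>-<arch>.sh from the download page.
miniconda_installer_name() {
    local os="${1:-$(uname -s)}"     # kernel name, e.g. Linux or Darwin
    local arch="${2:-$(uname -m)}"   # e.g. x86_64 (Intel), arm64 (Apple silicon), aarch64
    local label
    case "$os" in
        Linux)  label="Linux" ;;     # Ubuntu for Windows also reports Linux
        Darwin) label="MacOSX" ;;    # Mac OS X reports itself as Darwin
        *) echo "unsupported OS: $os" >&2; return 1 ;;
    esac
    echo "Miniconda3-latest-${label}-${arch}.sh"
}

echo "installer for this machine: $(miniconda_installer_name)"
# Assumed download location; confirm it on the download page first:
# wget "https://repo.anaconda.com/miniconda/$(miniconda_installer_name)"
```

The OS and architecture can also be passed explicitly, e.g. `miniconda_installer_name Darwin arm64`, which is handy when downloading for a different machine.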
Note
Because this is an installer downloaded from the internet, you should check it's the real thing before installing it. Run sha256sum <miniconda filename> (Linux or Ubuntu for Windows) or shasum -a 256 <miniconda filename> (Mac OS X). Compare the output to the SHA256 hash listed for that installer in the table on the download page. If it's different, don't install!
See this page for more information.
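The comparison can be automated so you don't have to eyeball 64 hex characters. This is a minimal sketch with a made-up function name. It uses sha256sum where it exists and otherwise falls back to shasum -a 256, matching the two commands above.

```shell
# Hypothetical helper: check a downloaded file against a published SHA256 hash.
verify_sha256() {
    local file="$1" expected actual
    # Lower-case the expected hash so copy-pasted upper-case hex still matches.
    expected="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
    if command -v sha256sum >/dev/null 2>&1; then
        actual="$(sha256sum "$file" | cut -d' ' -f1)"      # Linux / Ubuntu for Windows
    else
        actual="$(shasum -a 256 "$file" | cut -d' ' -f1)"  # Mac OS X
    fi
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file matches"
    else
        echo "MISMATCH: do not install $file" >&2
        return 1
    fi
}

# Usage (paste the hash from the download page):
# verify_sha256 Miniconda3-latest-Linux-x86_64.sh <hash from the table>
```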
To install, start a terminal and change directory to the downloads folder:
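- on Mac OS X:
$ cd Downloads
- on Windows: if you used wget as above, the installer is already in the folder you ran wget from. If you downloaded it with your browser instead, it will be in your Windows Downloads folder:
$ cd /mnt/c/Users/<username>/Downloads
- on Linux, probably:
$ cd Downloads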
You can check what's there by running ls. Now run the installer:
$ ./Miniconda3-latest-<platform>.sh
Note
You may need to make the file 'executable' first. To do this, run:
$ chmod u+x ./Miniconda3-latest-<platform>.sh
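In a script, a small guard like the one below changes permissions only when needed. Alternatively, bash ./Miniconda3-latest-<platform>.sh works without the execute bit at all, because bash reads the file rather than the system executing it directly. The function name here is made up for this sketch.

```shell
# Hypothetical helper: give the owner execute permission only if it's missing
# (the same effect as the chmod u+x command above).
make_executable() {
    local file="$1"
    if [ ! -x "$file" ]; then
        chmod u+x "$file"
    fi
}

# Usage, with <platform> filled in as above:
# make_executable ./Miniconda3-latest-<platform>.sh && ./Miniconda3-latest-<platform>.sh
```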
You will be asked to accept the license and choose an install location. If in doubt, accept the default, which installs into a folder called miniconda3 in your home directory. Say 'yes' when asked whether the installer should initialize conda (this runs conda init for you).
Activating and deactivating conda
If you read the message the installer prints at the end, you'll see that it activates the conda 'base' environment by default on startup. You need to close and reopen your terminal for this to take effect. From then on, whenever you start a new terminal, conda is managing your environment for you. You can tell because the command prompt in new terminals will look something like this:
(base) <username>@<computer>:~$
Here 'base' is the name of your conda environment.
You can deactivate the environment (going back to normal) with the conda deactivate command:
$ conda deactivate
And you can reactivate it with, you guessed it, conda activate:
$ conda activate
Note
This is a downside of using conda: you have to remember what environment you're in at any one time.
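Besides the prompt, one way to check is the CONDA_DEFAULT_ENV variable. conda activate sets it to the active environment's name (e.g. base), and it is normally unset once you have deactivated everything. conda env list also marks the active environment with a *. A minimal sketch with a made-up function name:

```shell
# Hypothetical helper: print the active conda environment's name,
# relying on CONDA_DEFAULT_ENV, which `conda activate` sets.
current_conda_env() {
    if [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
        echo "$CONDA_DEFAULT_ENV"
    else
        echo "(no conda environment active)"
    fi
}

current_conda_env
```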
Using conda to install software
Conda makes installing stuff easy. But before getting started, let's add two 'channels' that will be really useful for bioinformatics work. These are 'conda-forge', which is a 'community-led collection of recipes [...] for the conda package manager', and 'bioconda', which 'lets you install thousands of software packages related to biomedical research'. The bioconda page explains how to do this, using these commands:
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda config --set channel_priority strict
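Each --add command puts the channel at the top of the list, so priority runs from the bottom command to the top one. From now on, conda will look for software in conda-forge first, then bioconda, and finally the base defaults channel. Setting channel_priority to strict means that once a package is found in a higher-priority channel, conda won't consider versions of it from lower-priority channels.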
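To see why the last channel added ends up first, here is a toy model of the prepend behaviour of conda config --add channels. It only manipulates a plain shell variable and does not touch your conda settings. The variable and function names are made up. To see the real list, run conda config --show channels.

```shell
# Toy model of `conda config --add channels NAME`: each call puts NAME
# at the front of the list, so the most recently added channel is searched first.
channels=""
add_channel() {
    if [ -z "$channels" ]; then
        channels="$1"
    else
        channels="$1 $channels"
    fi
}

add_channel defaults
add_channel bioconda
add_channel conda-forge

echo "search order: $channels"   # prints: search order: conda-forge bioconda defaults
```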
Getting mamba
The first thing we'll want to get is mamba, a faster drop-in replacement for conda itself:
$ conda install mamba
The mamba package lives in the conda-forge channel. Type 'y' and press <enter> to install.
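Because mamba is used in the same way as conda for commands like install and create, a script can use whichever one is available. A minimal sketch, with a made-up function name:

```shell
# Hypothetical helper: prefer mamba when it is on PATH, otherwise fall back to conda.
pkg_manager() {
    if command -v mamba >/dev/null 2>&1; then
        echo mamba
    else
        echo conda
    fi
}

echo "using: $(pkg_manager)"
# Usage: "$(pkg_manager)" install 'samtools>=1.15'
```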
Making a more specific environment
Before installing anything else, it's a good idea to make a new environment to work in, instead of using the conda 'base' environment as we have been so far. In conda, conda create does this:
$ conda create --name gms
To use the new environment you have to 'activate' it:
$ conda activate gms
If you start a new terminal session, remember to type this before doing any work. Otherwise you may find that the commands you installed can't be found. (Remember that you can deactivate the environment again at any time by typing conda deactivate.)
Installing samtools
Now that we have an environment, let's try installing samtools, which is a workhorse tool for handling next-generation sequencing data. You could download the source code and compile it yourself, but conda makes this easy. You'll want a fairly recent version, so let's ask for at least version 1.15, which is available from the bioconda channel:
$ mamba install 'samtools>=1.15'
If you look at the output, you'll see that this is getting htslib and samtools from bioconda, but also libdeflate from conda-forge. Go ahead and install. Running samtools now gives you some output:
$ samtools
Program: samtools (Tools for alignments in the SAM format)
Version: 1.15.1
Usage: samtools <command> [options]
...
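Two details of the 'samtools>=1.15' request are worth spelling out. First, the quotes matter: without them the shell would treat > as output redirection. Second, version numbers are compared part by part rather than as decimals, so 1.9 is older than 1.15. Below is a minimal sketch of that comparison, with a made-up function name. It assumes a sort that supports -V (GNU coreutils does; the sort on some Mac OS X versions may not).

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# `sort -V` orders versions part by part, so the smaller one comes first.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Usage: the first line of `samtools --version` looks like "samtools 1.15.1".
# version_at_least "$(samtools --version | head -n1 | cut -d' ' -f2)" 1.15 && echo "recent enough"
```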
Aside: what even is an 'environment'?
UNIX figures out how to find programs and other things using so-called 'environment variables'. You can see them all using the env command:
$ env
All conda is really doing is changing environment variables to point to its own copies of files.
For example, the HOME environment variable points at your home folder:
$ echo ${HOME}
/users/<username> (or similar)
Let's go there now and see what's there:
$ cd ${HOME}
$ ls
If you've followed the above, you should see that conda has created a directory called miniconda3 in there, where it puts the things it installs. Executable programs for the base environment go in miniconda3/bin, and each environment you create gets its own folder under miniconda3/envs. Since we installed samtools into the gms environment, it lives in miniconda3/envs/gms/bin:
$ ls miniconda3/envs/gms/bin
If you look there you will see (among many other things) the samtools executable, because we just installed it.
To make this work, when you activate an environment, conda sets the relevant environment variables to point into its folder. In particular, it adds the environment's bin directory to the front of your PATH environment variable, which the terminal uses to know where to look for programs. Look (with the gms environment active):
$ echo ${PATH}
/users/<username>/miniconda3/envs/gms/bin: (other stuff here...)
So if you type samtools, the first place the terminal looks is in that folder.
If you deactivate the conda environment, PATH changes to remove that folder and samtools will no longer work:
$ conda deactivate
$ samtools
Command 'samtools' not found...
However, samtools is still there on your filesystem. As it happens, you can still run it by specifying its full path:
$ ./miniconda3/envs/gms/bin/samtools
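The shell's lookup is "first match wins". It tries each PATH directory in order, left to right, and runs the first executable with the right name it finds. Here is a minimal re-implementation of that search in plain shell, with a made-up function name. In practice, command -v samtools (or type -a samtools in bash, which lists every match) does this for you.

```shell
# Hypothetical re-implementation of the shell's PATH search:
# print the first executable named $1 found in the PATH directories, in order.
find_in_path() {
    local name="$1" dir
    local IFS=':'                      # PATH entries are separated by colons
    for dir in $PATH; do
        [ -n "$dir" ] || dir="."       # an empty PATH entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1                           # not found anywhere on PATH
}

# Usage: find_in_path samtools
# (prints .../miniconda3/envs/gms/bin/samtools while the gms environment is active)
```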
In other words, conda isn't doing anything magical here: it's just managing your environment variables for you. This is basically how 'environments' work: they are sets of environment variables, including PATH, that tell the UNIX shell where to look for things.
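To see this without conda, the sketch below does by hand roughly what activating and deactivating do to PATH. It makes a throwaway folder containing a tiny program, puts that folder at the front of PATH, and then restores the old PATH. The hello-env program and the toy_activate/toy_deactivate helpers are made up for the demonstration. Real conda also updates other variables, such as CONDA_PREFIX.

```shell
# Hypothetical helpers mimicking what activate/deactivate do to PATH.
toy_activate() {               # $1: the environment's bin folder
    _saved_path="$PATH"
    PATH="$1:$PATH"            # searched first, like miniconda3/envs/gms/bin
}
toy_deactivate() {
    PATH="$_saved_path"        # back to how it was
}

# A throwaway "environment" containing one tiny program.
envdir="$(mktemp -d)"
printf '#!/bin/sh\necho "hello from the toy environment"\n' > "$envdir/hello-env"
chmod u+x "$envdir/hello-env"

toy_activate "$envdir"
hello-env                                        # found via PATH
toy_deactivate
command -v hello-env || echo "hello-env: not found on PATH"
"$envdir/hello-env"                              # still runs with its full path
```

When you're done, remove the folder with rm -rf "$envdir".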