UCL Cluster

A cluster is a collection of computers used for a wide range of purposes, including scientific research and data analysis. Using many computers at once lets users handle large amounts of data and run processes that need more computing power than a single computer can provide. Importantly, many clusters have scheduling systems that manage and allocate computing resources efficiently among users and applications.

If you are going to do NGS data analysis starting from FastQ files, you will most likely need to use a cluster for the preprocessing (e.g., alignment and BAM processing). Also, much of the software needed for the analysis is already installed on the cluster, so you will not need to install it yourself.

Here is a brief overview of the typical process of running a job on the cluster:

Please note that working on a cluster and using the command line is much the same across most types of Linux computers, but running software on the UCL cluster is quite specific to the peculiarities of the UCL cluster's operating system. That is why it is important that you go through the UCL cluster documentation below.

There are several cluster machines available at UCL, but the most general-purpose one is Myriad. Everyone at UCL can request access to Myriad and use it to run data analyses, so in this section you will start by learning about Myriad.

TO DO

The first thing you need to do is apply for an account on Myriad. Do so by filling in the form on the UCL Account Services webpage. Once you submit the form, it might take a day or two to get access. You will receive a confirmation email when that happens.

In the meantime, read the Guide to New Users carefully. It is a brief overview of how to connect to Myriad, transfer data, and submit jobs to Myriad's queuing system. You will learn a few Unix commands for this, specifically ssh, scp, rsync, qsub, and qstat, among others. Do not worry if it feels like too much to digest: a more comprehensive look at these and other commands will be the focus of the next section.
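To make the idea of submitting a job more concrete, here is a minimal sketch of what a job script might look like. The job name, the resource values, and the MESSAGE variable are illustrative assumptions rather than recommended settings; the lines starting with `#$` are requests to the scheduler, and the Guide to New Users remains the reference for what Myriad actually expects.

```shell
#!/bin/bash -l
# Minimal example job script (a sketch; all values below are illustrative).

# Job name (hypothetical).
#$ -N hello_myriad
# Maximum wall-clock time, as hours:minutes:seconds.
#$ -l h_rt=0:10:0
# Memory requested.
#$ -l mem=1G

# Everything below is ordinary shell and runs on the compute node.
# The scheduler sets JOB_ID; the fallback value applies when the script
# is run by hand outside the queue.
MESSAGE="Hello from job ${JOB_ID:-not-submitted}"
echo "$MESSAGE"
```

Assuming the script is saved as hello.sh (a hypothetical file name), you would typically log in with `ssh <your_UCL_id>@myriad.rc.ucl.ac.uk`, copy the file over with scp or rsync, submit it with `qsub hello.sh`, and check its status with `qstat`.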

Myriad uses a module system to manage software. This means that a lot of the software you might need to run your analyses has already been installed, and can be loaded and unloaded as needed. This is extremely helpful, as it ensures that users can access the specific tools and versions they need for their research without having to install them themselves.

A brief introduction to the general use of Myriad modules is here, and a full list of the modules currently installed is here. There you will find software typically used for NGS data analysis, like fastQC, trim_galore, bwa, and many others. If there is any software that you need that is not available, you can always ask rc-support to install a module for it (see here).
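As a rough illustration of loading and unloading software, the sketch below uses the standard `module` subcommands (avail, load, list, unload). The bare module name "bwa" is an assumption: on Myriad the real name may include a version and may need other modules loaded first, so check the output of `module avail` before loading anything.

```shell
# Hypothetical tool name; confirm the exact module name with `module avail`.
TOOL="bwa"

if command -v module >/dev/null 2>&1; then
    module avail "$TOOL"     # list the installed modules matching the name
    module load "$TOOL"      # make the tool available in the current session
    module list              # show everything that is currently loaded
    module unload "$TOOL"    # remove it again when it is no longer needed
else
    # Outside the cluster the module command usually does not exist.
    echo "The 'module' command is not available here; try this on Myriad."
fi
```

Loaded modules only last for the current session, so a job script normally repeats the `module load` lines it depends on.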