Visualise RNA-seq data in R

Author: Kiki Cano-Gamez.

Welcome! This tutorial will show you some of the basic skills you need to start visualising RNA-seq data (specifically, estimates of transcript abundance for each gene) in R. To do so, we will use a publicly available data set generated by the Genotype-Tissue Expression (GTEx) consortium.

Note

If you prefer, there is also an R markdown version of this tutorial at this link (suitable for use in RStudio).

The GTEx RNA-seq dataset

The GTEx consortium is an ongoing project that has collected tissue samples from over 1,000 individuals. These samples have been used to generate RNA-sequencing data and to study patterns of gene activity across different organs in the human body. For more detailed information on how the GTEx data was generated, you can refer to their online portal.

For the purposes of this tutorial, you will be working with a previously processed and cleaned file containing gene expression measurements for different tissues. The measurements in this file represent an average profile for each tissue, generated by computing the mean expression of each gene across all individuals in the cohort. The course organisers also cleaned this data set by removing unannotated genes and genes whose expression was too low to be of interest for this tutorial. Measurements were normalised to account for systematic differences between batches, and expression values are presented as Transcripts per Million (TPM). For more information on why normalisation is needed and how it is performed, you can refer to the accompanying slides.
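
To make the averaging and filtering steps described above more concrete, here is a minimal sketch in base R using a small made-up matrix. It is not the course organisers' actual pipeline: the gene names, sample labels, TPM values and the cut-off `min_tpm` are all hypothetical and were chosen only for illustration.

```r
# Toy TPM matrix: 4 genes (rows) x 6 samples (columns). All values are made up.
tpm <- matrix(
  c( 10,  12,  14, 0.1, 0.2, 0.3,
    200, 220, 240,  50,  55,  60,
    0.0, 0.1, 0.2, 0.0, 0.1, 0.2,
      5,   7,   9,  80,  90, 100),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
    paste0("sample", 1:6)
  )
)

# Tissue of origin for each sample (hypothetical labels)
tissue <- c("Liver", "Liver", "Liver", "Lung", "Lung", "Lung")

# Average profile: mean expression of each gene across the samples of each tissue
mean_tpm <- sapply(split(seq_along(tissue), tissue), function(idx) {
  rowMeans(tpm[, idx, drop = FALSE])
})

# Keep genes whose mean TPM reaches an arbitrary cut-off in at least one tissue
min_tpm <- 1  # assumed threshold, for illustration only
keep <- apply(mean_tpm, 1, max) >= min_tpm
mean_tpm_filtered <- mean_tpm[keep, , drop = FALSE]

print(mean_tpm_filtered)
```

The result is a genes-by-tissues matrix of mean TPM values, which is the general shape of the file you will work with in this tutorial. In this toy example, `GENE_C` is dropped because its mean expression stays below the cut-off in both tissues.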

To get started, go here.