
Conclusions and challenges

If you've made it through this tutorial and the earlier part, congratulations! We hope it was useful to you.

If you haven't done much programming or data analysis work before, then this might have been a bit of a challenge. We've covered a number of topics.

And we hope you've picked up some good habits along the way, which we could summarise as:

  • test, test, test!
  • visualise, visualise, visualise!
  • always sanity check results!

Challenges

Question

In the introduction to R we saw a curious fact: in humans, the GC content of chromosomes appears to be inversely related to chromosome length.

Question: Could this be due to gene density? That is, maybe shorter chromosomes have higher densities of genes, and genes are known to be associated with higher GC content.

Challenge 1: If you haven't already done so, quantify this relationship by fitting a linear regression of GC content on chromosome length. What is the estimated decrease in %GC per Mb increase in chromosome length?
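If you'd like a starting point, here is a minimal sketch of a least-squares line fit in plain Python. The function name `fit_line` and the variable names are our own choices for illustration, and the numbers are made-up placeholders (chosen to lie exactly on a straight line), not real chromosome data. You would replace them with the lengths (in Mb) and GC percentages you computed. In R, the built-in `lm()` function does the same job.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Placeholder values only (NOT real data): these satisfy gc = 45 - 0.02 * length
length_mb = [50.0, 100.0, 150.0, 200.0, 250.0]
gc_percent = [44.0, 43.0, 42.0, 41.0, 40.0]

intercept, slope = fit_line(length_mb, gc_percent)
print(f"estimated change in %GC per Mb: {slope:.4f}")
```

Because the lengths are expressed in Mb, the slope is directly the estimated change in %GC per Mb. A negative slope corresponds to a decrease.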

Challenge 2: Use the techniques of this tutorial to compute the gene density on each chromosome, that is, the proportion of each chromosome that is in genes. Now repeat the linear regression including the gene density as a covariate. Does gene density 'explain' the relationship?
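One thing to watch out for is that genes can overlap, so simply adding up gene lengths could count some bases twice. The sketch below shows one possible approach: merge overlapping intervals first, then divide the covered length by the chromosome length. It assumes 1-based, inclusive start and end coordinates (as used in GFF files). It also includes a small two-covariate least-squares fit. All function names are hypothetical, and all numbers are made-up placeholders rather than real data.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps (or touches) the previous interval: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def gene_density(gene_intervals, chromosome_length):
    """Proportion of the chromosome covered by at least one gene."""
    covered = sum(end - start + 1 for start, end in merge_intervals(gene_intervals))
    return covered / chromosome_length


def fit_two_covariates(x1, x2, y):
    """Least-squares fit of y = b0 + b1 * x1 + b2 * x2 via the normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * v for a, v in zip(c1, cy))
    s2y = sum(b * v for b, v in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # zero if the two covariates are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Placeholder genes on a pretend 1000-base chromosome (NOT real data).
# The first two overlap, so 300 distinct bases are covered in total.
genes = [(101, 200), (151, 300), (501, 600)]
density = gene_density(genes, 1000)
print(density)  # 0.3

# Placeholder regression inputs (NOT real data): y = 1 + 2 * x1 + 3 * x2 exactly.
# In the challenge, x1 would be chromosome length, x2 gene density, y the %GC.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 1.0, 0.0]
y = [6.0, 5.0, 10.0, 9.0]
b0, b1, b2 = fit_two_covariates(x1, x2, y)
print(b0, b1, b2)
```

With real data, the interesting comparison is between the length coefficient from the single-covariate fit and the coefficient `b1` here. If `b1` shrinks towards zero once gene density is included, that would be consistent with gene density accounting for much of the relationship.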

How does this play out for other species?

Good luck!