Some population-genetic statistics

If you followed the basic simulation section, you'll know that random inheritance patterns in the population cause some haplotypes to be lost over time and others to increase in frequency. As a result, genetic diversity is lost.

Key popgen metrics

Let's use $h_i$ to mean the haplotype of individual $i$, and $h_i(l)$ to mean the allele carried by haplotype $h_i$ at SNP $l$. And $I(\cdot)$ will denote the indicator function, which is $1$ or $0$ according to whether the condition is true or false.

Two key measures of diversity are:

  • The heterozygosity $H$. This is the probability that two individuals drawn at random carry different haplotypes:
$$\text{heterozygosity:}\quad H = \frac{1}{\text{number of pairs}}\sum_{i,j \mid j>i} I(h_{i} \neq h_{j})$$
  • The nucleotide diversity. This is usually denoted $\pi$, and is the average number of genotype differences between two samples, where the average is taken over all pairs of samples in the data:
$$\text{nucleotide diversity:}\quad \pi = \frac{1}{\text{number of pairs}} \sum_{i,j \mid j>i} \sum_{l=1}^{L} I(h_{i}(l) \neq h_{j}(l))$$
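
To make the two definitions concrete, here is a minimal Python sketch that computes both statistics directly from the sums over pairs. It assumes haplotypes are stored as equal-length tuples of 0/1 alleles, and that "number of pairs" means the $n(n-1)/2$ unordered pairs of $n$ haplotypes. The function names and the toy data are made up for illustration and are not part of the simulation code.

```python
from itertools import combinations


def heterozygosity(haplotypes):
    """Proportion of unordered pairs (i, j), j > i, whose haplotypes differ."""
    pairs = list(combinations(haplotypes, 2))
    return sum(a != b for a, b in pairs) / len(pairs)


def nucleotide_diversity(haplotypes):
    """Average number of SNPs at which a pair of haplotypes differs."""
    pairs = list(combinations(haplotypes, 2))
    total_differences = sum(
        sum(x != y for x, y in zip(a, b))  # inner sum over SNPs l = 1..L
        for a, b in pairs                  # outer sum over pairs with j > i
    )
    return total_differences / len(pairs)


# Toy data (hypothetical): 4 haplotypes typed at L = 3 SNPs
haplotypes = [
    (0, 0, 1),
    (0, 0, 1),
    (0, 1, 1),
    (1, 1, 0),
]

H = heterozygosity(haplotypes)
pi = nucleotide_diversity(haplotypes)
print(f"H = {H:.3f}, pi = {pi:.3f}")
```

Of the six pairs here, only the first two haplotypes are identical, so $H = 5/6$. The per-pair difference counts are 0, 1, 3, 1, 3 and 2, giving $\pi = 10/6$. Enumerating all pairs is fine for small samples but grows quadratically with the number of haplotypes.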