Some useful probability distributions
In practice, statisticians often work with a small set of well-known distributions. Each has a specific mathematical form, with parameters that make it flexible enough to be useful in many situations.
Here are some of the commonly-used ones:
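Name | Domain | Expression and R function | Parameters | Explanation |
---|---|---|---|---|
Binomial | $x \in \{0, 1, \ldots, n\}$ | $\binom{n}{x} p^x (1-p)^{n-x}$ (`dbinom()` in R) | Number of 'trials' $n$; 'success probability' $p$ | How many 'successes' from $n$ trials? |
Normal or Gaussian | $x \in (-\infty, \infty)$ | $\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ (`dnorm()` in R) | Mean $\mu$; variance $\sigma^2$ | Ubiquitously useful |
Beta | $x \in [0, 1]$ | $\frac{1}{B(\alpha,\beta)} x^{\alpha-1} (1-x)^{\beta-1}$ (`dbeta()` in R) | 'Shape' parameters $\alpha$ and $\beta$ | E.g. allele frequency estimates |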
If you don't understand the maths above, don't worry. You can understand these distributions by plotting what they look like as we'll do below.
Normalising constants
Many of these mathematical expressions have a complicated-looking bit at the front that doesn't depend on $x$. For example, the normal distribution has this bit:
$$ \frac{1}{\sqrt{2\pi\sigma^2}} $$
while the beta distribution has this bit:
$$ \frac{1}{B(\alpha,\beta)} $$
(here $B(\alpha,\beta)$ is the 'beta function').
These bits can look complicated, but they don't depend on $x$. In fact, they are just normalising constants: their purpose is to ensure the distribution sums (or, for a continuous distribution, integrates) to $1$ over all the possible values of $x$.
Question. However, the expression in front of the binomial isn't a normalising constant in the same way. Why not?
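As a quick numerical check of the normalising constants described above, the following sketch sums or integrates each density over its domain using R's built-in density functions and `integrate()`. The parameter values are arbitrary choices made for illustration, not values taken from the text.

```r
# Arbitrary example parameters (assumptions for illustration only)
n <- 10; p <- 0.3    # binomial: number of trials and success probability
mu <- 1; v <- 4      # normal: mean and variance
a <- 2; b <- 5       # beta: the two shape parameters

# Discrete distribution: sum the probabilities over x = 0, 1, ..., n
binom_total <- sum(dbinom(0:n, size = n, prob = p))

# Continuous distributions: integrate the density over the whole domain
normal_total <- integrate(function(x) dnorm(x, mean = mu, sd = sqrt(v)),
                          -Inf, Inf)$value
beta_total <- integrate(function(x) dbeta(x, shape1 = a, shape2 = b),
                        0, 1)$value

print(c(binomial = binom_total, normal = normal_total, beta = beta_total))

# Without the constant in front, the beta expression integrates to the
# beta function B(a, b) rather than to 1
unnormalised <- integrate(function(x) x^(a - 1) * (1 - x)^(b - 1), 0, 1)$value
print(c(integral = unnormalised, beta_function = beta(a, b)))
```

All three totals should come out very close to 1, and the last two printed numbers should agree with each other, which is what dividing by the beta function is there to achieve.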
Binomial distribution
Challenge
Pick a number of trials $n$ (start between 5 and 20) and a probability $p$ (start between 0.1 and 0.9). Then plot the binomial distribution over the range of integers $x = 0, 1, \ldots, n$.
The binomial distribution is given by `dbinom()` in R, and can be used like this:
dbinom( x, size = n, prob = p )
How does the shape of the binomial differ as you vary $n$ and $p$?
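One possible way to approach the plot is sketched below using base R graphics. The particular values of `n` and `p` are just example choices, and vertical bars are used here because the distribution is discrete; other plot styles would work equally well.

```r
# Example choices (assumptions); change these to explore the shape
n <- 10
p <- 0.3

x <- 0:n                                 # all possible numbers of successes
probs <- dbinom(x, size = n, prob = p)   # probability of each value of x

# Vertical bars suit a discrete distribution
plot(x, probs, type = "h", lwd = 3,
     xlab = "Number of successes (x)", ylab = "Probability",
     main = sprintf("Binomial(n = %d, p = %.2f)", n, p))
points(x, probs, pch = 19)
```

Re-running this with different values of `n` and `p` is an easy way to see how the peak moves and how the spread changes.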
Note. The expression for the binomial distribution is:
$$ \binom{n}{x} p^x (1-p)^{n-x} $$
Here $\binom{n}{x}$ means 'n choose x' (the number of ways of choosing x things from n things), which can be computed using `choose(n,x)` in R.
For extra kudos, plot this using your own function `binomial(x, n, p)` implementing the above formula.
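A minimal sketch of such a function follows. It uses the hypothetical name `my_binomial()` rather than `binomial()`, because R's stats package already defines a `binomial()` function (a model family object) that a same-named function would mask. The parameter values are example choices.

```r
# Hypothetical helper name, chosen to avoid masking stats::binomial()
my_binomial <- function(x, n, p) {
  # 'n choose x' times the probability of x successes and n - x failures
  choose(n, x) * p^x * (1 - p)^(n - x)
}

n <- 10; p <- 0.3   # example choices (assumptions)
x <- 0:n

# Should agree with the built-in version up to floating-point error
print(all.equal(my_binomial(x, n, p), dbinom(x, size = n, prob = p)))

plot(x, my_binomial(x, n, p), type = "h", lwd = 3,
     xlab = "Number of successes (x)", ylab = "Probability")
```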
Normal distribution
Challenge
Pick a mean value $\mu$ and a variance $\sigma^2$ (which must be positive). Then plot the density of the normal distribution over a continuous range of $x$ values wide enough to show the whole curve.
Note. The normal distribution density is given by `dnorm()` in R, but you have to specify the standard deviation (i.e. the square root of the variance) instead of the variance. Writing `mu` for the mean and `v` for the variance:
dnorm( x, mean = mu, sd = sqrt(v) )
How does the distribution differ as you vary the mean and the variance?
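The sketch below shows one way to draw the density with base R graphics. The values of `mu` and `v`, and the decision to plot four standard deviations either side of the mean, are assumptions made for illustration.

```r
# Example choices (assumptions); change these to explore the shape
mu <- 0
v <- 1
sigma <- sqrt(v)   # dnorm() wants the standard deviation, not the variance

# A fine grid covering four standard deviations either side of the mean
x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 200)
dens <- dnorm(x, mean = mu, sd = sigma)

# A continuous density is best drawn as a line
plot(x, dens, type = "l", lwd = 2,
     xlab = "x", ylab = "Density",
     main = sprintf("Normal(mean = %g, variance = %g)", mu, v))
```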
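For extra kudos, ignore `dnorm()` and write your own function `normal()` to compute this based on the normal distribution density formula (with mean $\mu$ and variance $\sigma^2$):
$$ \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$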
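A minimal sketch of such a function is given below. The argument names `mu` and `v` (mean and variance) are assumptions chosen to match the `dnorm()` call shown earlier; the comparison against `dnorm()` is only there as a sanity check.

```r
# Normal density written out from the formula; v is the variance
normal <- function(x, mu, v) {
  1 / sqrt(2 * pi * v) * exp(-(x - mu)^2 / (2 * v))
}

mu <- 2; v <- 3   # example choices (assumptions)
x <- seq(mu - 6, mu + 6, length.out = 200)

# Should agree with the built-in version up to floating-point error
print(all.equal(normal(x, mu, v), dnorm(x, mean = mu, sd = sqrt(v))))

plot(x, normal(x, mu, v), type = "l", lwd = 2, xlab = "x", ylab = "Density")
```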
Beta distribution
Challenge
Pick 'shape' parameters $\alpha$ and $\beta$ (make them between 1 and 10 to start) and plot the beta distribution:
$$ \frac{1}{B(\alpha,\beta)} x^{\alpha-1} (1-x)^{\beta-1} $$
over the (continuous) range $0 \le x \le 1$.
Note. The beta distribution is implemented as `dbeta()` in R, but you have to use `shape1` and `shape2` instead of $\alpha$ and $\beta$:
dbeta( x, shape1 = alpha, shape2 = beta )
How does the shape vary as you change the parameters? What happens if they are less than 1?
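To compare several parameter settings at once, one option is to overlay the curves on a single plot, as in the sketch below. The parameter pairs, colours, and axis limits are arbitrary choices for illustration. The grid stops just short of 0 and 1 because the density can be infinite at the endpoints when a shape parameter is below 1.

```r
# Stay just inside (0, 1): the density can blow up at the endpoints
x <- seq(0.001, 0.999, length.out = 500)

# Example (alpha, beta) pairs to compare (assumptions), including one below 1
settings <- list(c(2, 5), c(5, 2), c(5, 5), c(0.5, 0.5))
cols <- c("black", "blue", "darkgreen", "red")

# Empty plot first, then add one curve per parameter pair
plot(0, 0, type = "n", xlim = c(0, 1), ylim = c(0, 4),
     xlab = "x", ylab = "Density", main = "Beta densities")
for (i in seq_along(settings)) {
  s <- settings[[i]]
  lines(x, dbeta(x, shape1 = s[1], shape2 = s[2]), col = cols[i], lwd = 2)
}
legend("top",
       legend = sapply(settings,
                       function(s) sprintf("alpha = %g, beta = %g", s[1], s[2])),
       col = cols, lwd = 2)
```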
For extra kudos, ignore `dbeta()` and write your own function `beta_distribution()` to compute this based on the beta distribution density formula:
$$ \frac{1}{B(\alpha,\beta)} x^{\alpha-1} (1-x)^{\beta-1} $$
(You can use the `beta()` function to compute the value in the denominator.)
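A minimal sketch of such a function is shown below. The argument names `alpha` and `beta` are assumptions chosen to match the `dbeta()` call shown earlier. Because one argument shares its name with R's `beta()` function, the sketch calls `base::beta()` explicitly to make it clear which is meant.

```r
# Beta density written out from the formula
beta_distribution <- function(x, alpha, beta) {
  # base::beta() computes the beta function B(alpha, beta) in the denominator
  x^(alpha - 1) * (1 - x)^(beta - 1) / base::beta(alpha, beta)
}

a <- 2; b <- 5   # example shape parameters (assumptions)
x <- seq(0.01, 0.99, by = 0.01)

# Should agree with the built-in version up to floating-point error
print(all.equal(beta_distribution(x, a, b), dbeta(x, shape1 = a, shape2 = b)))

plot(x, beta_distribution(x, a, b), type = "l", lwd = 2,
     xlab = "x", ylab = "Density")
```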