Distributions
In statistics, the way uncertainty is accounted for is through probability distributions.
Suppose we are interested in a variable . A distribution for is just a function that distributes a total of 100% probability 'mass' over the possible values of .
For example, here is the distribution of the roll of a normal 6-sided die:
The die has possible outcomes , and the function assigns probability mass to each of them. Across all possibilities it adds up to (i.e. total probability.) Simple, right?
Simple distributions like this one can often be worked out just by counting possibilities. For example, let's warm up with an exercise:
Question
What is the distribution of the sum of two dice rolls? Use a pen and paper to draw it.
Hints
This can be worked out just by counting the possibilities. For example, there are possible dice rolls that generate the value , and there are possible dice rolls in total. So there ought to be chance of rolling a .
For a continuous variable things are slightly more complicated because to get an actual probability we typically have to think in terms of ranges of values. For example, here is the density of a normal distributed variable:
Like the dice example, the function sums to over the whole real line. As shown on the plot, to read off the probability that is in any given part of its range, we just sum up (integrate) the density over the range. (So the density function itself isn't returning a probability at each point - the probability of any specific value is vanishingly small) - but something that sums up to a probability over any range of values).