As I’ve started learning more about Bayesian statistical methods the biggest light bulb moment for me was when I started thinking in terms of distributions. I was raised on the notion of single number answers. Because of that, I thought I had mastered Bayesian methods when I could recite P(A|B) = P(B|A) P(A) / P(B). WRONG.
Today’s algorithm will show a simple example of how to deal with probabilities as distributions. There are two key concepts: updating our distribution with data, and normalizing the distribution.
First allow me to introduce today’s concrete example: Flipping a coin!
Chance of Heads
If you flip a coin and it comes up tails 3 times out of 4 how likely is it your coin is actually a fair coin? Let’s say fair means 45-55% of the time it comes up heads.
The chance it could come up heads could be anything between 0%-100% right? 0% would mean it’s completely unfair and somehow NEVER comes up heads. 100% would be precisely the opposite. We’re hoping for somewhere in the middle.
Now before we’ve flipped the coin we need to decide what the probabilities of our hypotheses between 0% and 100% are. To keep it simple we’re going to assume that each hypothesis is equally likely. This is known as a “uniform distribution.” In python that looks like this:
possibilities = [1.0]*101
Now we need to normalize this so it represents probabilities. For that, I wrote this function:
def normalize(possibilities): possibility_sum = sum(possibilities) for hypothesis in xrange(0,101): possibility = possibilities[hypothesis] possibilities[hypothesis] = possibility/possibility_sum
Basically this just makes sure all of the likelihoods for our hypotheses add up to 100% like good little probabilities.
This is what this uniform distribution looks like visually:
Updating with new information
So now we flip our coin and let’s say we get heads. Let’s think through what we expect to happen to that uniform distribution above. Starting from 0% and working right…
0% Now that we’ve gotten heads once we know that it’s impossible to never get heads. This goes to zero.
1% The chances seem very low that if there’s only a 1% chance of getting heads that it’d be the first we’d run into.
50% Completely possible. We’ve only flipped once so we can’t have 50% probability. This should be somewhere in the middle.
100% Well, we’ve only flipped heads once so we can’t say this is certain but we also can’t rule this out.
So how do we update our data in python?
We use this function when we flip heads:
def update_success(possibilities): for hypothesis in xrange(0, 101): likelihood = possibilities[hypothesis] possibilities[hypothesis] = likelihood * hypothesis/100.0 normalize(possibilities)
And this function when we flip tails:
def update_failure(possibilities): for hypothesis in xrange(0, 101): likelihood = possibilities[hypothesis] possibilities[hypothesis] = likelihood*(1.0-hypothesis/100.0) normalize(possibilities)
They are different in one subtle way. Look closely at the fourth line of each function.
On the success function we are multiplying the likelihood by each hypothesized probability of us getting heads. Once we’ve done that for every hypothesis we normalize to get back to a percentage. The failure function multiplies the likelihood of each hypothesis by the probability of the hypothesis being false (according to the hypothesis).
This leaves us with the following histogram of likelihoods for each hypothesis:
That’s all of the moving parts. Let’s walk through some more examples so you can convince yourself this works.
Baby stepping through our scenario
Next let’s say we flip tails and think through what we expect to see. We’ve already ruled out 0% chance of seeing heads and now we can rule out 100% chance of seeing heads. We do now have one of each though, so I expect the middle of the distribution to swell a bit. Whether it’s a triangle shape or a curve, I’m not sure but something like that.
Here’s that distribution:
It’s curved! Of course! Well… It’s not really obvious, but it does seem rational at least.
Now let’s say we flip two more times and they’re both heads:
Getting to an answer
Let’s stop here and answer the question I initially asked. How likely is this coin to have a ~50% chance of flipping heads (where that really means 45%-55% chance since statistics is an inexact science)? Here’s how we can approximate the answer. From the histogram we have, we just need to add up the probability from each hypothesis.
45% 0.68% 46% 0.73% 47% 0.78% 48% 0.83% 49% 0.88% 50% 0.94% 51% 0.99% 52% 1.05% 53% 1.11% 54% 1.17% 55% 1.24% Adding them all up equals a 10% chance.
Notice how each individual hypothesis has a really low likelihood? It being exactly 50% is only a .94% chance! What’s that about?
This is actually a really cool aspect of this method. You see anytime you deal with probabilities (even in frequentist statistics) you need to deal with ranges of likelihoods. Call them a confidence interval, credibility interval, highest density interval, etc. It’s the same concept.
Also, this whole idea of treating individual probabilities as hypotheses was life altering for me. This concept applies generally to anything where you want to evaluate likelihoods. The y-axis will always be probabilities but the x-axis can be any numerical value. Be it money, time, or what have you.
Actually since I’ve mentioned credibility intervals, let’s get that from this data as well.
The simplest way to explain a credibility interval is to say it’s essentially a confidence interval that you can easily get from a histogram. It entails saying which hypotheses encompass 95% of the likelihoods you have.
I’m running short on time so we’ll just estimate it. I’ve grabbed the data from the program and put it in a spreadsheet. My next step is to sort the data by its likelihoods in descending order and grab the first N values that add up to 95%. Then I need to find the minimum hypothesis in that range and the maximum.
I’ve done that for our experiment. 95% of the likelihood falls between 42%-98%. We can interpret this to mean we can’t eliminate the possibility that our coin may still come up heads 50% of the time… but then we also have some confidence our coin will come up heads greater than 90% of the time. Really this interval is so wide, it really should be interpreted to mean that we need more data.
Wrap it up
That’s the algorithm. It’s extremely straight forward, but those ~12 lines of code took me a while to truly understand. If you want to learn more I suggest reading about the binomial distribution. I think you might see some similarities between that and what we did. Read more here.
Also here’s my full code file in case you want to dive into the code to understand this deeper:
def normalize(possibilities): possibility_sum = sum(possibilities) for hypothesis in xrange(0,101): possibility = possibilities[hypothesis] possibilities[hypothesis] = possibility/possibility_sum def update_success(possibilities): for hypothesis in xrange(0, 101): likelihood = possibilities[hypothesis] possibilities[hypothesis] = likelihood*hypothesis/100.0 normalize(possibilities) def update_failure(possibilities): for hypothesis in xrange(0, 101): likelihood = possibilities[hypothesis] possibilities[hypothesis] = likelihood*(1.0-hypothesis/100.0) normalize(possibilities) # set every possibility to be equally possible possibilities = [1.0]*101 # Coin flip, probability of heads normalize(possibilities) update_success(possibilities) update_failure(possibilities) update_success(possibilities) update_success(possibilities)