The Dirichlet is a Mixed Bag (of Nuts)

You've probably come across the Dirichlet Distribution if you've done some work in Bayesian Non-Parametrics, clustering, or perhaps even statistical testing. If you have, and you're like I was, you may have wondered what this magical thing is and why it gets so much attention. Maybe you saw that a Dirichlet process is actually infinite and then wondered, well, how is that going to be useful?

I think I've found a very intuitive approach... at least, it's quite different from any other I've read. This post assumes you're already familiar with Beta distributions, probability density functions (PDFs), and Python. If you meet that requirement, grab a bag of nuts and let's jump right in.

Motivating Example

Imagine you have a bag of mixed nuts that contains five different types of nuts. You reach your hand in and pull out a few nuts... you can't help but wonder. Did you get five almonds and one walnut by chance, or is the bag poorly mixed? If the bag of nuts is properly mixed, then each handful should be fairly similar to every other handful. If the bag is poorly mixed, though, the mix of nuts will differ from one region of the bag to another.

Here's a visual of a well-mixed bag of nuts (transparent so you can see inside).

Here's a visual of a poorly mixed bag of nuts. Notice how there are clusters in the bag that are largely just one type of nut?

Let's start by simplifying the problem. What if the bag only had two types of nuts? Walnuts and almonds. You randomly pull 6 nuts out of your bag and note how many of each nut type you have. How likely do you think it is that your bag is well mixed? 

Modeling our problem

Really, we want to know whether each type of nut in the bag is equally likely to be pulled. Since we only have two types of nuts, this is equivalent to checking for a fair coin flip, and you can reference past articles on how to model that. I'm going to assume you know that this can be modeled with a Beta distribution. We can then compare against the actual mix of nuts we pulled and see how likely it is that we get a walnut ~50% of the time (let's just call that "fair enough"). For simplicity's sake, we're only going to evaluate a couple of hypotheses, so our evaluation will be very coarse. For the sake of this post, try not to worry about that.

Because we have two types of nuts, we have a classic problem for a Beta variable: we draw walnuts and NOT walnuts, or, viewed the other way, almonds and NOT almonds.

We start with our uninformative prior:
B(1, 1)

Each parameter maps to a distinct nut type. Let's say the left one counts walnuts and the right one counts almonds. Now let's update our random variable to reflect that we drew one walnut and five almonds:
B(1+1, 1+5)
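In code, an update is nothing more than adding each handful's counts to the parameters. This is a minimal sketch. `update_beta` is a hypothetical helper, and it assumes the (walnuts, almonds) ordering above:

```python
def update_beta(params, walnuts, almonds):
    """Return the Beta parameters after observing one handful.

    params is (alpha, beta): alpha counts walnuts, beta counts almonds.
    """
    alpha, beta = params
    return (alpha + walnuts, beta + almonds)


prior = (1, 1)                          # uninformative B(1, 1)
after_first = update_beta(prior, 1, 5)  # one walnut, five almonds
print(after_first)                      # (2, 6)
```

An update only adds counts, so later handfuls can be chained onto the result in the same way.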

In this article I'll use distributions and pdfs interchangeably so we can focus on what's really important: the type of random variable that's backing the function/distribution. Also, B represents a random Beta variable and D represents a random Dirichlet variable.

Now, what are our hypotheses? We're going to keep this simple so that we can focus our attention on the Dirichlet distribution and on gaining an understanding of it. Let's say these are:
  1. 80% walnuts, 20% almonds
  2. 50% walnuts, 50% almonds (fairly mixed)
  3. 20% walnuts, 80% almonds
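Now! We can evaluate the pdf at each hypothesis, then normalize the set of pdf results so they sum to 1 to come up with probabilities! (Normalizing this way amounts to treating all three hypotheses as equally likely before we look at any nuts.)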
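Here is a minimal sketch of that evaluate-then-normalize step using only Python's standard library. The Beta's normalizing constant cancels when we divide by the total. `beta_pdf` and `hypothesis_probabilities` are hypothetical helper names. If you already use SciPy, `scipy.stats.beta.pdf` gives the same densities.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) random variable at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def hypothesis_probabilities(a, b, walnut_fractions):
    """Evaluate the Beta pdf at each hypothesis, then normalize so the results sum to 1."""
    densities = [beta_pdf(p, a, b) for p in walnut_fractions]
    total = sum(densities)
    return [d / total for d in densities]


# Each hypothesis is the fraction of walnuts in the bag
hypotheses = [0.8, 0.5, 0.2]
probs = hypothesis_probabilities(2, 6, hypotheses)  # B(1+1, 1+5)

for p, prob in zip(hypotheses, probs):
    print(f"{p:.0%} walnuts, {1 - p:.0%} almonds: {prob:.1%}")
```

Later handfuls reuse the same function; only `a` and `b` change.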

I end up with
  • 0.3% chance for the mix being 80% walnuts and 20% almonds
  • 19.2% chance for the nuts being fairly mixed
  • 80.5% chance for the nuts to be 20% walnuts and 80% almonds
After just that one handful, these results make intuitive sense to me. Now. What if we take another handful and get 2 more walnuts and 3 more almonds? We would model this as B(1+1+2, 1+5+3) = B(4, 9)
  • almost 0% (0.07% to be exact) chance for the mix being 80% walnuts and 20% almonds
  • 26.7% chance for the nuts being fairly mixed
  • 73.3% chance for the nuts to be 20% walnuts and 80% almonds
Now here's the trick. We can also use the Dirichlet to model this problem in exactly the same way and double check that we get the same results. How?

The Beta B(4, 9) is equivalent to the Dirichlet D(4, 9). Here's the code that will print out the same probabilities as above:
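Here is a minimal sketch of what such code might look like. It is not necessarily the code behind the numbers above. Both densities are written out with Python's standard library so the two can be compared side by side. `beta_pdf`, `dirichlet_pdf`, and `normalize` are hypothetical helpers. With SciPy, `scipy.stats.beta.pdf` and `scipy.stats.dirichlet.pdf` play the same roles.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, using the usual Beta formula."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def dirichlet_pdf(x, alpha):
    """Density of Dirichlet(alpha) at a point x whose entries sum to 1."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    log_kernel = sum((a - 1) * math.log(xi) for xi, a in zip(x, alpha))
    return math.exp(log_norm + log_kernel)


def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


# Each hypothesis as (walnut fraction, almond fraction)
hypotheses = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]

beta_probs = normalize([beta_pdf(w, 4, 9) for w, _ in hypotheses])
dirichlet_probs = normalize([dirichlet_pdf(h, [4, 9]) for h in hypotheses])

for h, bp, dp in zip(hypotheses, beta_probs, dirichlet_probs):
    print(f"{h[0]:.0%} walnuts: Beta {bp:.2%}, Dirichlet {dp:.2%}")
```

With two nut types, the Dirichlet density at (w, 1 - w) is the same formula as the Beta density at w, so the two columns agree.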

OK. So if they were always the same, this wouldn't really matter. We'd just use the Beta distribution all the time and not worry about it. Here's where the Dirichlet comes in handy... What if we had a third type of nut in our bag? CASHEWS!

Generalizing the Beta

Now. Let's imagine we reach into our bag of nuts and pull out 
  • 1 walnut
  • 5 almonds
  • 2 cashews
Now! What is the probability that this handful came from a bag with an even mix of nuts?
Since we're using the Dirichlet distribution, this is super easy. We just add an extra parameter for the new nut type.

Think of it like this... Before, I had a Beta distribution, and I could look at it in two different ways:
  • The probability of getting a walnut given the nuts I had already drawn
  • The probability of getting an almond given the nuts I had already drawn
So... if a single Beta really just represents the probability of getting one specific nut type, what if we had a second Beta for the other nut type? Let's model our first problem with one Beta per nut type.

Remember, a Beta is parameterized as B(alpha, beta). Below, B_A tracks almonds and B_W tracks walnuts:

B_A = B(1+5, 1+1) = D(1+5, 1+1)
B_W = B(1+1, 1+5) = D(1+1, 1+5)

Notice how the Dirichlet looks exactly like the Beta?
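Here is a quick numerical check of why B_A and B_W are two views of the same thing. The density of B_A at an almond fraction p matches the density of B_W at the walnut fraction 1 - p. This sketch assumes a hand-written, standard-library `beta_pdf`, a hypothetical helper equivalent to `scipy.stats.beta.pdf`.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


# B_A = B(1+5, 1+1) describes the almond fraction; B_W = B(1+1, 1+5) the walnut fraction
for almond_fraction in [0.2, 0.5, 0.8]:
    b_a = beta_pdf(almond_fraction, 6, 2)
    b_w = beta_pdf(1 - almond_fraction, 2, 6)
    print(f"almonds {almond_fraction:.0%}: B_A = {b_a:.4f}, B_W = {b_w:.4f}")
```

Each pair of values matches. Flipping which nut counts as "success" just swaps the two parameters.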

Now. Let's say that we had a third nut type and we just didn't draw it from our bag. How would we model the above with an extra nut type? Well, a Beta only handles two types of nuts, so we have to leave it behind and leverage the Dirichlet. Try following the pattern we've established to model the problem before reading on.

Ready? This is how we'd model the problem if there were another type of nut that we just happened not to draw:
D(1+5, 1+1, 1+0)

We just tacked on a new element in our Dirichlet. What if we had a fourth nut type we also happened to have just not drawn in our example?
D(1+5, 1+1, 1+0, 1+0)

And so on. Really, you can just think about the Dirichlet in our situation as 
D(1 + Walnuts pulled, 1 + Almonds pulled, 1 + Cashews pulled)

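Each new nut type can just be appended to the Dirichlet. The order of the entries is arbitrary (above, almonds happened to come first), as long as each position always refers to the same nut type.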
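That recipe is small enough to write as a function. This is a minimal sketch. It assumes counts are listed in (walnuts, almonds, cashews, ...) order and an uninformative prior of 1 per nut type. `dirichlet_params` is a hypothetical name.

```python
def dirichlet_params(counts, prior=1):
    """Turn per-nut-type counts into Dirichlet parameters.

    Each entry is the prior pseudo-count plus the number of that nut type
    pulled; a nut type we never drew simply contributes a count of 0.
    """
    return [prior + c for c in counts]


# (walnuts, almonds, cashews): 1 walnut, 5 almonds, no cashews yet
print(dirichlet_params([1, 5, 0]))     # [2, 6, 1]
# A fourth, undrawn nut type is just one more zero count
print(dirichlet_params([1, 5, 0, 0]))  # [2, 6, 1, 1]
```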

Let's get back to the problem at hand and create a few hypotheses, assuming there are three different types of nuts in the bag. Let's still keep them simple and use these hypothesized mixes (walnuts, almonds, cashews), where 33% stands for an even one-third split:
  • Mix(33%, 33%, 33%)
  • Mix(80%, 10%, 10%)
  • Mix(10%, 80%, 10%)
  • Mix(10%, 10%, 80%)
Remember we pulled 1 walnut, 5 almonds, and 2 cashews. We can model this as a Dirichlet with the following random variable:
D(1+1, 1+5, 1+2)

We run this scenario through our code like so:
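Here is a minimal sketch of such code using hypothetical standard-library helpers in place of `scipy.stats.dirichlet.pdf`. It assumes the even mix means exactly one third of each nut. A Dirichlet density is only defined where the fractions sum to 1, so 0.33 each would not be a valid point.

```python
import math


def dirichlet_pdf(x, alpha):
    """Density of Dirichlet(alpha) at a point x whose entries sum to 1."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    log_kernel = sum((a - 1) * math.log(xi) for xi, a in zip(x, alpha))
    return math.exp(log_norm + log_kernel)


def hypothesis_probabilities(alpha, hypotheses):
    """Evaluate the pdf at each hypothesized mix and normalize to probabilities."""
    densities = [dirichlet_pdf(h, alpha) for h in hypotheses]
    total = sum(densities)
    return [d / total for d in densities]


# Mixes as (walnuts, almonds, cashews); the even mix is exactly 1/3 each
hypotheses = [
    (1 / 3, 1 / 3, 1 / 3),
    (0.8, 0.1, 0.1),
    (0.1, 0.8, 0.1),
    (0.1, 0.1, 0.8),
]
alpha = [1 + 1, 1 + 5, 1 + 2]  # 1 walnut, 5 almonds, 2 cashews

for h, prob in zip(hypotheses, hypothesis_probabilities(alpha, hypotheses)):
    print(f"Mix{tuple(round(100 * x) for x in h)}: {prob:.2%}")
```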

And we get these probabilities:
  • 32% chance of Mix(33%, 33%, 33%)
  • 0% (actually 0.02%) chance of Mix(80%, 10%, 10%)
  • 68% chance of Mix(10%, 80%, 10%)
  • 0% (actually 0.13%) chance of Mix(10%, 10%, 80%)
We're leaning towards our mix being skewed towards almonds. Just like with the Beta, we can take another draw and update our beliefs. Let's say this time that we pulled 1 walnut, 4 almonds, and 4 cashews. That would mean our Dirichlet would be modeled like so:

D(1+1+1, 1+5+4, 1+2+4)

I'm just keeping each set of observations for each type of nut separate to be as transparent as possible. The probability of each of our hypotheses is now
  • 85% chance of Mix(33%, 33%, 33%)
  • 0% (actually about 0.000007%) chance of Mix(80%, 10%, 10%)
  • 15% chance of Mix(10%, 80%, 10%)
  • 0% (actually about 0.03%) chance of Mix(10%, 10%, 80%)
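Updating only adds counts, so the two handfuls can be folded in one after the other. That ends at the same D(3, 10, 7) as writing all the terms out at once. This is a minimal sketch. `dirichlet_pdf` and `update` are hypothetical standard-library helpers, and the even mix is again taken as exactly one third of each nut.

```python
import math


def dirichlet_pdf(x, alpha):
    """Density of Dirichlet(alpha) at a point x whose entries sum to 1."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    log_kernel = sum((a - 1) * math.log(xi) for xi, a in zip(x, alpha))
    return math.exp(log_norm + log_kernel)


def update(alpha, counts):
    """Add one handful's (walnuts, almonds, cashews) counts to the parameters."""
    return [a + c for a, c in zip(alpha, counts)]


alpha = [1, 1, 1]                 # uninformative prior
alpha = update(alpha, [1, 5, 2])  # first handful:  [2, 6, 3]
alpha = update(alpha, [1, 4, 4])  # second handful: [3, 10, 7]

# Hypothesized mixes as (walnuts, almonds, cashews)
hypotheses = [(1 / 3, 1 / 3, 1 / 3), (0.8, 0.1, 0.1), (0.1, 0.8, 0.1), (0.1, 0.1, 0.8)]
densities = [dirichlet_pdf(h, alpha) for h in hypotheses]
total = sum(densities)
probs = [d / total for d in densities]

for h, prob in zip(hypotheses, probs):
    print(f"Mix{tuple(round(100 * x) for x in h)}: {prob:.6%}")
```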
Do these results make sense? Now that the counts of almonds and cashews (9 and 6) aren't nearly as different as they had been, it seems pretty reasonable that those two nut types are evenly distributed. Walnuts still look under-represented, though. Our current hypotheses can't capture a single nut type being under-represented; they only model a fair mix or a single nut type being over-represented. Given that, the completely fair mix does make sense. Extending this with more hypotheses to get a more common-sense answer is left as an exercise for the reader.

We can keep generalizing from here to 4 nut types, 5, etc. by just adding on a count for the new nut type to the end of the list inside our Dirichlet.

Going Further

In the interest of posting more frequently, let's wrap up. We covered how similar the Beta and Dirichlet random variables are when there are only two outcomes. After that, we extended to three and then N outcomes using the Dirichlet and were able to evaluate the mix of nuts we drew from our bag. I've left out some details in order to focus on what matters most. Hopefully you're now in a position to do some more research and connect the few remaining dots.

The next blog post will connect what we covered here to running non-parametric statistical tests using the Bayesian Bootstrap. If this sounds like a less-than-obvious leap, then you are exactly who I'm writing these for. Past me.

This is a terse notebook I created while I was working on this blog post. If you're having any trouble getting your code to produce the same results, check it out.