Why Bayesians Prefer Log Space

Why log space? Read through enough tutorials on Bayesian statistics and you're sure to encounter what seems like an unnecessary, or at the very least confusing, use of log and exp.

Let's go over some examples to understand why we work in log space, and how.

Why

Because Bayesian statistics is all about probabilities, we multiply like crazy. We're constantly multiplying the likelihoods of different data points together to evaluate the probabilities of different hypotheses. Get enough data and the joint likelihood of any given set of data can become vanishingly small.
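To see how quickly this happens, here's a small illustration (the data and the standard normal model are made up purely for the example):

    import numpy as np

    # 5,000 made-up data points and their likelihoods under a standard normal model.
    # Each individual likelihood is a perfectly ordinary, representable number.
    rng = np.random.default_rng(0)
    data = rng.normal(size=5000)
    likelihoods = np.exp(-0.5 * data**2) / np.sqrt(2 * np.pi)

    print(likelihoods.min())     # small, but nowhere near the float64 limits
    print(np.prod(likelihoods))  # 0.0: the product of thousands of likelihoods underflows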

How many multiplications can we do before this happens?
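Here's a sketch of that experiment (the post's original snippet isn't reproduced here, so the details are my reconstruction):

    # Multiply by 0.5 until the running product underflows to exactly zero.
    x = 1.0
    count = 0
    while x > 0.0:
        x *= 0.5
        count += 1
    print(count)  # 1075 with 64-bit floats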

According to this code, we can multiply 0.5 by itself 1,075 times before the result becomes exactly zero. Now, how about log space?
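Again, the original snippet isn't shown here, so this is my sketch of the log-space version, where multiplying by 0.5 becomes adding log(0.5):

    import numpy as np

    log_x = 0.0             # log(1.0)
    log_half = np.log(0.5)  # about -0.693
    count = 0
    while log_x > float("-inf"):    # in practice: runs until you get bored and stop it
        log_x += log_half           # adding logs == multiplying the underlying values
        count += 1
        if count % 1_000_000 == 0:
            print(count, log_x)     # progress report every million multiplications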

I ran that until I got impatient and stopped it. It got to over 37 million multiplications, and the log-space value was only -25,783,564. Plenty of room before we hit negative infinity! That's the power of log space.

Here's the catch: once we've done all of this multiplication, we still need to convert the result back out of log space. But how do we do that if the numbers are so small that they'll underflow to zero the moment we convert them?

Coming Back from Log Space


So how do we come back from log space? In our case, remember that we're working with weights that will become probabilities once we normalize them.

This is easier with an example. Imagine we've been multiplying likelihoods for a set of data in log space and ended up with these weights:

[-19285006, -19275006, -19275003, -19275002, -19275001]

In Python, converting any of these back out of log space gives exactly zero:
np.exp([-19285006, -19275006, -19275003, -19275002, -19275001]) = [0, 0, 0, 0, 0]

Here are the steps to log-normalize:
  1. Find the largest number
  2. Subtract it from every number
  3. Convert out of log-space and proceed like normal
By subtracting the largest number from every number we make all of the numbers larger (the largest becomes zero), and because subtraction in log space is equivalent to division outside of log space, the ratios between the weights don't change. After the subtraction the numbers are much more manageable.
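In NumPy that step looks something like this (the name weights is mine; scaled matches the np.exp(scaled) call below):

    import numpy as np

    weights = np.array([-19285006., -19275006., -19275003., -19275002., -19275001.])
    scaled = weights - np.max(weights)  # subtract the largest weight from every weight
    # scaled now holds -10005, -5, -2, -1, 0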


Then by removing the log transform with np.exp(scaled) we get
[ 0., 0.00673795, 0.13533528, 0.36787944, 1.]

And we know how to get from these back to probabilities: just divide each by the sum of the numbers.
[ 0., 0.00446236, 0.08962882, 0.24363641, 0.66227241]

Notice that one of the values still comes out as exactly zero. Now that we've normalized the whole set of numbers, though, that's fine: relative to the other numbers, this one is so close to zero that zero is probably a perfectly acceptable approximation.
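Putting the three steps together, a minimal helper might look like this (the name log_normalize is mine, not something from a particular library):

    import numpy as np

    def log_normalize(log_weights):
        """Turn log-space weights into probabilities without underflowing."""
        log_weights = np.asarray(log_weights, dtype=float)
        scaled = log_weights - np.max(log_weights)  # steps 1 and 2: subtract the largest
        weights = np.exp(scaled)                    # step 3: leave log space
        return weights / np.sum(weights)            # normalize to probabilities

    print(log_normalize([-19285006, -19275006, -19275003, -19275002, -19275001]))
    # same probabilities as above: [0., 0.00446236, 0.08962882, 0.24363641, 0.66227241]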

Conclusion

That's it. It's an interesting math hack that becomes surprisingly simple once it turns into a subtraction problem. Try it with several different cases yourself. I've seen this technique in several places, and this is the simplest I could make it. There may be another way to come at it that makes even more sense, but either way, hopefully this helps!