Deep in the Weeds: When Testing is Slow and Expensive

Controlled testing in modern systems is fairly straightforward. There are many tools to handle statistical analysis, random population sampling, data collection, and so on. With web visitors routinely numbering in the millions, the statistics are also greatly simplified by the sheer abundance of data. But what do we do when data is very costly?

In this post, we'll take a deliberately simplified model and solve it using response surface modeling to find an approximate optimum with very little data. At the end of the post we'll discuss some possible caveats and some ideas for getting around them.

Motivating Example

Imagine that a brick-and-mortar store owner wants to experiment with different prices for the visitors to her store. Since we're going VERY simple, let's assume shoppers are as likely to visit on a Wednesday as on a Saturday, and that her visitors are fairly homogeneous in their purchasing behavior.

Our first thought may be to run a controlled experiment with each visitor getting their own pricing. In practice, though, the store owner tells us this would be impossible: how would each customer be told prices different from everyone else's? And even if she could do that, many customers would end up quite upset. Instead, what if we proposed changing the prices each day? This lets her change the prices on the shelf to match what we want to test, and it means every customer is treated the same as others shopping at the same time. If a customer comes back later in the week they'll see different prices, but she feels comfortable that won't set off any alarm bells for her customers.

Now we have the makings of an experiment. Each day we'll move the prices by some X% and see which change has the largest effect on customers. Since simply counting purchases doesn't tell us how much money we're making, she decides the outcome we measure should be her net revenue each day.
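As a concrete sketch, the daily schedule might look like the loop below. The specific multiplier levels are my own illustrative choice, not numbers from the experiment:

```python
# One price multiplier per day for a week, spanning below and above
# normal prices. These particular levels are illustrative assumptions.
multipliers = [0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15]

for day, m in enumerate(multipliers, start=1):
    print(f"Day {day}: set shelf prices to {m:.0%} of normal")
```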

Analyzing Data

She comes back and reports that she ran the experiment for 7 days, trying a different price adjustment each day. Now that we have data, we can discuss what exactly response surface methods are. First, let's chart the data:

The data is pretty noisy. At 85% of normal prices we see a believable reduction in net revenue, but on the other end of the spectrum it's a mixed bag. How can we make the most of this data?
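Since the original chart isn't reproduced here, here's a hedged sketch of what such data might look like. The price levels, the underlying revenue curve, and the noise scale are all assumptions of mine, not the store's real numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the store's 7 days of observations: a concave
# revenue curve plus day-to-day noise. Nothing here is the real data.
multipliers = np.array([0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15])
true_revenue = 1000 - 3000 * (multipliers - 1.015) ** 2
observed = true_revenue + rng.normal(scale=40, size=multipliers.size)

for m, r in zip(multipliers, observed):
    print(f"{m:.0%} of normal prices -> net revenue ${r:,.2f}")
```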

Since we don't have much data and it's pretty noisy, what if we simplified our analysis by assuming a mathematical model? Conceptually, we can intuit that lowering prices too much will increase sales but also means taking a loss on some of them, while raising prices too much means more money per sale but fewer customers buying. One of the simplest models we can fit to this is a parabola. By projecting a parabola onto the data, we can use the data holistically to model where we think the optimum will be.

This is the same data but with a parabola fit to it.
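In code, fitting that parabola is a one-liner with NumPy's `polyfit`. The data points below are illustrative placeholders, not the actual experiment's numbers:

```python
import numpy as np

# Illustrative daily observations: price multiplier vs. net revenue.
multipliers = np.array([0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15])
revenue = np.array([905., 972., 980., 1010., 990., 985., 940.])

# Least-squares fit of a degree-2 polynomial: revenue ~ a*x^2 + b*x + c
a, b, c = np.polyfit(multipliers, revenue, deg=2)
print(f"fitted parabola: revenue = {a:.1f}*x^2 + {b:.1f}*x + {c:.1f}")
```

Because revenue should fall off on both sides of the optimum, a healthy fit will have a negative leading coefficient (a downward-opening parabola).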

Next we can solve the parabola's equation for its maximum, which in this case is about 101.5%! She should raise her prices a bit. If you're not sure how to solve for the maximum of a parabola, here's Wolfram Alpha to the rescue:
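If you'd rather skip Wolfram Alpha: for a parabola y = ax² + bx + c with a < 0, setting the derivative 2ax + b to zero gives the maximum at x = -b/(2a). The coefficients below are made-up stand-ins chosen so the vertex lands at the post's 101.5%:

```python
# Vertex of a downward-opening parabola y = a*x^2 + b*x + c.
# Coefficients are illustrative, chosen so the maximum sits at x = 1.015.
a, b = -40000.0, 81200.0

x_max = -b / (2 * a)
print(f"revenue-maximizing price multiplier: {x_max:.1%}")
# -> revenue-maximizing price multiplier: 101.5%
```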


So how did this help us over other approaches? Well, a simple approach might be to set a baseline at 100%, then choose another price point like 105% and run it until there is "enough data" to make a decision. The challenge with this approach is that after establishing your baseline and one extra point, you've spent a significant amount of time and still can't tell where the optimal point may be. If you choose a third point, you then have to evaluate it for several more days, compare it against 100%, and so on.

The data for this motivating example was simulated from a random process molded to fit this scenario. As I've discussed in the past, one of the beautiful things about simulations is that we can test our new methods against a known solution. So: what was the true optimum used to generate this data?

The true maximum was at 105%.
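We can sketch that whole check end to end: generate noisy observations from a curve whose true peak is at 105%, fit the parabola, and see where the estimated optimum lands. The curve shape and noise level here are my assumptions, not the post's actual generator:

```python
import numpy as np

rng = np.random.default_rng(7)

# Known ground truth: revenue peaks at 105% of normal prices.
true_peak = 1.05
multipliers = np.array([0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15])
clean = 1000 - 3000 * (multipliers - true_peak) ** 2
observed = clean + rng.normal(scale=10, size=multipliers.size)

# Fit a parabola and solve for its vertex, as in the post.
a, b, _ = np.polyfit(multipliers, observed, deg=2)
estimate = -b / (2 * a)
print(f"true optimum: {true_peak:.1%}, fitted estimate: {estimate:.1%}")
```

Rerunning with different noise seeds shows how far the fitted vertex can drift from the truth, which is exactly the kind of gap we saw between 101.5% and 105%.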

So while our methodology didn't get it exactly right, the 1.5% price increase over the baseline nets us slightly fewer orders but enough extra net revenue to cover that opportunity cost and then some!


That's really it. The example is over-simplified, but I hope it helps show how we can take simple mathematical models and use them to make a holistic inference. There's still more to cover in future posts. Using these ideas we can handle inputs with many dimensions. We can also, with a bit of work, turn this analysis into a bandit approach that chooses which prices to test each day. That technique would let us spend our time on the prices most likely to make the most money, so that we maximize total earnings even during testing.

That's all for a future post though. Thanks for reading!