By Jackson Curtis

A Problem with Confidence Intervals

We actually need confidence hyperspheres

I've been thinking a lot about confidence intervals and how to effectively communicate them. One thing I've been particularly stuck on is this somewhat paradoxical statement:

As model complexity (number of parameters) increases, we get more confident in the results while getting less confident in the individual parameters.

This statement is quite obvious in the extreme cases. Modern deep learning models produce great output using trillions of parameters, and tweaking any of the parameters would result in absolutely tiny changes to the performance of the model. No one parameter really has or needs any "correct value" in a model that large.


This is a little more troubling in models built to help us understand the world around us. At Recast we help marketers understand the performance of their marketing budget to optimize how they spend their money. While we can (and do) use our model to run optimizations that pop out with answers that say "this is how you should spend your money," we also work hard to help marketers understand what the model is telling them in terms they can understand: return on investment (ROI), cost per acquisition (CPA), and more.


We can calculate confidence intervals on all these quantities of interest that marketers have, but I've found communicating the uncertainty in a model is much harder than slapping an error bar on a chart. Let me demonstrate with an example. In my example code I use simple simulated linear models (not the more complex models we use at Recast), but the issues will hold for any kind of statistical model.


Some of the important outputs of a marketing mix model are the estimates of ROI, or how many dollars you receive in revenue for every dollar spent in a marketing channel. For many businesses, it's normal to expect most channels to have an ROI between 0x (totally wasted spend) and 5x ($5 revenue for every $1 spent). In the simulated toy example below, we went out and collected data on our advertising spend (in three different marketing channels) and our revenue, and these were the ROI estimates we got back:
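To make the setup concrete, here is a minimal sketch of the kind of simulation described above (not Recast's actual model, and not the exact code behind the chart). The true ROIs, the 99% spend correlation, and the noise level are all invented for illustration.

```python
# Minimal sketch of the simulated setup: three spend channels, with Channels 1 & 2
# ~99% correlated, revenue generated as a linear function of spend, and 95%
# confidence intervals read off an ordinary least squares fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 104  # e.g. two years of weekly observations

# Channels 1 & 2 move almost in lockstep; Channel 3 is independent of both.
corr = np.array([[1.00, 0.99, 0.00],
                 [0.99, 1.00, 0.00],
                 [0.00, 0.00, 1.00]])
spend = 1_000 + 300 * rng.multivariate_normal(np.zeros(3), corr, size=n)

true_roi = np.array([2.5, 4.5, 1.5])   # hypothetical "true" ROIs
revenue = spend @ true_roi + rng.normal(0, 500, size=n)

X = sm.add_constant(spend)
fit = sm.OLS(revenue, X).fit()
print(fit.conf_int(alpha=0.05))  # wide intervals for Channels 1 & 2, a tight one for Channel 3
```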

How disappointing! We went out and gathered all this data, spent all this time building our model, and learned almost nothing about Channels 1 & 2. The confidence intervals are almost as wide as our going-in assumptions (ROIs in the 0-5x range). But did we actually learn nothing? And why were we able to get a much more precise estimate for Channel 3? In this case, the wide confidence intervals are caused by extremely high correlation between the spend series for Channels 1 & 2 (a correlation of 99%).


Why does high correlation lead to wide confidence intervals? Because the two channels have such similar inputs, it is hard to distinguish whether the effect is coming from the first channel or the second. For example, if the first channel has a large effect and the second channel has a small effect, the dependent variable will look very similar to a dependent variable where the second channel has a large effect and the first has a small effect. When input data is correlated, the uncertainty in our point estimates will also be correlated, but that's completely lost in a univariate plot of uncertainty!
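That joint structure is visible in the fitted model itself. Continuing from the `fit` object in the sketch above, the covariance matrix of the coefficient estimates shows the two ROI uncertainties moving in opposite directions:

```python
# Continuing the sketch above: the covariance of the estimated coefficients
# reveals how the Channel 1 and Channel 2 uncertainties move together.
import numpy as np

cov_beta = np.asarray(fit.cov_params())   # covariance of the coefficient estimates
sd_beta = np.sqrt(np.diag(cov_beta))
corr_beta = cov_beta / np.outer(sd_beta, sd_beta)

# Rows/columns 1 and 2 are the two correlated channels (0 is the intercept).
print(corr_beta[1, 2])   # close to -1: if Channel 1's ROI is high, Channel 2's must be low
```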


Instead, we can calculate a "confidence region" of jointly plausible values for the ROIs of Channels 1 & 2:
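One way to draw such a region, continuing the toy fit above, is an approximate 95% confidence ellipse built from the estimated covariance of the two coefficients (a chi-square cutoff is used here for simplicity; an exact OLS region would use an F cutoff, but the picture is essentially the same):

```python
# Approximate 95% confidence ellipse for the Channel 1 / Channel 2 ROI estimates,
# built from the coefficient covariance of the fit above.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from scipy import stats

beta_hat = np.asarray(fit.params)[1:3]            # Channel 1 & 2 point estimates
cov_12 = np.asarray(fit.cov_params())[1:3, 1:3]   # their joint covariance

eigval, eigvec = np.linalg.eigh(cov_12)           # eigenvalues in ascending order
angle = np.degrees(np.arctan2(eigvec[1, -1], eigvec[0, -1]))      # major-axis orientation
width, height = 2 * np.sqrt(eigval[::-1] * stats.chi2.ppf(0.95, df=2))

fig, ax = plt.subplots()
ax.add_patch(Ellipse(beta_hat, width, height, angle=angle,
                     facecolor="none", edgecolor="tab:blue"))
ax.plot(*beta_hat, "o", color="tab:blue")
ax.set_xlabel("Channel 1 ROI")
ax.set_ylabel("Channel 2 ROI")
ax.autoscale_view()
plt.show()
```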

Here we see that the general story of the first plot holds true: Channel 1's ROI estimate is between 1x and 4x, and Channel 2's ROI estimate is between 3x and 6x. The problem is not that either of the confidence intervals is wrong, but that when people see univariate confidence intervals they don't wonder "hmm, I wonder if that uncertainty is correlated." Instead, they imagine something like this:

While the initial confidence intervals may have led us to believe that Channel 1's ROI could be 3.8x and Channel 2's ROI could be 5.5x, when we look at the confidence region we see that particular combination is completely infeasible according to our model.
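Continuing the sketch, the same covariance matrix lets us check any particular combination numerically: a pair of ROIs is inside the approximate 95% region only if its Mahalanobis distance from the point estimates is below the chi-square cutoff.

```python
# Is a candidate (ROI_1, ROI_2) pair jointly plausible? Check its squared
# Mahalanobis distance against the 95% chi-square cutoff (2 degrees of freedom).
import numpy as np
from scipy import stats

beta_hat = np.asarray(fit.params)[1:3]
cov_12 = np.asarray(fit.cov_params())[1:3, 1:3]

def in_joint_region(roi_1, roi_2, level=0.95):
    d = np.array([roi_1, roi_2]) - beta_hat
    m2 = d @ np.linalg.solve(cov_12, d)        # squared Mahalanobis distance
    return m2 <= stats.chi2.ppf(level, df=2)

# Each value sits inside its own univariate interval, but the combination of
# "both high at once" will typically fall far outside the joint region.
print(in_joint_region(3.8, 5.5))
```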


In fact, if the goal of a confidence interval is to communicate a plausible range of values for a given parameter, you could argue that including the univariate confidence interval is more misleading than not including intervals at all. There are many more implausible parameter values in the "perceived region" than there are plausible ones, so you may actually be leading people astray by showing them the univariate confidence interval. The end result is making the consumer think there is less certainty in the results than there actually is!


I don't have a good solution for this. Obviously the confidence regions are "the right way" to go, but they don't really scale as you get up to 10, 30, or 100 dimensions (parameters). The univariate intervals are technically correct, but they leave out important information. Alternatively, you can provide confidence ranges on combinations of parameters (e.g. "What will my ROI be if I use this budget?", as in the sketch below) rather than on individual parameters, but those don't help the end user build intuition about individual values or how precise they are. If anyone has suggestions for effectively communicating the true uncertainty of a model, and helping users of the model understand that uncertainty, leave a comment!
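For the "combinations of parameters" option, here is one sketch of what that could look like with the toy fit above: a confidence interval for the blended return of one hypothetical budget split. Because the correlated uncertainty largely cancels in the weighted sum, this interval can come out much tighter than the per-channel intervals.

```python
# Confidence interval for a linear combination of the coefficients:
# the blended ROI of a hypothetical budget split (50% / 30% / 20%).
import numpy as np

weights = np.array([0.0, 0.5, 0.3, 0.2])   # 0 for the intercept, then one share per channel
blended = fit.t_test(weights)              # inference on weights @ coefficients
print(blended.conf_int(alpha=0.05))        # often far tighter than the per-channel intervals
```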
