5 Problems with Using Linear Regression for MMM (and what to do instead)

Jackson Curtis
Jun 2, 2022

Marketing Mix Modeling (MMM) is a strategy to optimize a company's marketing spend to maximize sales for a given product (or products). You can think of a "marketing mix" as a collection of all the different places a company could spend money on advertising. Marketers have to decide (A) which marketing efforts (e.g. commercials, billboards, Facebook ads, etc.) go into their mix and (B) how much to spend on each channel. Marketing mix modeling is an attempt to measure the relationship between the money spent on each channel and the outcome of interest (usually revenue). If we knew the relationship between a dollar spent in some channel and the revenue generated, we could calculate the return on investment (ROI, also called ROAS - 'return on advertising spend'). Understanding ROI is foundational to good marketing strategy.

As a statistician, when you hear that you need to understand the relationship between a handful of numeric variables (dollars spent in different channels) and a single numeric value (revenue), it seems totally natural to think of regression. Indeed, almost all companies start their MMM efforts with some simple regression models. However, you won't get very far down the MMM rabbit hole before you realize that there are a lot of problems with trying to force an MMM into the assumptions of a linear regression. I want to highlight five of these issues and describe some modeling strategies to address them. At Recast, we've spent the last couple of years iterating on these models, starting with some basic regression techniques and evolving to address more complex issues. These are by no means the only ways to address these issues (Facebook's Robyn addresses many of the same problems in different ways than we do), but we've found them to work well in practice.

Problem #1: Your ROI estimates will be negative

Consider a simple model where each predictor variable is how much you spent in each channel, and the dependent variable is total revenue. Then each regression coefficient is the ROI for that channel. A really effective advertisement might have 5x ROI, an average ad might have 2x ROI, and a lot of ads will have <1x ROI. The problem is that when you have a lot of channels and not a ton of historical data, each of the coefficients you estimate is going to be noisy. That means some of the coefficients are going to be negative. That's fine: as a statistician, you understand that parameters aren't exact, and just because the sign is negative doesn't mean the ad is actually costing you money. But your boss doesn't. To a non-statistician, a negative ROI is unfathomable. Presenting those results is going to cause stakeholders to (A) dismiss your model as nonsensical or (B) focus on 'fixing the problem' and massively overreact to the channel that is 'driving away customers.'

At Recast, we solve this problem by using a Bayesian model. Bayesian models use prior beliefs as input. In this case, our prior belief is that "all press is good press." We can use the prior to make the probability of a negative ROI zero or extremely low, thus helping make the model more coherent and interpretable.
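To make the idea concrete, here is a minimal sketch of how a prior can rule out negative ROIs. It is not Recast's model: it uses a single channel, a made-up toy dataset, a half-normal prior, and a simple grid approximation of the posterior. The names `ols_slope` and `posterior_mean_roi`, and the values of `noise_sd`, `prior_sd`, and `roi_max`, are all assumptions chosen for illustration.

```python
import math

def ols_slope(spend, revenue):
    """Ordinary least-squares slope of revenue on spend (with an intercept)."""
    n = len(spend)
    mean_x = sum(spend) / n
    mean_y = sum(revenue) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(spend, revenue))
    sxx = sum((x - mean_x) ** 2 for x in spend)
    return sxy / sxx

def posterior_mean_roi(spend, revenue, noise_sd=5.0, prior_sd=2.0,
                       roi_max=5.0, steps=501):
    """Grid-approximate posterior mean of ROI under a half-normal prior.

    The grid only covers ROI >= 0, so negative ROIs get zero probability.
    noise_sd, prior_sd, and roi_max are illustrative assumptions.
    """
    n = len(spend)
    mean_x = sum(spend) / n
    mean_y = sum(revenue) / n
    xc = [x - mean_x for x in spend]    # centering removes the baseline
    yc = [y - mean_y for y in revenue]
    grid = [roi_max * i / (steps - 1) for i in range(steps)]
    log_post = []
    for roi in grid:
        sse = sum((y - roi * x) ** 2 for x, y in zip(xc, yc))
        log_lik = -sse / (2 * noise_sd ** 2)        # normal likelihood
        log_prior = -roi ** 2 / (2 * prior_sd ** 2)  # half-normal prior
        log_post.append(log_lik + log_prior)
    top = max(log_post)  # subtract the max before exponentiating for stability
    weights = [math.exp(lp - top) for lp in log_post]
    return sum(r * w for r, w in zip(grid, weights)) / sum(weights)

# Toy data (made up): noisy revenue that happens to dip as spend rises.
spend = [10, 20, 30, 40, 50]
revenue = [105, 98, 110, 96, 101]
ols_roi = ols_slope(spend, revenue)
bayes_roi = posterior_mean_roi(spend, revenue)
print(f"OLS ROI: {ols_roi:.3f}, posterior mean ROI: {bayes_roi:.3f}")
```

On this toy data the least-squares slope comes out negative, while the posterior mean is a small positive number, because the prior places no mass below zero. A real model would estimate many channels jointly and would normally use a proper sampler rather than a grid.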

Problem #2: Your ROIs will be static through time

To build a regression model, you're going to need historical data. The problem is, how do you know what data is valid for your ROI estimates? How do you know when it goes out of date? Consider an extreme example: when Apple rolled out iOS 14.5, which allowed users to opt out of tracking, Facebook lost a major tool in its ability to effectively advertise to its customers. It seems unreasonable to assume ROI from Facebook advertising was the same last year as it is this year. Maybe in this scenario you just need two ROI estimates, but other situations aren't so dramatic, and aren't so clear-cut. Maybe as more and more older people get on TikTok, your advertisements are more effective there. Maybe your TV ads aren't as effective because the shows they are on are getting less popular. Linear regression is going to average these differences into a single estimate of ROI that doesn't represent anything real or actionable.

At Recast, we overcome this limitation using something called a Gaussian process. Gaussian processes are an extremely complex statistical topic, way beyond the scope of this post, but using a Gaussian process we are able to let ROI estimates vary over time, with two important constraints: (1) ROIs close in time (say one week before another week) must be similar, and (2) ROIs close to the same time of year follow a similar pattern. By enforcing closeness in time, we allow ROIs to drift up and down as the model estimates when ROIs are trending up and when they're trending down. The yearly constraint allows us to add in seasonal effects, which capture patterns like "advertisements leading up to Black Friday perform well on Google."
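One common way to express those two constraints is through the covariance function (kernel) of the Gaussian process. The sketch below is an assumption about how this could be done, not a description of Recast's implementation: it adds a squared-exponential kernel (nearby weeks are similar) to a periodic kernel with a 52-week period (the same time of year is similar). The function name `roi_kernel` and the length-scale values are hypothetical.

```python
import math

def roi_kernel(week_a, week_b, drift_length=8.0, season_length=1.0,
               period=52.0):
    """Covariance between the ROI at two weeks.

    drift term:    squared-exponential kernel, high when the weeks are close.
    seasonal term: periodic kernel, high when the weeks fall at the same
                   time of year.
    All parameter values are illustrative assumptions.
    """
    gap = week_a - week_b
    drift = math.exp(-gap ** 2 / (2 * drift_length ** 2))
    seasonal = math.exp(-2 * math.sin(math.pi * gap / period) ** 2
                        / season_length ** 2)
    return drift + seasonal

one_week = roi_kernel(0, 1)     # adjacent weeks
half_year = roi_kernel(0, 26)   # opposite seasons
full_year = roi_kernel(0, 52)   # same week, one year apart
print(one_week, half_year, full_year)
```

With these settings, weeks one week apart and weeks one year apart are both much more strongly correlated than weeks half a year apart, which is the behavior the two constraints describe.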

Problem #3: You don't know the delay between spend and revenue

So far, we've been vague about what our regression terms are: the predictors are "spend" and the dependent variable is "revenue," but what do we mean by that? Ultimately we have to calculate spend as "spend in a certain time frame" and revenue as "revenue in a certain time frame." The problem is, the spend is likely to affect revenue in a different time frame than the one you're using! Suppose you spend money on ads in the first week of the year. Would you expect that to juice sales in the first week, the second week, the third week, or maybe all three? It's complicated further by the fact that some channels are faster acting than others. You might spend a bunch of money to mail flyers to people and not see revenue go up for a month, whereas Google search ads may up your revenue on the same day! A linear regression is going to require you to do a lot of guesswork in figuring out how to relate the spend and the revenue across time.

At Recast, we solve this issue by estimating a time shift for each channel. The model does two things: (1) estimate the total revenue you will receive...eventually! And (2) estimate how spread out over time that revenue will be. We do this by estimating a negative binomial distribution for each channel, where the x-axis is the number of days since the money was spent and the y-axis is the percentage of the total revenue that comes in on that day. Because the probabilities in a distribution sum to one, any negative binomial distribution will eventually deliver 100% of the revenue. This gives us the ability to measure channels with fast returns (like Google search) and slow returns (like mail campaigns) and account for how long it takes to see the revenue.
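As a rough illustration of the time-shift idea, the sketch below spreads a channel's eventual revenue across days using negative binomial weights. The parameter values for the "search" and "mail" channels are made up, and the helper names `neg_binomial_pmf` and `spread_revenue` are hypothetical; in a real model these parameters would be estimated from data rather than fixed by hand.

```python
import math

def neg_binomial_pmf(day, r, p):
    """Negative binomial probability of `day` (0, 1, 2, ...).

    Computed in log space with lgamma so that r need not be an integer.
    """
    log_pmf = (math.lgamma(day + r) - math.lgamma(day + 1) - math.lgamma(r)
               + r * math.log(p) + day * math.log(1 - p))
    return math.exp(log_pmf)

def spread_revenue(total_revenue, r, p, horizon_days):
    """Revenue arriving on each day after the spend, for horizon_days days."""
    return [total_revenue * neg_binomial_pmf(day, r, p)
            for day in range(horizon_days)]

# Assumed parameters: a fast channel and a slow channel, each eventually
# returning 1000 in revenue.
search = spread_revenue(1000.0, r=1.0, p=0.6, horizon_days=400)
mail = spread_revenue(1000.0, r=5.0, p=0.2, horizon_days=400)

print(f"search, day 0: {search[0]:.1f}; mail, day 0: {mail[0]:.1f}")
print(f"mail peaks on day {mail.index(max(mail))}")
```

Both lists add up to (very nearly) the full 1000 over a long enough horizon, but the fast channel delivers most of it immediately while the slow channel peaks a couple of weeks after the spend.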

Problem #4: Your ROIs will be inaccurate as you spend more

If your MMM regression was just Y = A*x1 + B*x2 + C*x3 (where you have three channels), the profit maximizing optimal spend would be really easy: find the channel with the biggest coefficient, and spend all your money there! That is because in a basic regression the coefficient doesn't change depending on the size of x. However, that is not at all how reality works. In marketing, a channel becomes "saturated" meaning as you spend more and more money you start to either (1) reach the wrong audience (who's not interested in your product) or (2) reach the same audience too many times. This is why you need a "media mix" to ensure you're hitting the right audiences at the right times through multiple different channels.

To make sure your model estimates saturation, you need non-linear terms in your model. At Recast, we use a specific type of non-linearity called the Hill function (which you can read more about under 'Diminishing Returns' here). Simply put, the Hill function is a special non-linear function that ensures that ROI is strictly diminishing, meaning that your first dollar spent is your best, and it's all downhill from there (eventually your spend will net no additional revenue).
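Here is a small sketch of why saturation changes the budgeting answer. It assumes one common parameterization of the Hill function, with a shape parameter of 1 so that returns diminish from the first dollar; the parameter names (`max_revenue`, `half_saturation`, `shape`), the two example channels, and the greedy allocation routine are all illustrative assumptions rather than Recast's method.

```python
def hill_revenue(spend, max_revenue, half_saturation, shape=1.0):
    """Revenue from a channel under a Hill saturation curve.

    half_saturation is the spend at which half of max_revenue is reached.
    With shape <= 1 the curve is concave, so every extra dollar earns
    less than the one before it.
    """
    if spend <= 0:
        return 0.0
    scaled = spend ** shape
    return max_revenue * scaled / (half_saturation ** shape + scaled)

def marginal_gain(params, current_spend, step):
    """Extra revenue from spending `step` more on a channel."""
    return (hill_revenue(current_spend + step, **params)
            - hill_revenue(current_spend, **params))

def greedy_allocate(channels, budget, step=10):
    """Spend the budget in small steps, each on the best marginal return."""
    spend = {name: 0 for name in channels}
    for _ in range(budget // step):
        best = max(channels,
                   key=lambda name: marginal_gain(channels[name],
                                                  spend[name], step))
        spend[best] += step
    return spend

# Two made-up channels with different ceilings and saturation points.
channels = {
    "search": {"max_revenue": 5000.0, "half_saturation": 1000.0},
    "social": {"max_revenue": 3000.0, "half_saturation": 500.0},
}
allocation = greedy_allocate(channels, budget=2000)
print(allocation)
```

Unlike the linear model, where the whole budget goes to the channel with the largest coefficient, the saturating curves lead the allocation to split the budget: once one channel's next dollar earns less than the other's, spend moves over.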

Problem #5: Your predictions will be horrible during holidays

A basic linear regression is going to assume normally distributed errors. However, many real-world events can cause sales to vary not by two or three standard deviations, but by 10x the typical value. Holidays and promotional events are regular occurrences that can cause dramatic swings in the dependent variable. You might want to use indicator variables to mark these holidays as special events, but that might not be as straightforward as you hope. That's because promotions and holidays often have cannibalization effects. If consumers know a promotion is coming, they might delay their purchase until the sale starts. Alternatively, customers might rush a purchase early if they know a sale is ending. This means even if you use indicator variables for special days, the predictions before and after your holiday might still be extremely bad.

Any approach to address this needs to incorporate the temporal nature of the data and account for the correlation between days. At Recast, whenever we have an event that causes sales to spike, we use two gamma distributions (one positive and one negative) to model the shape of the spike. The result is a shape similar to the one below, where the size of the peak and the size of the pre- and post-peak cannibalization are all flexible and estimated from the data. These spikes let us fit the existing data much better as well as forecast future holidays from past years' data.
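One plausible way to combine a positive and a negative gamma curve is sketched here; the exact construction Recast uses is not described in this post, so treat this purely as an assumption. A narrow gamma density supplies the spike and a wider one, centered on the same day, is subtracted to produce the dips on either side. The shapes, scales, and weights (`lift`, `cannibalized`) are invented for illustration.

```python
import math

def gamma_pdf(t, shape, scale):
    """Gamma probability density at t (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(t) - t / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

def holiday_effect(day, lift=300.0, cannibalized=200.0):
    """Change in sales on `day` of an event window (day 10 is the peak).

    spike: narrow gamma curve (mean 10 days), the extra sales at the event.
    dip:   wider gamma curve (mean 10 days), sales pulled away from the
           days before and after the event.
    All numbers here are illustrative assumptions.
    """
    spike = lift * gamma_pdf(day, shape=20.0, scale=0.5)
    dip = cannibalized * gamma_pdf(day, shape=5.0, scale=2.0)
    return spike - dip

effects = [holiday_effect(day) for day in range(1, 31)]
for day in (3, 10, 18):
    print(day, round(holiday_effect(day), 2))
```

With these settings the net effect is negative a week before the event, strongly positive at the peak, and negative again about a week after. In a fitted model the weights and shapes would be estimated, so the size of the peak and of each dip can differ from event to event.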

Conclusion

Whenever you try to model a real-world process, you're going to run into real-world problems. Linear regression is a fantastic mental model from which to build, but being able to adapt and come up with novel frameworks that align well with real-world phenomena is where the real fun/challenge is in statistical modeling. If you have an MMM or similar problem, I hope this post can spark ideas for how you can improve your models.