Google Hasn’t Killed Attribution Modeling – It Never Really Worked To Begin With

"Data-Driven Thinking" is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Nico Neumann, assistant professor and fellow, Centre for Business Analytics at Melbourne Business School.

Following Google’s announcement that it will no longer share user IDs externally, some industry voices have raised concerns that this move will harm independent multitouch attribution (MTA).

Indeed, it is hard to deny that any attribution analysis that attempts to determine the efficiency of different ads would be missing a big piece of the puzzle without Google’s data.

However, how useful was attribution modeling even before this happened?

The customer journey was never completely trackable

Attribution modeling requires data on every user exposure to ads before a conversion occurs.

While such touch point records in the form of web cookies used to be readily available for many digital ads in the early days of online marketing, touch point data from traditional channels, such as TV, out-of-home, radio and print, has always been harder to access.

Beacon technology for mobile phones and TV set-top boxes creates new opportunities to track individuals, but it requires complex integrations to link many different identifiers into one meaningful list. While this is technically feasible, it creates extra costs and is rarely done well.

In any case, recovering complete user touch points across competing vendors that tend to avoid supporting one another is not the only challenge for advertisers. Google is not alone in its decision to exclude cookie IDs from downloadable files in the name of privacy: Other advertising powerhouses, such as Amazon, Facebook and Twitter, don’t share any of their identifiers either.

Hence, it is fair to say that clients that leverage several of the leading online platforms will always have some key touch point data missing in their attribution models.

Use experiments for short-term effects, not correlational models

Independent of any data issues, there are good reasons to stop using attribution models, even when they are based on algorithmic estimation techniques rather than simple, gameable rules such as last-touch attribution. The problem with any algorithmic MTA model is that it still relies on correlations and curve fitting and is therefore prone to incorrect estimates of ad effects.
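To make the curve-fitting concern concrete, here is a minimal sketch of what a "data-driven" attribution model often boils down to: a logistic regression fit to user-level touch point data, with coefficients read as channel credit. The channel names, data and effect sizes below are simulated assumptions, not real campaign figures; the point is that when targeting correlates with purchase intent, a channel with no causal effect can still look effective.

```python
# Minimal sketch of a "data-driven" attribution model: a logistic regression
# fit to simulated user-level touch point data. Coefficients get read as
# channel "credit", which is the purely correlational step discussed above.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_users = 10_000

# Latent purchase intent drives BOTH ad exposure (via targeting) and conversion.
intent = rng.normal(size=n_users)

# Touch points: 1 if the user saw the channel's ad; targeting favors high intent.
search = (rng.normal(size=n_users) + 0.8 * intent > 0.5).astype(int)
display = (rng.normal(size=n_users) + 0.5 * intent > 0.8).astype(int)
social = (rng.normal(size=n_users) > 0.7).astype(int)

# Ground truth of the simulation: only search has a (small) causal effect.
p_convert = 1 / (1 + np.exp(-(-2.5 + 1.5 * intent + 0.2 * search)))
converted = rng.binomial(1, p_convert)

# The model only sees exposures and conversions, not intent.
X = np.column_stack([search, display, social])
model = LogisticRegression().fit(X, converted)

# In this simulation, display picks up credit purely through selection on intent.
for name, coef in zip(["search", "display", "social"], model.coef_[0]):
    print(f"{name:8s} coefficient: {coef:+.2f}")
```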

Research by Google, Facebook and Northwestern University has demonstrated that only well-designed randomized experiments provide accurate insights. Netflix’s Kelly Uphoff shared similar findings at Programmatic IO in 2016: Attribution models will not reveal proper incremental uplifts because they do not allow causal inferences.

Programmatic targeting makes randomization in experiments difficult

To perform a proper experiment, one needs two identical groups: One group sees no ads and the other is exposed to ads. Comparing the two groups then reveals the impact of the ads. To achieve a robust result, the allocation of prospective customers to each group should be random. In other words, people in each group should have an equal likelihood of converting before seeing any ads.
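To illustrate the design, here is a minimal sketch of such a randomized experiment on simulated data. The group split, baseline conversion rate and true lift are placeholder assumptions; the key step is that assignment happens before any targeting.

```python
# Minimal sketch of a randomized holdout experiment: users are assigned to
# test (eligible for ads) or control (no ads) BEFORE any targeting, then the
# two groups' conversion rates are compared. All numbers are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_users = 50_000

# Random assignment pre-targeting: every user has the same chance of each group.
in_test = rng.random(n_users) < 0.5

# Simulated outcomes: 2% baseline conversion, ads add 0.2 percentage points.
converted = rng.binomial(1, 0.02 + 0.002 * in_test)

n_test, n_ctrl = in_test.sum(), (~in_test).sum()
conv_test, conv_ctrl = converted[in_test].sum(), converted[~in_test].sum()
lift = conv_test / n_test - conv_ctrl / n_ctrl

# Significance check on the 2x2 conversion table (chi-square test).
_, p_value, _, _ = stats.chi2_contingency([[conv_test, n_test - conv_test],
                                           [conv_ctrl, n_ctrl - conv_ctrl]])
print(f"incremental conversion rate: {lift:+.4f} (p = {p_value:.3f})")
```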

This is the big challenge in programmatic advertising, as demand-side platforms’ (DSPs') algorithmic targeting makes it difficult to allocate groups based on similar conversion probability. Most DSPs are programmed to find the customers with the highest probability to convert. Therefore, in principle, any randomization of people needs to occur before targeting happens in order to obtain two similar groups for experiments. Unfortunately, only a few platforms allow this, and only in limited markets.

Long-term branding effects represent the biggest measurement challenge

Now we must consider that both MTA and experiments measure short-term effects of advertising, typically looking at time horizons of several weeks. The reason is that most web cookies do not last long enough for longer touch point analyses, while the opportunity cost of withholding ads from half of your prospective customers becomes extremely high the longer an experiment runs.

However, many ad campaigns have the goal of building long-term brand equity. To measure the long-term impact of different ads, we must fall back on complex econometric models, even though these have many possible shortcomings because they again rely on curve fitting. There are some mathematical tricks to correct for known issues, but this is still an area of active research.

While advances in computing power and greater data availability help us continuously improve model performance, the safest way to obtain some insight into the long-term impact of ads is to use a combination of research methods: econometric marketing-mix modeling and non-mathematical techniques. The latter can be as simple as using Google Trends to obtain a proxy of brand awareness and brand recall over time.
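For readers who want to see what such an econometric model looks like in its simplest form, below is a sketch of a marketing-mix regression on simulated weekly data, using an adstock (carry-over) transformation of spend. The channels, decay rates and coefficients are assumptions for illustration; production models also handle seasonality, saturation and pricing.

```python
# Minimal sketch of a marketing-mix model: weekly sales regressed on
# adstock-transformed (carry-over) ad spend. All data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
weeks = 156  # three years of weekly observations

def adstock(spend, decay):
    """Geometric carry-over: part of past spend keeps working in later weeks."""
    out = np.zeros(len(spend))
    for t in range(len(spend)):
        out[t] = spend[t] + (decay * out[t - 1] if t > 0 else 0.0)
    return out

tv = rng.gamma(2.0, 50.0, weeks)       # hypothetical weekly TV spend ($k)
digital = rng.gamma(2.0, 30.0, weeks)  # hypothetical weekly digital spend ($k)

# Simulated sales: long-lived TV effect, short-lived digital effect, plus noise.
sales = (500 + 0.8 * adstock(tv, 0.7) + 0.5 * adstock(digital, 0.2)
         + rng.normal(0, 30, weeks))

X = sm.add_constant(np.column_stack([adstock(tv, 0.7), adstock(digital, 0.2)]))
fit = sm.OLS(sales, X).fit()
print(fit.params)  # baseline sales plus per-$k effect of adstocked spend
```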

Alternatively, one can pay for brand tracker surveys that can be tailored to specific questions and include any brand metric of interest. Just be aware that the sample for any survey-based measurement needs to be large enough to avoid noisy or biased results. Free sample size estimators provide some guidance here.
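As a rough guide to the sample-size question, the classic formula for estimating a proportion (such as aided brand awareness) within a given margin of error is easy to compute yourself; the confidence level and margins below are placeholder choices.

```python
# Sketch of the standard sample-size formula for a proportion estimated from
# a simple random sample: n = z^2 * p * (1 - p) / e^2. Real surveys may also
# need to adjust for design effects and expected response rates.
import math

def sample_size(expected_p=0.5, margin_of_error=0.03, z=1.96):
    """Respondents needed to hit the margin of error at ~95% confidence."""
    return math.ceil(z**2 * expected_p * (1 - expected_p) / margin_of_error**2)

print(sample_size(margin_of_error=0.03))  # about 1,068 respondents for +/-3 points
print(sample_size(margin_of_error=0.05))  # about 385 respondents for +/-5 points
```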

Follow Melbourne Business School (@MelbBSchool) and AdExchanger (@adexchanger) on Twitter.

9 Comments

  1. Prof. Neumann unfortunately misses the most important point about MTA...while it cannot accurately answer all the many questions that marketers would like answers to regarding ad performance, it does do one thing EXCEEDINGLY well...it provides an ongoing signal of relative performance that says "this seems to be working better than that"...and that is extremely valuable for digital marketers who want to continuously improve their campaigns. Without it, you are truly flying blind...or worse, trusting Google to drive while you remain blindfolded.

  2. You make some valid academic points, but it’s inaccurate (negligent) to say MTA does not work. While there is no perfect solution, proper MTA is a proven, timely and cost-effective method for identifying relative winners and losers and providing signals for optimizing campaign performance. Despite the noise, non-linear models can predict media performance with sufficient accuracy to reduce waste and improve efficiency. Remember Voltaire: Don’t let perfect be the enemy of good.

  3. Dimitris Tsioutsias

    I think Prof. Neumann's point is not one against MTA modeling and tracking digital tactic performance, but rather one that asks: against which (or whose) reference line is optimization truly taking place? One can constantly optimize oneself into irrelevance by raising tactical efficiency and reporting A/B-test gains (through constant tweaking of tactics "down at the valley") while never achieving long-term value creation (through "trailblazing new mountain peaks"). I don't think it's one vs. the other. Both macro and micro causal modeling of marketing investment is needed, exactly for the reasons (long- vs. short-term effects) raised in Prof. Neumann's article.

  4. In defense of attribution modelling for marketing analysis: it exists because marketers need practical tools to help them understand what is going on and make tactical choices.

    Yes, of course experiments are superior. Design trumps analysis. Yet as the author himself acknowledges, they are very hard to do in practice. Apart from all the challenges of defining an appropriate control group, there is a basic problem: you can only measure marketing effects by fundamentally interfering with your marketing. Plus, there are so many different ways to market online that no advertiser can realistically experiment across all of them in order to solve the overarching channel mix problem. Which is why none of them do it very often.

    The fundamental point missing here is that last click is itself an attribution model: you are already doing attribution whether you like it or not. What choice do campaign managers on the front line of digital marketing really have? The reality of digital marketing up close is checking the CTR and conversion rate on thousands of ads and keywords every day and trying to decide very quickly how to adjust bids and budgets, using some enterprise bid management tools along the way (most of which are geared to last-click optimisation). Some 1990s-style econometric model that treats whole digital channels as big buckets of spend and uses three years of nationwide time series data just does not help. So why not take a look and see if that low-performing keyword is doing any better on a first-click view? If it is, maybe you can take a bet that it's a lead generator and keep spending; but if it's not, maybe it's time to dial down the bids. One day you might do an A/B test to sort the issue out. That's how it works in practice.

    Before we get too concerned about accuracy, we also have to distinguish between positional or rules-based attribution models and data-driven models. Rules-based models never claimed to be 'the truth' but rather were designed to explore the user journey prior to sale and blow open the implicit assumption that 'last click wins' alone defines the contribution of a marketing channel. Predictive data-driven attribution models, on the other hand, can be validated: if a data-driven attribution model can predict a conversion event, and do so on a blind 10% hold-out data set, then it is objectively superior to one which does so at a lower level of accuracy.
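    To illustrate that validation step, here is a minimal sketch on made-up journey data: fit on 90% of users, then score conversion predictions on a blind 10% hold-out (feature names and effect sizes are placeholders).

    ```python
    # Sketch of hold-out validation for a data-driven attribution model:
    # train on 90% of simulated journeys, evaluate predictions on a blind 10%.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(7)
    X = rng.integers(0, 2, size=(20_000, 3))  # touch points: search, display, email
    p = 1 / (1 + np.exp(-(-3 + X @ np.array([1.0, 0.3, 0.1]))))
    y = rng.binomial(1, p)  # simulated conversions

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.10, random_state=0)
    model = LogisticRegression().fit(X_train, y_train)

    # Predictive accuracy on the blind 10% split.
    print("hold-out AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
    ```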

    One area where I do agree is the difficulty of measuring long-term brand effects, especially for display. Display is much harder to model because of the uncertainty around passive impressions and sales. The 'view-through' is probably the worst attribution model there has ever been: stick to modelling clicks. At the same time, good luck with that long-term experiment into branding effects - how is that done anyway? Don't advertise in France for two years? Basically, how can it be right to say that attribution analysis is all about correlation and then go on to recommend econometrics and Google Trends, both statistical correlation approaches relying on far, far less data than attribution and leaving out information about which ads have been clicked by specific users in combination prior to a sale?

    Analysis and experimentation are two sides of the same coin. Experiments are the gold standard, while analysis scales better. So you analyse for tactical purposes and use that to help prioritise which tests to do for strategic purposes.

  5. Krag Klages

    Agree with the other comments. Although there are shards of truth here about MTA not yet being where it should be, this misses the forest for the trees. Daily/weekly optimizations of creative, tactics and placements can't always rely on lift testing - that's not scalable - in fact, I'd maintain that in order to appropriately allocate time and resources to lift testing, you should use MTA metrics to inform where you should test. With MTA, relative performance is everything, especially for large, established, mature brands. I do agree, however, that it shouldn't be your only source of truth. Looking at other KPIs, like video completion percentages, and executing affinity survey testing on creative should still inform creative and targeting decisions. Is MTA the grand savior that companies have sold executives? No. But can it be incredibly impactful to a business's bottom line? Absolutely. Just like anything else, there is never one tool that does everything, and I totally agree that MTA has been oversold to marketers as a panacea. But it can be very valuable and worth the investment for growing or established medium-to-large brands.

  6. Statistical modeling (which includes MTA) isn't a means of reflecting or deriving absolute cause and effect. If you have some means of deriving true cause and effect, then there's no need for a statistical model. There's a tendency to conflate statistical models (advertising models, economic models, election models) with something like Newtonian physics, where you have a set of rules that very, very accurately and consistently predict where a cannonball will land given a certain mass, trajectory, and velocity. That's not what MTA is, although many marketers would surely like it to be.

    Yes, there can be a tendency to overstate the capabilities of MTA to the detriment of other forms of measurement such as randomized tests with control groups (welcome to ad tech, where magic pixie dust abounds!). However, none of these measurement/attribution tactics are mutually exclusive; a marketer ought to try as many as possible. If you accept that the truth is foggy (sorry, it just is), and you accept that you can realistically do no better than TRIANGULATE on said truth but that you'll never directly grab hold of it, then you ought to be open to any tactic that provides some INCREMENTAL insight (i.e., another boundary point in the ever-shifting, amorphous area wherein the truth appears to lurk).

  7. Nico Neumann

    John, thanks for sharing your feedback. You mention MTA helps digital marketers. Do you think you have all purchase-influencing data available in the model - or only the channels the client would like to see working (which is my experience)? We can empirically prove that a model without all critical data tends to fail and confound effects. However, this does not matter anyway. The biggest issue with all correlational models is not whether channel A may be 10% better than channel B, but the actual uplift (what conversions did my campaign actually drive, that is, the extra conversions compared to not running channel A at all). Any MTA strongly inflates results, and MTA won't tell you that it would have been better not to run the campaign at all. So MTAs may sometimes work in telling you A is less bad than B (reframing the above example), but they don't help at all in giving you correct ROIs. No need to trust Google or Facebook: run your own experiments and save the money you spend on attribution tools (or rather use it for charity ads for the control group - then you will do something good).

  8. Nico Neumann

    Steve, thanks for your comment. I am afraid prediction is the wrong approach in the first place. For insights into ad effects, you need causal inferences. You may have heard the old saying 'correlation does not imply causation.' But this is not (only) an academic exercise. See my answer to John above. Brands should not care about relative performance if the channel has a negative ROI. Companies probably waste millions of dollars because of this widespread thinking (well, the real issue is that it is more convenient to report fake metrics to the boss - they look better - but that's a different story). This has actually been tested by some companies which compared MTA to their experimental data and found that MTA failed terribly (Netflix and IAG Australia). Example of 'credit': 1) last-touch attribution: 1,800 sales; 2) data-driven (algorithmic) MTA: 1,700; 3) experiment (true uplift): -26. Yes, that's a minus (example courtesy of Willem Paling, Media and Technology Director, IAG). They switched off the channel, for yearly savings of $500K.

  9. Put another way, you first need to establish the size of the pie before dividing it up. That is, marketers must measure incrementality using an experiment to determine the causal effects of advertising before they can optimize their campaigns. Many vendors provide turn-key experimentation platforms: e.g., Google and Facebook's Conversion Lift, and AdRoll X.

    Ultimately, I do think that the second-stage ad optimization problem is too hard for most individual marketers to solve with their limited information about campaign performance. Instead, marketers need to demand that ad platforms let them purchase ads on the basis of Cost per Incremental Action (CPIA), as the platforms are best positioned to optimize on the basis of many advertisers' information.
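    For example, a minimal sketch of how CPIA falls out of a lift test (all numbers are made up):

    ```python
    # Cost per Incremental Action from a lift test: incremental conversions are
    # the test group's conversions minus the control group's, scaled to the
    # test group's size. All figures below are hypothetical.
    spend = 100_000.0
    test_conversions, test_users = 2_400, 1_000_000
    ctrl_conversions, ctrl_users = 2_000, 1_000_000

    incremental = test_conversions - ctrl_conversions * (test_users / ctrl_users)
    print(f"incremental conversions: {incremental:.0f}")   # 400
    print(f"CPIA: ${spend / incremental:,.2f}")            # $250.00
    ```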
