Is Accurate Attribution Even Possible?

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Martin Kihn, research vice president at Gartner.

A few years ago, a pair of researchers from Google and Microsoft named Randall Lewis and Justin Rao circulated a depressing paper called “On The Near Impossibility Of Measuring The Returns To Advertising.” It reappeared last December in a prestigious journal with “Near Impossibility” softened to “Unfavorable Economics” – and it’s still depressing.

You can tell it’s an academic exercise because it refers to advertising as a “choice variable on profit” – that is, an optional investment that is supposed to impact business. Which it is. And which bring us to the near impossibility –er, the unfavorable economics – of any hard-working marketer getting what she thinks she’s getting from her attribution project.

As everybody knows by now, attribution is an analytical method that takes a lot of user-level data and tries to measure the impact of specific tactics on a positive outcome, such as a sale. In its algorithmic form, it is supposed to be an improvement on quaint methods like last-click-takes-all, which are obviously wrong but very convenient. The purpose of attribution is to give fair credit to the tactics – placements, creative ideas, formats – that work.

As Mark Twain said about pertinacity, attribution is easier said than done. Despite what some providers and purveyors proclaim, existing methods are rife with hidden assumptions, shortcuts, data holes, methodological debates and statistical smoke. These challenges inspire vigilance, of course, but are actually significant enough to make this question worth asking: Is accurate attribution even possible?

Lewis and Rao were not assessing attribution per se but rather the deeper question of whether it is possible to determine if a campaign worked at all. In many cases, the answer appears to be no. Why? It’s a mathematical mess.

“As an advertiser,” they conclude, “the data are stacked against you.”

Turns out, there is a lot of volatility in individual behavior anyway – with or without ad exposures – so the volume of data required to show any real impact on this insanity is immense. (Among 25 primarily retail display campaigns studied, the average sale was $7 with a standard deviation more than 10 times higher.)

As Lewis and Rao noted, in such an environment, “a very small amount of endogeneity would severely bias estimates of advertising effectiveness.” In other words, if the data do not include all the important factors that influence sales, the model is wrong.

This brings us to attribution. How complete are the data used in most attribution models? Well, here’s what you need:

All impressions mapped to a unique individual
Every individual identified across all their devices
All impressions in view, viewed and legitimate
Target variables, such as sales, mapped to a unique individual
All relevant factors measured accurately

There has probably never been an attribution project conducted in such a perfect world. Our reality is not a lab. And perfection has never been as important in advertising as it is in some more serious fields like, say, kangaroo studies. But just for a moment, let’s play a thought experiment, inspired by Joydip Das at Krux.

Imagine a campaign that targets 1 million people and results in 1,000 conversions. There are an average of 40 exposures of some kind per person, and each exposure has 40 pieces of information associated with it, such as size or which creative version was used. Just this one campaign could lead to a total number of combinations equal to all of the stars in the sky. One admires the problem.

Back here on Earth, there is also the old peril of correlation vs. causation. Unlike Lewis and Rao’s research, most attribution projects in the wild are not randomized controlled tests. They attempt to impute causation from observations after the fact. Not only is this approach less rigorous, it can also drive a very long way, down a very dark road, if the model just happens to be missing something.

Like what? How about little things – any offline media, inaccurate targeting, competitive moves, forces of nature like stock market crashes and – well – Facebook. That first point is not insignificant. Maggie Merklin at Analytic Partners estimated thatroughly 5% of media spending by the top 200 US advertisers is digital from end to end, meaning user-level data through to the sale. This is the reason that most big ad spenders combine attribution with marketing mix modeling, which has been used for decades.

Apart from the real data challenge, here is another inconvenient truth: There is no standard analytical approach to attribution. I’m sorry; there isn’t. There are at least a dozen methods with their advocates. It’s not so troubling in a consulting situation, perhaps, but there are vendors who have literally coded a philosophy into their product. One may advocate [PDF] a form of logistic regression that another calls “common modeling,” inferior to post-hoc A/B tests. Another argues [PDF] for a “survival model,” based on medical research, while someone else makes a compelling case [PDF] for Markov Chains or something called FTRL [PDF].

And we haven’t even mentioned game theory yet.

So where are we? Is attribution simply a programmatic Xanadu, some endless vision quest? Of course not. Useful results emerge all the time from thoughtful attribution projects, particularly when they are combined with complementary methods, which may include randomized tests and mix modeling. My point here is just that it is big science and requires expert attention.

Lewis and Rao did not conclude that measuring the returns to advertising is impossible. Nor is accurate (enough) attribution. Both require statistical scrutiny and more data than you might think. There is another implication identified by the researchers. The new economic calculus favors very large publishers “because they can carve out a monopoly on reliable feedback.”

In plain English: Facebook and Google look likely to win – again.

Follow Martin Kihn (@martykihn), Gartner (@Gartner_inc) and AdExchanger (@adexchanger) on Twitter.

Tagged in: