Black Holes In Our Data Models Are Quietly Getting Bigger

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Nathan Woodman, an independent consultant and former chief data officer at Havas Media Group.

We don’t talk about it much, but there is a growing bias in the data that feeds our digital marketing algos and measurement.

This emerging instability in the data assets that power our industry is amplified by browser tracking changes, government regulation, user consent frameworks and clean room ID redaction. This challenge presents an opportunity for the industry to fix pervasive but previously unaddressed measurement and optimization issues.

Many of us have been focused on viewability, fraud, unfair auctions and supply-path optimization. Those issues took priority. Now that they appear to be settling and new friction is emerging, I feel the time is right to address some fundamental measurement issues that make marketers uneasy about their digital investments and cast a shroud over the industry that keeps ad dollars away.

Shining light on the problem is the first step in correcting the behavior.

Unbiased data

The Nielsen TV-rating panel is limited in size at 20,000 panelists, but those panelists are thoroughly vetted to ensure that they are a balanced representation of the US population. The Nielsen sample is an example of a methodological gold standard of unbiased measurement that is trusted by marketers.

Ad tech data scientists seem to rarely consider sample bias in the digital data assets of their platforms or walled gardens. They are tasked with optimizing the attributed cost per KPI as measured in their bespoke universe of data.

These are two different approaches to measurement and optimization that are separated by their vantage point.

The Nielsen approach assumes that learnings from its research are deployed to US households and are relative to the household. It is a household-centric view.

The ad tech platform and walled garden approach assumes that the learnings are deployed in the same platform from which the data was extracted and are relative to the platform. It is a platform-centric view.

Both of these approaches are technically unbiased in the opinion of the researcher or data scientist. The distinction is irrelevant as long as the universe of data in the digital platform or walled garden is large enough that it is representative of the population. However, the sand is shifting under the foundation of the digital platforms, and I believe the assumption that the digital data is unbiased can no longer be accepted.

Biased data

Bias created by ad blocking is a harbinger of what is to come from browser-level ID deletion, such as Safari’s Intelligent Tracking Prevention (ITP), and government-regulated opt-in, which is required in the California Consumer Privacy Act (CCPA).

The underlying data that feeds digital machine learning algorithms or heuristic targeting decisions has always been biased to a degree, but the problem has become acute since the rise of ad blocking in 2013.

[Chart: the number of monthly active ad-blocking users, as estimated by PageFair]

The acceleration in ad blocking has skewed the digital sample used in ad tech data scientists' training sets, making the digital platform universe of data less representative of the US population. ITP, similar restrictions in Chrome and privacy regulation will further skew the data because some browsers will delete IDs more frequently than others, and some users will opt in to tracking consent and others will not.

Because some groups of users are blocked or opt out of tracking before they ever become members of the universe, their patterns, attributes and features are never seen by targeting or optimization models. Much like a black hole is detected only by the absence of light, the missing patterns are detected only when observed from an external vantage point.
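
To make the black-hole effect concrete, here is a minimal sketch in Python. The segment names, population shares, trackability rates and conversion rates are all hypothetical numbers chosen for illustration, not measurements. The sketch compares the conversion rate a platform would see inside its own universe with the rate in the full population.

```python
# Hypothetical segments: (share of population, share trackable, true conversion rate).
# Every number here is an illustrative assumption, not measured data.
SEGMENTS = {
    "segment_a": (0.60, 0.90, 0.020),
    "segment_b": (0.40, 0.30, 0.050),
}


def population_rate(segments):
    """Conversion rate over the whole population (the external vantage point)."""
    return sum(share * rate for share, _, rate in segments.values())


def platform_rate(segments):
    """Conversion rate over only the users the platform can still track."""
    tracked = sum(share * trackable for share, trackable, _ in segments.values())
    converted = sum(
        share * trackable * rate for share, trackable, rate in segments.values()
    )
    return converted / tracked


def platform_shares(segments):
    """Each segment's share of the platform's observed universe."""
    tracked = sum(share * trackable for share, trackable, _ in segments.values())
    return {
        name: share * trackable / tracked
        for name, (share, trackable, _) in segments.items()
    }


if __name__ == "__main__":
    print(f"population rate: {population_rate(SEGMENTS):.4f}")
    print(f"platform rate:   {platform_rate(SEGMENTS):.4f}")
    for name, share in platform_shares(SEGMENTS).items():
        print(f"{name}: {share:.1%} of platform universe")
```

Under these assumed numbers, segment_b makes up 40% of the population but only about 18% of the platform's universe, and the platform measures a conversion rate of roughly 2.5% against a population rate of 3.2%. Nothing inside the platform's own data signals the gap; it shows up only against an external benchmark.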

Signs of bias or black holes

Research shows that 31% of Americans use ad blockers, mostly on their laptops and desktops. Men are 10 percentage points more likely than women to use ad blockers, and 18- to 34-year-olds are 10 percentage points more likely than all age categories to employ ad blockers.
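
The gender gap can be turned into a rough estimate of sample skew. The sketch below assumes a 50/50 gender split and treats the 31% overall rate as the midpoint of the two group rates (36% for men, 26% for women). Neither assumption comes from the cited research, so the result is illustrative only.

```python
def observed_share(pop_share, block_rate, other_share, other_block_rate):
    """Share of a group among users who do NOT block ads."""
    visible = pop_share * (1 - block_rate)
    other_visible = other_share * (1 - other_block_rate)
    return visible / (visible + other_visible)


# Assumptions (not from the source): 50/50 split, block rates centered on 31%.
MEN_BLOCK, WOMEN_BLOCK = 0.36, 0.26

men_visible = observed_share(0.5, MEN_BLOCK, 0.5, WOMEN_BLOCK)
print(f"men among non-blocking users: {men_visible:.1%}")
```

With these inputs, men fall from an assumed 50% of the population to about 46.4% of the users a platform can observe, before any age or device effects are layered on top.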

ITP is creating a similar black hole for Apple’s Safari browser. Conversion rates for Apple browsers and devices have fallen since the introduction of ITP as Safari IDs’ half-life shortened. Overall CPAs are relatively unchanged as algorithms and media planners adjusted to the black hole, but this is unsustainable as the problem expands.

Safari browsers represent 15% of US usage, compared to 64% for Chrome. The average salary for an iPhone user is $53,251, more than 40% higher than the $37,040 average salary of Android users.

Their tastes in TV also diverge: Android users are fans of “NCIS,” “Law & Order” and “Saturday Night Live,” while iPhone users watch “Game of Thrones,” “Grey’s Anatomy,” “Friends” and “The Walking Dead.”

Pending government regulation will likely amplify the impact of ad blocking as proposed user consent frameworks add friction to persistent tracking and, by default, more sub-pockets of the population appear as black holes and increase sample bias.

The result is that digital marketing campaigns that are executed by digital platforms and designed to drive performance against measured CPA or target specific addressable audiences will not reach their targets.

These sub-pockets are less likely to be tagged with a persistent identifier, resulting in their underrepresentation in the sample, which will make it difficult for their distinct patterns to be identified by analysts or machines.

Mobile app data is not immune to the trend. The issue is not isolated to universally unique identifiers (UUIDs) in browsers – it will also likely skew mobile advertising ID (MAID) data. Some industry observers believe that Apple and Google will make similar moves that reset IDFA (Identifier for Advertisers) and AdID in mobile apps, respectively, and MAIDs could become as unstable as UUIDs in browsers. If they don’t, then the heavy hand of government regulation surely will cover mobile devices and mobility tracking. MAIDs are currently better than UUIDs, but they are not future-proof.

Identity resolution does not solve the problem ... yet

Identity resolution and onboarding providers that match identity to first-party or third-party digital IDs do not fix the problem today. These solutions rely on matching identity to digital IDs during log-in events, and digital IDs’ shortening half-lives require the log-in event to be verified more frequently to maintain persistent tracking.
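
The link between a shorter half-life and more frequent log-in verification can be sketched numerically. The example below models ID loss as exponential decay, an assumption made here only because the text speaks of a half-life, and the half-life values are hypothetical.

```python
import math


def surviving_fraction(days, half_life_days):
    """Fraction of IDs still alive after `days`, assuming exponential decay."""
    return 0.5 ** (days / half_life_days)


def days_until(fraction_left, half_life_days):
    """Days until only `fraction_left` of IDs survive under the same model."""
    return half_life_days * math.log(fraction_left) / math.log(0.5)


for half_life in (30, 7, 1):  # hypothetical half-lives, in days
    kept = surviving_fraction(30, half_life)
    refresh = days_until(0.9, half_life)
    print(
        f"half-life {half_life:>2}d: {kept:.1%} of IDs left after 30d, "
        f"re-verify within {refresh:.1f}d to keep 90%"
    )
```

In this model the re-verification window scales in direct proportion to the half-life, so cutting the half-life from 30 days to 7 days cuts the time a match stays reliable by the same factor.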

Identity resolution needs to be coupled with a persistent log-in to make identity tracking more stable. Combined, they can go a long way toward fixing user-consented persistent identity, which would help mitigate sample bias and other issues. They are just not ready yet.

There are many unknowns about how browser restrictions, government regulation, user consent, persistent log-ins and clean rooms will play out. However, what is clear is that due to the upheaval we have an opportunity to retool our measurement and optimization approaches. In that process we can correct where we have gone wrong and move the industry closer to a measurement gold standard that can be trusted by marketers.

