Deterministic Data Isn’t What It Seems

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Matt Keiser, founder and CEO at LiveIntent.

Metadata is necessary to wield identity with the same accuracy as the triopoly – Facebook, Google and Amazon. The triopoly has great metadata because it gives data a day job.

The triopoly knows that human beings are more than just one identifier. Our identities can be thought of as snowflakes – composed of the cookies, anonymized PII and mobile IDs tied to people at their core.

For example, I have multiple email addresses from school, work, broadband providers and more. I no longer use [email protected] to register for anything, but since it is persistent, it can still be used to target me if linked to my current emails, cookies or mobile IDs. In my personal identity snowflake, [email protected] and my work email are linked to many of the same cookies and mobile IDs because I log in to check email and use them for registering for different websites and apps, based on whether I use them personally or for work.

These multiple email addresses partly explain why marketers are sometimes disappointed in CRM onboarding results outside of the triopoly. Marketers often onboard to target a user based on identity, starting with an email hash that’s converted to a cookie. But marketers measure attribution (and therefore success) by going from a cookie back to an email hash or by comparing who registered against who was targeted.

When you follow the data, it becomes obvious that you need to look at clusters of data, not rows, to understand identity, since both my personal and work email addresses are true. This is how the triopoly connects the dots on identity.
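To make the "clusters, not rows" idea concrete, here is a minimal sketch that groups identifier pairs into clusters with a union-find structure. The identifier names and the `build_clusters` helper are hypothetical and chosen for illustration; this is not a description of how any particular platform builds its graph.

```python
from collections import defaultdict

def build_clusters(pairs):
    """Group identifiers that are linked, directly or transitively."""
    parent = {}

    def find(x):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())

# Hypothetical rows: each row is one deterministic pairing.
pairs = [
    ("hash:personal_email", "cookie:A"),
    ("hash:work_email", "cookie:A"),
    ("hash:work_email", "mobile:1"),
    ("hash:other_person", "cookie:Z"),
]
clusters = build_clusters(pairs)
```

Read row by row, the personal and work email hashes look like two different people. Read as a cluster, they resolve to one identity because they share a cookie.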

Metadata: Key to building snowflakes

People-based marketing data models must go beyond a single cookie-to-hash pair or cookie-to-mobile ID pair and chart identity snowflakes, just like Facebook, Amazon and Google. The odds of a targeted cookie showing up at conversion are small. But an identity snowflake model alone can’t maximize performance because it doesn’t provide the signal needed to adjust the probability of driving the targeted event.

The triopoly model includes metadata about the different relationships within the identity snowflake to determine the strength between devices, browsers, apps and the person.

A deterministic cookie-to-person pairing that’s seen only once isn’t the same as one seen every day. Marketers need a way to score and differentiate the quality of connections in a graph, though the accuracy of the pair isn’t what truly matters: Outcomes matter.
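One way to picture such a score is a function of how often and how recently a pairing has been observed. The sketch below is an assumption made for illustration: the `link_strength` name, the saturation constant and the 30-day half-life are all hypothetical, and a real model would be tuned against outcomes.

```python
import math

def link_strength(times_seen, days_since_last_seen, half_life_days=30.0):
    """Score a cookie-to-person pairing between 0 and 1.

    Assumed model: frequency saturates toward 1 as sightings accumulate,
    and the score halves every `half_life_days` without a new sighting.
    """
    frequency = 1.0 - math.exp(-times_seen / 5.0)
    recency = 0.5 ** (days_since_last_seen / half_life_days)
    return frequency * recency

seen_daily = link_strength(times_seen=30, days_since_last_seen=0)
seen_once = link_strength(times_seen=1, days_since_last_seen=0)
seen_once_stale = link_strength(times_seen=1, days_since_last_seen=90)
```

Both pairings may be deterministic, but under this scoring the one seen every day ranks well above the one seen once, and a stale single sighting ranks lowest.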

If marketers want to drive conversions for $100 via retargeting, they could use identity snowflake metadata to modify bids. They might bid 10 times as much on a cookie that logs in regularly as on a pair purchased from a third party that has been seen only once. As long as they’re right and the regularly seen cookies convert 10 times as often, they’ve right-priced each bid and increased their likelihood of driving conversions.
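The arithmetic behind that bidding logic can be sketched as follows, assuming the $100 is a target cost per conversion. The conversion rates are hypothetical numbers picked so that one cookie converts 10 times as often as the other.

```python
def right_priced_bid(target_cost_per_conversion, conversion_rate):
    """Bid that keeps the expected cost per conversion at the target."""
    return target_cost_per_conversion * conversion_rate

TARGET = 100.0        # target cost per conversion, in dollars
strong_rate = 0.010   # assumed rate for a cookie that logs in regularly
weak_rate = 0.001     # assumed rate for a third-party pair seen once

strong_bid = right_priced_bid(TARGET, strong_rate)
weak_bid = right_priced_bid(TARGET, weak_rate)

# Expected spend per conversion is bid / conversion rate for each cookie.
strong_cost = strong_bid / strong_rate
weak_cost = weak_bid / weak_rate
```

The strong cookie earns a bid 10 times higher, yet the expected cost per conversion is the same for both, which is what right-pricing means here. If the assumed rates are wrong, the bids are mispriced by the same factor.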

It gets more complicated. My identity may be mapped to a dozen cookies across multiple devices, so only looking at IDs associated with one of my devices would create an inaccurate view of the customer journey because it will have gaps. But looking at all the devices mapped to my identity would cause expansion that’s equally inaccurate.

If targeting me yielded a conversion, does it matter which ID converted? Probably not, unless the goal of the campaign was to reach a specific device.

Using metadata properly allows marketers to scale up and down for accuracy and performance, based on what works for them. Cross-channel and cross-environment (on the same device) campaigns require that marketers tune their data model to whatever drives results.
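Scaling up and down can be pictured as moving a threshold over link scores. In this sketch the scores, identifiers and the `select_ids` helper are hypothetical; the point is only that one graph can serve both a precise and a broad audience.

```python
def select_ids(links, min_strength):
    """Return the identifiers whose link strength meets the threshold."""
    return sorted(id_ for id_, strength in links.items() if strength >= min_strength)

# Hypothetical link strengths for the IDs mapped to one person.
links = {
    "cookie:A": 0.95,
    "cookie:B": 0.60,
    "mobile:1": 0.85,
    "cookie:C": 0.05,
}

precise = select_ids(links, min_strength=0.8)  # fewer IDs, higher confidence
broad = select_ids(links, min_strength=0.1)    # more reach, weaker links included
```

A high threshold leaves gaps in the customer journey, and a low one risks inaccurate expansion, so the right setting depends on which outcome the campaign is optimizing.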

A one-size-fits-all graph does not give a marketer or marketing platform the same control that the triopoly asserts.

Deterministic and the ‘truth’ 

The concept of metadata signals changes how we think about deterministic and probabilistic data. Deterministic data is described as “truth,” and probabilistic data is considered the output of triangulation methods used when “truth” is not available.

A snowflake model like the triopoly’s will show a variety of pairings that, while formed deterministically, are hardly the truth to a marketer that wants to reach me. However, the creation of metadata introduces an additional type of critical – and distinct – probabilistic data to the triopoly’s identity calculus. This “new” probabilistic data underlying the metadata highlights which pairs are most accurate and best for direct targeting, as well as which are less accurate and usable for expansion or lookalikes.

While the identity snowflake concept undermines the orthodoxy of deterministic data as “the truth,” metadata offsets the efficacy loss driven by weak deterministic pairings within the identity snowflake. The best way to think about deterministic data within an identity graph is that its efficacy depends on the quality of this new form of probabilistic data that underlies metadata.

Metadata is a triopoly differentiator

Metadata allows the triopoly to wield identity with tremendous accuracy. Generating metadata requires visibility of the micro-events that build and refresh the basic data mappings within an identity graph.

By expanding the definition of an identity graph to include metadata, we also expand the definition of “signal.” Signal is usually defined as purchase behavior used for retargeting. In a world of people-based marketing and identity graphs, signal also includes “unpasteurized” information about which email addresses have opened emails or logged in, on which device and how frequently. This signal is the root of metadata. Facebook, Google and Amazon are potent collectors of metadata signal because their core businesses drive email traffic and user logins.

Targeting and measurement via snowflakes

Marketers need tools to prove attribution and measurement that tie together different environments on the same device or across devices and channels. Traditional attribution models outside of the walled gardens haven’t supported the complexity required to weave together attribution in an identity snowflake world.

Matching a targeted cookie with a cookie that converted doesn’t prove incrementality or illuminate the customer journey. Identity snowflake attribution requires “reversing” the flow of data against an identity graph, going from online data back to durable identifiers. This is the opposite of traditional onboarding, and can be called outboarding.
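The difference between cookie matching and outboarding can be sketched in a few lines. The graph, the identifiers and the `outboard` helper are hypothetical, and the example only illustrates the reverse mapping; it does not measure incrementality.

```python
def outboard(online_ids, graph):
    """Map online IDs back to the durable person IDs they belong to."""
    return {graph[i] for i in online_ids if i in graph}

# Hypothetical identity graph: online ID -> durable person ID.
graph = {
    "cookie:A": "person:1",
    "cookie:B": "person:1",
    "mobile:1": "person:1",
    "cookie:Z": "person:2",
}

targeted = {"cookie:A", "cookie:Z"}   # IDs the campaign bid on
converted = {"mobile:1"}              # ID seen at conversion

cookie_matches = targeted & converted
person_matches = outboard(targeted, graph) & outboard(converted, graph)
```

Matching ID to ID finds nothing, because the person was targeted on a browser cookie and converted on a mobile device. Matching at the person level after outboarding credits the conversion to the targeted identity.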

The triopoly is excellent at outboarding: It’s how it outperforms and claims the lion’s share of attribution. Outboarding is also why the Salesforce integration with Google 360 is potentially so transformative – it will allow brands to map first-party data to a previously walled-off portion of the identity snowflake, drastically increasing their understanding of what’s happening behind the Google wall. The integration is the cloud’s first truly differentiated news at the intersection of marketing and advertising in some time.

The triopoly knows that all deterministic data isn’t created equal, and the customer journey – unique to each buyer – can’t be explained with a basic data mapping model. To copy the triopoly’s success, living and breathing metadata at scale is needed. Only a few players have it, and all the tech and co-ops in the world can’t overcome sparse metadata.

Follow Matt Keiser (@mrkeiser), LiveIntent (@LiveIntent) and AdExchanger (@adexchanger) on Twitter.
