
Can Publishers Enable A New Chapter For Modeled Data?


“The Sell Sider” is a column written for the sell side of the digital media community.

Today’s column is written by Alessandro De Zanche, an audience and data strategy consultant.

The use of modeled data at scale is evolving rapidly due to legal and technical limitations.

It is frequently misrepresented and mis-sold, but there would be no need for ambiguous sales tactics if modeled data were used as intended. Marketers shouldn’t leverage modeled data for targeting, for example, and expect to engage only with male users, when in reality they are reaching a segment of users with a higher probability of being male than an untargeted audience.
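To make that distinction concrete, a modeled “male” segment is best read as a lift over the baseline audience, not as a guarantee about any individual. The sketch below uses invented numbers (they do not come from the column) to show the arithmetic:

```python
# Toy illustration: a modeled "male" segment is probabilistic, not
# deterministic. All rates below are invented for the example.

def segment_lift(segment_rate: float, baseline_rate: float) -> float:
    """How many times more likely a segment member is to carry the
    modeled attribute than a randomly targeted user."""
    return segment_rate / baseline_rate

# Suppose 49% of the untargeted audience is male, while the modeled
# segment is 70% male. Targeting the segment reaches men ~1.43x more
# often -- but 30% of impressions still land on non-male users.
lift = segment_lift(0.70, 0.49)
print(round(lift, 2))  # 1.43
```

The point of the exercise: the segment improves the odds, it does not eliminate the miss rate, and the sales pitch should say so.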

Besides a better-informed sales pitch, is modeled data good data? It depends on several factors, including the type of data and its source, as well as how it is used and implemented in an advertising and marketing context.

Quality modeled data is extremely difficult to produce. I once worked for a market research company that produced several different data products. My team took traditional research assets and products – including consumer panels across dozens of countries – and connected them to digital advertising and marketing ecosystems for activation, such as targeting, personalization and reporting.

The ‘seed’ and the ‘universe’

Imagine a single source panel representing a country’s population with “always-on” capture of deterministic data, generated from each individual panelist’s activity across TV and digital – including demographic and household information, offline behaviors, print media consumption and purchase data. Our mission was to take the deep knowledge and insights provided by that limited, deterministic data source – the “seed” – and create reach through modeling.

High-quality modeled audience data requires a huge universe of cookies, device IDs and their relative data points to be used as a “canvas” that will be enriched with the knowledge being extracted from a deterministic seed. It is relatively “easy” to build or obtain that seed, compared to the more difficult task of gaining access to a high-quality, granular pool of millions of IDs, collected in a consistent, fully compliant and informative way, while also building a relationship with the user based on transparency and value exchange (which exceeds legal requirements).

From a technical perspective, another key aspect is that each cookie or device ID within the “universe” needs to have enough data points attached to it. Moreover, these data points shouldn’t themselves be modeled attributes; modeling on top of modeled data – rather than raw data – is not best practice. The more granular and raw the data points, the more effective the models, and the more robust the lookalike segments built from them.
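The seed-and-universe mechanic described above can be sketched in a few lines. This is a deliberately simplified illustration, not any vendor’s actual pipeline: the feature vectors, IDs and the centroid-similarity scoring are all invented stand-ins for the raw data points and the (far more sophisticated) models the column alludes to.

```python
# Minimal sketch of lookalike expansion: learn a profile from a small
# deterministic "seed" panel, then score a larger ID "universe" whose
# members carry the same raw (not modeled) data points.
from math import sqrt

def centroid(vectors):
    """Average feature vector of the deterministic seed."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def similarity(a, b):
    """Cosine similarity between two raw feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def lookalikes(seed, universe, top_n):
    """Rank every universe ID by closeness to the seed centroid and
    keep the top_n as the modeled (lookalike) segment."""
    c = centroid(list(seed.values()))
    ranked = sorted(universe, key=lambda uid: similarity(universe[uid], c),
                    reverse=True)
    return ranked[:top_n]

# Seed: a small panel with deterministic data points.
seed = {"p1": [5.0, 1.0, 0.2], "p2": [4.0, 0.8, 0.1]}
# Universe: a large ID pool with the same raw data points attached.
universe = {"u1": [4.5, 0.9, 0.15],   # close to the seed profile
            "u2": [0.2, 5.0, 3.0],    # very different behavior
            "u3": [4.8, 1.1, 0.25]}   # close to the seed profile
print(lookalikes(seed, universe, top_n=2))  # ['u1', 'u3']
```

Note what the sketch depends on: the universe IDs must already carry granular raw features comparable to the seed’s. If those features were themselves modeled, the similarity scores would compound error, which is exactly why modeling on modeled data is discouraged.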

An unworkable problem


It is becoming increasingly difficult to create massive pools of user IDs with granular log-level data, especially for the middlemen that have no direct contact with audiences, and companies are falling by the wayside.

Last Thursday, nonprofit Privacy International filed a complaint with data protection authorities in France, Ireland and the UK against Acxiom, Criteo, Equifax, Experian, Oracle, Quantcast and Tapad “to protect individuals from the mass exploitation of their data in contravention of the General Data Protection Regulation.”

Each of these companies collects data from millions of users. They are not household names, in the sense that 98% of the user base does not know them or have a clue about what data is collected and how it is used; almost none of these companies have any direct relationship with the user. They are increasingly under scrutiny as stricter privacy regulations are implemented across the world.

The UK Information Commissioner has also weighed in on the privacy element of modeled data: “If you’re targeting people on the basis of inferred data, that is personal data. The use of lookalike audiences should be made transparent to individuals.”

A high-quality open ecosystem

With much uncertainty and fewer sources for large data pools, we are left with the big platforms – Facebook, Google and Amazon – which are not immune from privacy and compliance-related controversies.

But these platforms lack either the context (Google and Facebook) or the variety of data needed to create rounded profiles of real people that go beyond buying behaviors (Amazon).

This void creates a unique opportunity and moves the spotlight to publishers, which have a unique combination of granular proprietary data, context and, crucially, a transparent and honest relationship with the individual. What would be missing is scale, which makes an even stronger case for media brands’ alliances.

Media brands have the potential to evolve not only as a source of quality first-party data but as a sophisticated ecosystem for effective data modeling, including becoming a reference point for advertisers wishing to enrich their data in a context that keeps transparency, fairness and the user at its core.

Follow Alessandro De Zanche (@fastbreakdgtl) and AdExchanger (@adexchanger) on Twitter.
