Publishers And The Hidden Costs Of Data Leakage

Tom Chavez is an entrepreneur, technologist, musician, and family man residing in San Francisco. He was the founder and CEO of Rapt Inc., and following the acquisition of Rapt by Microsoft, he served as General Manager of Microsoft Advertising’s Online Publisher Business Group.

I have received an overwhelming amount of feedback from my last missive. I heard from friends and colleagues in the publisher space, DSPs, agencies, and technology and service providers. There were hugs, high fives, bricks, and a few rotten tomatoes – all of it instructive. Ultimately I’m just glad to see some really smart people thinking about the challenges publishers face in an increasingly buyer-centric industry.

In my last note, I asserted that:

With the right data infrastructure in place, publishers can open up regulation-proof revenue streams worth hundreds of millions of dollars. It would be a shame if some new channel master strip-mined their audience of its emerging data value without giving them their due.

Before publishers can start claiming value from data, they need to get a handle on what they’re losing. Revenue Exposure is a metric that I have been exploring with my friend and colleague Vivek Vaidya, formerly CTO at Rapt, and Andy Skrzypacz, Professor of Economics at Stanford. Our goal in this exercise is more than intellectual stimulation; we’re dead-set on measuring the potential costs to digital media publishers of data collection across their websites.

At the core of Revenue Exposure is the notion of ‘missed’ demand. By that I mean the downstream impact of others collecting data about a publisher’s users today, repackaging that data, combining it with tonnage media, and then using it to sell against that publisher in the future. In essence, new sellers of like-kind assets emerge who shave points off the share-of-spend captured by the publisher, all of it energized by the publisher’s own data.

Of course, industry practitioners already know that this isn’t a conceptual scenario. It is the hinge point to ad network arbitrage economics, oxygen for DSPs, and critical fuel for audience buying and remarketing/re-targeting for advertisers across the board.

I do not question the targeting tactic itself; it’s exactly the right way to leverage data to improve campaign performance and customize advertising, content, and commerce experiences for the end user. What concerns me and a growing number of folks within the digital media community is that much of this activity is not occurring in the plain light of day. Of course that raises serious privacy questions, and it’s increasingly urgent that our industry address them. But at least as importantly, the data owners – usually the publishers, and perhaps the end consumer – may not be getting fairly compensated for the use of that data downstream.

Insurers use sophisticated actuarial measures to manage risk and adjust policy-setting and selling practices. High-tech companies carefully measure book-to-bill and channel sell-through prices to ensure the long-term value of their chips and devices. It’s time for publishers to bring similar discipline to their own businesses.

Revenue Exposure is, in essence, an indicator of the publisher’s opportunity cost from data leakage or unauthorized data collection. A high-tech company might reduce the price of a component through a reseller channel to spur more short-term revenue, but that introduces the long-term risk of accelerated price erosion. Similarly, a publisher can allow more data collection on its website today, but with potentially serious downstream consequences. To understand them, we first need to briefly review the economics of the publishing business.

Publishers have three assets. The first is content, whether they create it, acquire it from others, or invite their audience to contribute it. Ultimately, they create, collect, curate, and present content that is of interest to a particular audience. Second, they have environment, representing the essential and unique qualities of the manner in which the content is presented. Consider the perceived differences between a glossy fashion magazine and a supermarket tabloid. They might both run a cover story about the same celebrity, but the quality of presentation and the inherent value imparted by the publications’ brands are different. These differences shape the relationships those two publishers develop with both their readers and their advertisers.

Finally, a publisher has the all-important asset, audience. Given the quality and usefulness of the information on offer (content) and the manner in which that content is presented (environment), a publisher aims to grow a large and loyal audience. When an advertiser purchases ad space from the publisher, it seeks to reach that audience with a particular message. Further, the advertiser may even be willing to pay a premium for placements based on the quality of the environment. The publication’s brand, history, and production quality all lend confidence to the advertiser that its message will not only reach the desired audience, but will also be presented to the audience in a pleasing way.

This is the fundamental economic model underpinning all ad-supported publishing, online and offline. In exchange for useful content and a satisfying experience, users consume advertising. To reach the right audience, advertisers pay for ads, thereby underwriting the publisher’s operating costs required to create, collect, curate, and present content.

Good Data, Cheap Media

Technological advancements and new business models disrupt that fundamental economic balance, jeopardizing the equitable value exchange between publisher, audience, and advertiser. Through the use of cookies and tracking pixels, ad networks, DSPs, and individual advertisers can now identify and track specific users across the internet. Often, they use those tracking techniques to gain ‘backdoor’ access to valuable user information without the publisher’s knowledge or authorization.

Whenever a third or fourth party backdoors a pixel onto a publisher’s site, they can collect data about that site’s audience. That data is used to build valuable audience segments by combining it with cheaper media from other websites, bypassing the original publisher altogether. This “Good Data + Cheap Media” strategy poses at least two immediate economic problems for the publisher.

First, it degrades the value of the publisher’s media, as the third party collecting the data effectively strips it of its audience value. By unbundling the publisher’s value proposition – audience + content + environment – and leaving the publisher with just content + environment, the third party drives down the net price of the publisher’s media. By way of example, if consumers can get a cup of coffee at a nearby fast food joint that is as good as the one at Starbucks for less than half the price, their willingness to pay a premium for Starbucks is eroded. If coffee were like digital media, the fast food joint would also be pilfering Starbucks’ beans to make its half-price coffee.

Second, the data can be used many times over to deliver an audience of interest to advertisers. Problem is, it’s acquired for free. The publisher creates the content that attracted the audience in the first place, but they receive no compensation from the third party that monetizes it over and over again with every impression, every offer, every click. In the realm of music, when a songwriter writes a song, he or she earns royalties every time that song is performed. If songwriting were like digital publishing, the songwriter would write a great song and watch someone else perform it in stadiums, on radio, and on TV, over and over again, without earning a penny from the effort.

Publishers who leak data leak money. The indirect loss may appear small on a pixel-by-pixel, cookie-by-cookie basis, but it accumulates into considerable sums as the scale and scope of third party data collection grows. Revenue Exposure puts the numbers to the “Good Data + Cheap Media” dynamic illustrated above and ties it back to specific data collection activities across the channels and sections of a publisher’s website.

Digital media publishers need always-on, real-time visibility regarding the revenue risk resulting from data leakage on their sites. Ultimately, it’s up to them to determine the level of risk they want to carry in their own operations as a function of the data collection they allow or prevent. Subject to whatever privacy regime emerges in the months ahead, publishers should, in my view, have the ability to make these decisions on their own. But they’d be ill-advised to make them in the absence of a more quantitative baseline to help measure and manage the resulting exposure.
