Big tech is all in on differential privacy.
It’s a foundational concept within Google’s Privacy Sandbox; Apple applies it to the study of device diagnostic, health and web browsing data; and, just last week, Facebook used differential privacy to protect a trove of data it made available to researchers analyzing the effect of misinformation sharing on elections.
Uber employs differential privacy to detect statistical trends in its user base without exposing personal information. Amazon’s AI systems tap it to prevent data leakage. Snapchat has used differential privacy to train machine learning models. And Salesforce uses differential privacy filters in its reporting logs.
But while differential privacy began as an academic notion and is now being adopted by the biggies, ad tech companies need to know about it, too, and some even see it as the future of privacy protection.
What is DP?
Differential privacy was introduced by Microsoft researchers in 2006 as a privacy-safe model for data analysis.
Rather than an algorithm itself, differential privacy is a set of mathematical properties that can be applied to data analysis and machine learning algorithms in order to set a limit on how much information can be extracted from data before it becomes possible to draw inferences about individuals.
In other words, it introduces “plausible deniability” into a data set, said Aaron Roth, a professor of computer and information science at University of Pennsylvania Engineering and co-author of “The Ethical Algorithm,” a treatise on the science of socially-aware algorithm design.
In practice, that means the data owner purposely adds noise or randomness into a data set so that it’s still possible to learn something about the population as a whole without identifying any of the individuals in the group.
Consider a pollster gathering statistical information about embarrassing behavior, like drug use or cheating. To protect their privacy, respondents flip a coin before answering without revealing the result to the pollster. If the coin lands on tails, they are asked to respond truthfully. If it’s heads, they flip a second coin and answer “yes” for heads and “no” for tails. This introduces randomness, or plausible deniability, into the eventual outcomes of the study. But because the researcher knows how the errors were introduced, he or she can later work backward to systematically remove them in the aggregate and still glean something useful from the data, Roth explained.
“There is no way for me to know whether an answer is random or not,” he said. “But because I know the process by which noise is added to the response, it’s possible to subtract the noise and learn the average.”
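For the curious, here’s what that coin-flip protocol, known in the academic literature as “randomized response,” looks like as a minimal Python sketch. The 50/50 coin probabilities follow Roth’s example above; the simulated survey and the function names are purely illustrative.

```python
import random

def randomized_response(truthful_answer: bool) -> bool:
    """One respondent's report under the coin-flip protocol described above."""
    if random.random() < 0.5:      # first coin lands tails: answer truthfully
        return truthful_answer
    return random.random() < 0.5   # heads: a second coin decides the answer

def estimate_true_rate(reports) -> float:
    """Work backward from the noisy reports: P(report 'yes') = 0.5 * true_rate + 0.25."""
    observed = sum(reports) / len(reports)
    return (observed - 0.25) / 0.5

# Simulate 100,000 respondents, 30% of whom would truthfully answer "yes".
random.seed(7)
truth = [random.random() < 0.30 for _ in range(100_000)]
reports = [randomized_response(answer) for answer in truth]
print(f"Estimated rate: {estimate_true_rate(reports):.3f}")  # lands close to 0.30
```

No single report can be taken at face value, but the aggregate estimate recovers the true rate, which is exactly the plausible deniability Roth describes.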
At scale, a machine learning algorithm could apply this principle to make estimates and gather information from a data set without compromising specific individuals. The caveat is that researchers need larger data sets to make up for the deliberate randomness.
The most common use for differential privacy today is as a way to randomize large data sets so that they can be made available to researchers, such as in the Facebook misinformation example.
“In a sense, it’s about data owners protecting themselves from their partners,” said Zach Edwards, founder of analytics firm Victory Medium. “Differential privacy allows you to give people some access to data in a way that doesn’t reduce the value of your own organization – or create another mini Cambridge Analytica.”
Enter ad tech?
But why should ad tech companies care about an arcane academic concept like differential privacy?
Because it’s the future, Edwards said, whether the ad tech ecosystem wants to admit it or not. Online data collection and sharing will increasingly be controlled by browser APIs whose very purpose is to limit both.
Many of the proposals within Google’s Privacy Sandbox are based on a differential privacy framework.
“There’s clearly no more room for workarounds,” Edwards said. “It’s a reality that only big companies really seem to be acknowledging, though.”
In the blog post announcing Chrome’s intention to deprecate third-party cookies by 2022, Justin Schuh, Chrome’s director of engineering, explicitly called out differential privacy as a building block for a future in which ads can be delivered to “large groups of similar people without letting individually identifying data ever leave your browser.”
One of the proposals, dubbed TURTLEDOVE – which stands for “Two Uncorrelated Requests, Then Locally-Executed Decision On Victory” – calls for all user behavior, including interest data, to be stored in the browser, which effectively kills profiling. Measurement and reporting would also happen at the aggregate level, making them a lot less granular than advertisers are accustomed to.
But one fruitful application of differential privacy in the ad tech context could be in the service of data transparency for data clean rooms, said Victor Wong, CEO of Thunder, which last year formed an industry working group called Truth in Measurement that tested different DP models.
The primary criticism of clean rooms is that they’re opaque and operated by the media companies selling the ads, Wong said, which creates an inherent conflict of interest.
“They don’t allow you to do independent attribution [or] measurement to validate, verify or customize,” he said. “Enabling data export with privacy protections afforded by differential privacy allows researchers and analysts to run their own models in their own environments.”
And, in theory at least, there’s no reason an ad tech company couldn’t apply a differential privacy model to its own data analysis. A DSP, for example, could use it to help clients derive aggregate insights about consumer engagement with their ads without revealing info about any one individual, said Alice Lincoln, SVP of data policy and governance at MediaMath.
To the best of Lincoln’s knowledge, though, there aren’t any ad tech companies leveraging differential privacy yet, and awareness of DP and related concepts is fairly low in the industry.
But “interest is growing quickly given the emphasis Chrome has put on [differential privacy] in the Privacy Sandbox,” she said, and MediaMath itself is interested in exploring differential privacy approaches.
Still, Hugo Loriot, US managing director of You & Mr Jones-owned data agency fifty-five, is a lot less sanguine about the prospect of ad tech companies applying differential privacy to their own algorithms anytime soon.
“The very purpose of ID resolution in the ad tech world is to share anonymized personal data with third parties, for example, CRM onboarding for Facebook or Google ads, which requires sharing the exact same methodology between both parties for anonymizing data,” Loriot said.
That’s why ad tech companies all rely on the same hashing algorithms, he said, putting aside the data leakage risks associated with hashing.
“If you apply differential privacy, you precisely do not want another party to be able to decrypt it, and you would have to create 1:1 protocols with every single partner to let them decrypt the data and the noise you add,” Loriot said. “I’m not sure this is really doable at scale.”
No guarantee
Differential privacy also isn’t an absolute guarantee of privacy, and it doesn’t create privacy where none previously existed, Roth said. Nor can it necessarily stop privacy violations against groups of people.
For example, the fitness app Strava inadvertently revealed the locations of secret military bases when it released a seemingly benign heat map of popular running routes in 2018. No single person’s privacy was compromised, but it was still pretty damned awkward. Differential privacy wouldn’t help in a situation like that.
The level of privacy protection in an algorithm that uses differential privacy also depends on how strictly the privacy parameters are set in a given deployment.
“You can dial up to perfect privacy, but then you can do almost nothing useful with the data, or you can go in the other direction and have no real protections,” Roth said. “It’s a tradeoff, because privacy protections always come with a cost.”
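To make that dial concrete, here’s a generic sketch (not a system any of the companies quoted here have deployed) of the classic Laplace mechanism, which releases a count after adding noise scaled to 1/epsilon, where epsilon is the so-called privacy budget. The ad-click framing and the specific numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_count = 1_000  # e.g., how many users in a segment clicked an ad

# Laplace mechanism for a counting query (sensitivity 1): add noise with scale 1/epsilon.
# Dialing epsilon down buys stronger privacy at the cost of a less accurate release.
for epsilon in (0.01, 0.1, 1.0, 10.0):
    noisy_count = true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)
    print(f"epsilon = {epsilon:>5}: released count ~ {noisy_count:,.0f}")
```

At epsilon = 0.01 the added noise has a typical magnitude of around 100 against a true count of 1,000; at epsilon = 10 the released figure is nearly exact, and nearly exact also means very little real protection.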
Still, it’s heartening to see differential privacy finally starting to be applied by the large tech firms to real-world scenarios, he said.
“For the first 10 years, differential privacy was an academic curiosity, and people like me would write papers about it that maybe five other people like me would read,” Roth said. “It’s not a silver bullet, but it’s a very good thing to see companies really starting to think about it.”