“There’s been a realization in the industry that things we’ve done in the past, like aggregation, might no longer be sufficient for protecting privacy,” Rogers said.
Not that differential privacy – or any PET, for that matter – can achieve perfection. Perfect privacy is only possible if no data is shared at all, and if nothing is shared, there is no utility.
“Think of differential privacy as existing on a spectrum,” Rogers said.
In other words, there has to be a tradeoff.
Adding more statistical noise or randomness to a data set strengthens the privacy guarantee, but the output will likely be less accurate, and vice versa. The right balance depends on your risk tolerance, what you're trying to achieve and the sensitivity of the data set in question.
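To make that tradeoff concrete, here's a minimal sketch of the classic Laplace mechanism, the textbook way to add calibrated noise to a count. The function name and parameters are illustrative, not from LinkedIn's system; the key idea is that a smaller epsilon (stronger privacy) means a larger noise scale (less accuracy).

```python
import math
import random

def dp_count(true_count: int, epsilon: float) -> float:
    """Return a differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (one person changes the count
    by at most 1), so the noise scale is 1 / epsilon. Smaller epsilon
    means more noise, i.e. stronger privacy but a less accurate answer.
    """
    scale = 1.0 / epsilon
    # Sample Laplace(0, scale) noise via inverse-CDF sampling.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Stronger privacy (epsilon = 0.1) gives a noisier answer than
# weaker privacy (epsilon = 2.0) for the same underlying count.
print(dp_count(1200, 0.1))
print(dp_count(1200, 2.0))
```

Running it a few times shows the epsilon=0.1 answers swinging by tens of views while the epsilon=2.0 answers stay within a view or two of the truth.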
PEDAL to the privacy
LinkedIn’s investment in differential privacy for post analytics was about being proactive rather than reactive to risk.
One logical way to protect the privacy of someone who views a post is to only share aggregated information with the post’s author, like the top job title among viewers or a company name.
But LinkedIn’s applied research team wondered whether it would be possible for a bad-acting author to combine that information and monitor real-time updates to profiles on LinkedIn as a way to identify exactly who engaged with a post.
Although LinkedIn had never seen an attack like that happen in the wild, the team, helmed by Rogers at the time, decided to dig in and find out whether a risk really existed.
And, apparently, it did. They discovered it was technically possible to identify around 9% of post viewers using a small amount of demographic information.
The upshot of this research was the development and release of a privacy tool late last year called PEDAL, which stands for Privacy-Enhanced Data Analytics Layer.
If you’re a data scientist or some other variety of math-minded brainiac, you can dive into the details here. But I’m neither, so, in short, what PEDAL does is apply multiple differential privacy algorithms that inject noise into event-level data before it’s shared with LinkedIn’s analytics platform.
The result: the people viewing LinkedIn posts can’t be identified, but the person posting them can still get useful analytics instantly. Balance = struck.
“With differential privacy, you can still get useful insights from data without revealing anything at the individual level,” Rogers said. “The point here is to be as practical as possible.”
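For a sense of what "noise before aggregation" looks like in practice, here's a hypothetical sketch of reporting the top job titles among a post's viewers with per-title noise applied first. This is not PEDAL's actual code, and the event format and function names are assumptions, but it illustrates the general pattern: the author sees a useful ranking while no individual viewer's presence can be confirmed from the output.

```python
import math
import random
from collections import Counter

def private_top_titles(view_events, epsilon=1.0, k=3):
    """Hypothetical example of the noise-before-aggregation pattern.

    Counts views per job title, adds Laplace noise to each count,
    then reports only the top-k titles, so the exact counts (and
    thus any single viewer's contribution) are never exposed.
    """
    counts = Counter(event["title"] for event in view_events)

    def laplace(scale):
        u = random.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    # One person adds at most 1 to a title's count, so scale = 1/epsilon.
    noisy = {title: c + laplace(1.0 / epsilon) for title, c in counts.items()}
    return sorted(noisy, key=noisy.get, reverse=True)[:k]

events = ([{"title": "Engineer"}] * 500
          + [{"title": "Designer"}] * 10
          + [{"title": "Analyst"}] * 5)
print(private_top_titles(events, epsilon=1.0, k=2))
```

When the gaps between titles are large relative to the noise, the ranking stays accurate; when they're small, the ordering may shuffle, which is exactly the privacy-for-utility trade described above.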
🙏 Thanks for reading (wherever you happen to be doing so; this is a judgment-free zone)! As always, feel free to drop me a line at [email protected] with any comments or feedback.