While tech from Datalogix and BlueKai (where Tawakol was CEO) largely constitutes Oracle Data Cloud, it also has key cross-device capabilities from Crosswise and publisher audience data from AddThis.
Naturally, Oracle Data Cloud’s vision around “relevant reach” comes from putting these elements together. Datalogix was all about data curation. It was known for its purchase data, but Roza says it has always had great data sets around gender and age. Before Oracle bought it, though, Datalogix didn’t have data across enough verticals.
“In the Datalogix era, it was all about CPG, auto and retail,” Roza said. “Now we’ve got travel, financial services, entertainment, B2B, technology, telecommunications. It’s a broad breadth of all the big ad spending verticals.”
BlueKai, prior to the acquisition, was about maximizing scale by partnering with every ad tech player out there.
“BlueKai aggregated a tremendous amount of data and didn’t express a point of view on it,” Roza said. “It was the world’s supermarket of data.” BlueKai certainly had high-quality data sets from companies like MasterCard, but there were some “long-tail brands, and brands that weren’t a value-add.”
Nevertheless, its guiding philosophy was “the more data, the better.” That thinking has changed since the acquisition, and Oracle Data Cloud wants to weed out some of the less valuable data providers.
“We might not need 200 data providers. Maybe we’re better with 50,” Roza said. “In the Oracle Data Cloud, we have a point of view and curate the crap out of the data, and apply science to make it better.”
Roza spoke with AdExchanger about the vision.
AdExchanger: So what’s “relevant reach?”
ERIC ROZA: There’s a real trade-off when it comes to accuracy and scale, but there’s a sweet spot – and that’s what we call relevant reach.
You might want to reach 25-to-45-year-old females, but you don’t want to reach 25-to-45-year-old females who just bought a brand new sports car if you’re selling Ford F-150s. We’re focused on combining demographics with rich signals in an area, and we’re looking to scale it geographically.
Reconciling reach with precision isn’t a new problem. Why hasn’t it been successfully addressed before?
We can train our model on really precise, but smaller, data sets, like age and gender, and then apply it to the whole breadth of data.
For example, you might find a cookie tagged with female, female and male by three different demographic providers on BlueKai. What do you do?
Do you just flip a coin? Ignore it and serve it up both ways? Or serve it up as female because you’ve got two females to one male? None of those is the right answer.
You look at how well each of those providers has done historically. And you also look at what else you know about that cookie. If you know that cookie has also visited automotive enthusiast sites and weightlifting sites rather than sites about shopping for women’s clothes, you can make predictions that are better than any of the original data sets.
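The reconciliation Roza describes can be illustrated as an accuracy-weighted vote. This is a minimal sketch, not Oracle’s actual model. It assumes three things: each provider’s historical accuracy is known, providers make errors independently and symmetrically, and the behavioral evidence (site visits) can be folded into a single prior probability. The function and parameter names are hypothetical.

```python
import math


def log_odds(p: float) -> float:
    """Convert a probability (strictly between 0 and 1) to log-odds."""
    return math.log(p / (1.0 - p))


def prob_female(labels, accuracies, prior_female=0.5):
    """Combine conflicting provider labels into a probability that a cookie is female.

    labels: "female"/"male" tags, one per provider (hypothetical format).
    accuracies: each provider's historical accuracy, assumed 0.5 < a < 1.
    prior_female: probability implied by other cookie signals, such as site visits
        (a hypothetical stand-in for behavioral evidence).
    Assumes providers err independently (a naive Bayes assumption).
    """
    score = log_odds(prior_female)
    for label, acc in zip(labels, accuracies):
        weight = log_odds(acc)  # a more accurate provider gets a bigger vote
        score += weight if label == "female" else -weight
    return 1.0 / (1.0 + math.exp(-score))  # convert log-odds back to a probability


p = prob_female(["female", "female", "male"], [0.55, 0.55, 0.90])
print(round(p, 2))
```

Suppose two weak providers (55% historically accurate) say female and one strong provider (90%) says male. The sketch then gives roughly 0.14, so the cookie leans male even though the raw majority says female. A behavioral prior below 0.5, for example from automotive and weightlifting site visits, pushes the estimate lower still.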
What do you need to build on to improve relevant reach?
There’s always [another big data asset] coming down the pipeline. For example, we were talking with Visa for over five years at Datalogix. About a year ago, we broke through with our partnership discussions. It became a really differentiating aspect in the eyes of their merchants. Now, Visa is a great partner with us in the data space.
So the more data assets that are created and liberated, the more value gets added to the ecosystem.
How does all this differ from what you used to do?
Five years ago, we’d match cookies with somebody. We’d have partnerships and pay someone to synchronize our cookies. About three years ago, we realized we needed to verify for ourselves whether these cookies were who they said they were.
We’d bring in third-party truth sets and triangulate against them. With one provider, we found that about 80% of the cookies they gave us were crap.
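A truth-set check like the one Roza describes can be approximated this way. Take the cookies a provider shares with a trusted reference set, and measure how often the provider’s label matches the reference. This is a minimal sketch. It assumes a single truth set keyed by cookie ID with one label per cookie, whereas real triangulation would compare against several truth sets. The function name `provider_accuracy` is hypothetical.

```python
import math


def provider_accuracy(provider_labels: dict[str, str],
                      truth_set: dict[str, str]) -> tuple[float, int]:
    """Score a provider against a truth set (hypothetical helper).

    Returns (share of overlapping cookies labeled correctly, overlap size).
    Returns NaN accuracy when the provider and truth set share no cookies.
    """
    overlap = [cid for cid in provider_labels if cid in truth_set]
    if not overlap:
        return math.nan, 0
    correct = sum(provider_labels[cid] == truth_set[cid] for cid in overlap)
    return correct / len(overlap), len(overlap)


truth = {"c1": "female", "c2": "male", "c3": "female", "c4": "male", "c5": "female"}
provider = {"c1": "male", "c2": "female", "c3": "male", "c4": "male", "c9": "female"}
print(provider_accuracy(provider, truth))  # (0.25, 4)
```

A provider whose accuracy comes back near 0.2 would correspond roughly to the case Roza mentions, where about 80% of the cookies are wrong. The same per-provider figure is the kind of historical track record used to weight providers when their labels disagree.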
We also do the same thing with mobile ad IDs, and over the last six months we’ve started rolling out scored mobile ad IDs.
How do you find data partners?
We look for people with new signals to provide: someone with an audience that’s very different in some way. Volume is great, but if we already have 90% of the matches they offer, that’s not interesting to us.
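Roza’s screening test for new partners can be expressed as an overlap ratio between a candidate’s IDs and the IDs already on hand. This is a minimal sketch with hypothetical names. The 0.9 cutoff is taken from his quote as an illustration, not a documented threshold.

```python
def screen_partner(candidate_ids: set[str], existing_ids: set[str],
                   max_overlap: float = 0.9) -> tuple[float, int, bool]:
    """Estimate how much a candidate data partner would add (hypothetical helper).

    Returns (share of candidate IDs already known, count of new IDs,
    whether the overlap is below the illustrative max_overlap cutoff).
    """
    if not candidate_ids:
        return 0.0, 0, False
    overlap_share = len(candidate_ids & existing_ids) / len(candidate_ids)
    new_ids = len(candidate_ids - existing_ids)
    return overlap_share, new_ids, overlap_share < max_overlap


candidate = {f"id{i}" for i in range(10)}
existing = {f"id{i}" for i in range(9)}
print(screen_partner(candidate, existing))  # (0.9, 1, False)
```

In practice, the separate criterion Roza mentions next, data on existing IDs that can help train models, would be weighed alongside this ratio rather than captured by it.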
And we also look to see if they have data on our existing IDs that can help train our models. I don’t think anyone else in the market is taking this algorithmic approach to things.
What do you mean by “taking this algorithmic approach to things?” I would have assumed others out there do that as well.
They’re just joining things. They’ll get 3 million IDs from this guy, another million from that guy, then they’ve got 4 million and they’ll put it out there.
That’s what we believe everyone in the market does, other than us.
We get 3 million from this guy and 1 million from that guy, then put them together to figure out who’s right when they disagree with each other and which ones we shouldn’t use at all. We score every source, so when two of them conflict, we know which one to use.
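The difference between simply joining sources and scoring them can be sketched as follows. This is an illustration, not Oracle’s pipeline, and it rests on these assumptions:

- Each source maps cookie IDs to a single label.
- Each source already has a quality score, such as a historical accuracy figure.
- Sources below a hypothetical `min_score` are dropped.
- On a conflict, the label from the higher-scored source wins.

All names are hypothetical.

```python
def merge_sources(sources: dict[str, dict[str, str]],
                  scores: dict[str, float],
                  min_score: float = 0.0) -> tuple[dict[str, str], int, int]:
    """Merge ID -> label maps from several sources (hypothetical helper).

    Returns (merged labels, naive total across all sources, conflicts seen).
    The naive total is what a plain join would report; it double-counts overlaps.
    """
    naive_total = sum(len(labels) for labels in sources.values())
    merged: dict[str, str] = {}
    owner: dict[str, str] = {}  # which source supplied each kept label
    conflicts = 0
    for name, labels in sources.items():
        if scores[name] < min_score:
            continue  # a source we "shouldn't use at all"
        for cid, label in labels.items():
            if cid not in merged:
                merged[cid], owner[cid] = label, name
            elif merged[cid] != label:
                conflicts += 1  # disagreement with the currently kept label
                if scores[name] > scores[owner[cid]]:
                    merged[cid], owner[cid] = label, name
    return merged, naive_total, conflicts


a = {"c1": "female", "c2": "male", "c3": "female"}
b = {"c2": "female", "c3": "female", "c4": "male"}
merged, naive, conflicts = merge_sources({"A": a, "B": b}, {"A": 0.6, "B": 0.9})
print(naive, len(merged), conflicts)  # 6 4 1
```

In this toy example, a plain join would report 6 IDs where only 4 unique cookies exist. The one disagreement (on `c2`) is settled in favor of the higher-scored source. By the same logic, “3 million plus 1 million” equals 4 million unique IDs only if the two sources never overlap.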
So how did you used to handle that sort of conflict?
We were doing what the rest of the industry is doing: working with sources we trusted, doing our own checks, making our integrations as good as they could be and fulfilling audiences against them.
But we always knew we could do better.