When Evaluating Cross-Device Graph Technology, Look Beyond Match Accuracy

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Rajiv Maheshwari, cross-device technology leader at Neustar.

With consumers increasingly accessing content and shopping via multiple devices, multiscreen and cross-device identity have become critical to advertisers.

Cross-device identity offers a unified view of individual consumers as they interact with brands’ advertising across multiple devices and platforms. A unified view of the consumer opens the door to cross-device marketing, multitouch attribution, closed-loop reporting, unique reach and frequency measurement and opt-out compliance, among other desirable capabilities.

Industry giants with large user bases across mobile devices and desktops, such as Apple, Google and Facebook, have a clear advantage because users identify themselves by logging in via a single ID across all platforms. These companies have created so-called “walled gardens” offering deterministic identities at scale, potentially grabbing the lion’s share of advertisers’ spending.

Other companies in the ad tech ecosystem need a formidable alternative solution to compete. Several vendors have emerged over the last few years to fill the void with probabilistic cross-device matching technology that links browser cookies and device IDs to each user.

The vendors’ aggressive marketing has largely driven the conversation toward match accuracy of clustered identities in their “device graph.” Vendors have claimed accuracy ranging from 70% to 97%. But what they are really talking about is precision, incorrectly defined as accuracy.

I’ve had to evaluate several device graph technologies over the last year. I’ve found that, in general, the level of sophistication in currently available solutions on the market is still low compared to other successful applications of machine learning technologies, such as email spam filtering, recommendation engines, face recognition or fraud detection. Here are some of the criteria I’ve learned to consider.

Precision And Recall

Precision is the percentage of clustered identities in the device graph that are truly linked to the same individual. Recall, on the other hand, is the percentage of all of an individual’s existing identities that are correctly clustered in the device graph.

For example, say a given user has five different IDs across multiple browsers and devices, which I’ll call A, B, C, D and E. If IDs A, B and F (some other user’s ID) are clustered in the device graph, the device graph’s precision is 67%, since two of the three clustered IDs are correct. However, the recall is only 40%, since only two of the user’s five IDs are correctly clustered. Lower precision means more false positives (IDs linked to the wrong user), while lower recall means more false negatives (IDs that belong to the user but are left out of the cluster).
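The arithmetic in this example can be checked with a few lines of code. This is a minimal sketch for a single cluster scored against a single user’s known IDs; the function name `precision_recall` is hypothetical, and a real evaluation would need ground-truth ID sets for many users, which vendors and advertisers rarely have in full.

```python
def precision_recall(cluster, true_ids):
    """Score one device-graph cluster against one user's true set of IDs."""
    cluster, true_ids = set(cluster), set(true_ids)
    correct = cluster & true_ids  # IDs the graph linked correctly to this user
    # Precision: share of the clustered IDs that really belong to the user.
    precision = len(correct) / len(cluster) if cluster else 0.0
    # Recall: share of the user's IDs that made it into the cluster.
    recall = len(correct) / len(true_ids) if true_ids else 0.0
    return precision, recall


user_ids = ["A", "B", "C", "D", "E"]  # the user's five IDs
cluster = ["A", "B", "F"]             # F belongs to some other user
p, r = precision_recall(cluster, user_ids)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=67% recall=40%
```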

Depending upon your target use cases, you may prefer higher precision to recall or vice versa. For example, higher precision is desirable if marketers want to retarget an audience with sequential messaging. Higher recall is desirable if the goal is to increase audience reach by acquiring new screens. Some vendors may also provide the ability to adjust precision vs. recall via a cluster affinity score for IDs. Increasing recall typically also increases scale.
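To illustrate how an affinity score can trade precision for recall, the sketch below applies a cutoff to a handful of scored ID pairs. Everything here is an assumption for illustration: the pairs, scores and true/false labels are invented, the function `links_at_threshold` is hypothetical, and recall is measured only against the true links present in this labeled sample.

```python
# Hypothetical labeled pairs: (id_pair, affinity_score, is_true_link).
# Scores and labels are invented for illustration only.
SCORED_PAIRS = [
    (("A", "B"), 0.95, True),
    (("B", "C"), 0.85, True),
    (("A", "F"), 0.70, False),
    (("D", "E"), 0.60, True),
    (("B", "E"), 0.40, True),
    (("C", "G"), 0.30, False),
]


def links_at_threshold(scored_pairs, threshold):
    """Keep pairs scoring at or above threshold; return (links kept, precision, recall)."""
    kept = [pair for pair in scored_pairs if pair[1] >= threshold]
    true_kept = sum(1 for pair in kept if pair[2])
    total_true = sum(1 for pair in scored_pairs if pair[2])
    precision = true_kept / len(kept) if kept else 0.0
    recall = true_kept / total_true if total_true else 0.0
    return len(kept), precision, recall


for t in (0.8, 0.5, 0.2):
    n, p, r = links_at_threshold(SCORED_PAIRS, t)
    print(f"threshold={t}: links={n} precision={p:.2f} recall={r:.2f}")
```

On this made-up sample, lowering the cutoff from 0.8 to 0.2 raises recall from 0.50 to 1.00 and grows the number of links from 2 to 6, while precision falls from 1.00 to 0.67, which mirrors the observation that pushing recall up also tends to increase scale.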

Partial vs. Fully Clustered Cross-Device Identities

Ideally, each cluster in the device graph should contain all the IDs linked to the same individual, and only those IDs. In our previous example, IDs A, B, C, D and E would constitute a full cluster. However, the vendor’s device graph may provide only pairwise or partial clusters, with IDs spread across multiple clusters, as illustrated by the following ID tuples:

  • [A, B, F]
  • [B, C]
  • [D, E]
  • [B, E]
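One common way to turn overlapping partial clusters like these into full clusters is a union-find (disjoint-set) merge. The sketch below is an assumption about how such a merge could be done, not a description of any vendor’s method; `merge_partial_clusters` is a hypothetical name. Note that on these tuples it also pulls F into the merged cluster: a single false-positive link propagates to every ID connected to it.

```python
def merge_partial_clusters(tuples):
    """Merge overlapping ID tuples into full clusters using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for ids in tuples:
        if not ids:
            continue
        find(ids[0])  # register the first ID even if the tuple has one element
        for other in ids[1:]:
            union(ids[0], other)

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())


PARTIAL_CLUSTERS = [["A", "B", "F"], ["B", "C"], ["D", "E"], ["B", "E"]]
print(merge_partial_clusters(PARTIAL_CLUSTERS))
# [['A', 'B', 'C', 'D', 'E', 'F']]
```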

Many of the cross-device business use cases, such as multitouch attribution, depend on accurately assembling the entire user events chain. It is a lot simpler to stitch together users’ journeys across multiscreen touch points with fully clustered cross-device identities. Assembling user events chains from partially clustered identities is computationally intensive when dealing with billions of user events. Partial identity clusters are only a partial solution to the cross-device identity problem.
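With fully clustered identities, stitching a journey reduces to one lookup per event followed by a sort. The sketch below assumes a hypothetical mapping from each device or browser ID to a person-level cluster ID, and invented events; with only partial clusters, that mapping would first have to be built by merging overlapping clusters across the whole graph. `stitch_journeys`, `ID_TO_PERSON` and `EVENTS` are illustrative names, not part of any product.

```python
from collections import defaultdict

# Hypothetical mapping from device/browser ID to a fully clustered person ID.
ID_TO_PERSON = {"A": "user-1", "B": "user-1", "C": "user-1", "D": "user-1", "E": "user-1"}

# Hypothetical events: (timestamp, device_id, touchpoint).
EVENTS = [
    (3, "C", "tablet: video ad"),
    (1, "A", "desktop: display ad"),
    (2, "D", "phone: search"),
    (4, "E", "phone: purchase"),
]


def stitch_journeys(events, id_to_person):
    """Group events into per-person journeys, ordered by timestamp."""
    journeys = defaultdict(list)
    for _ts, device_id, touchpoint in sorted(events):
        # IDs missing from the graph are treated as their own "person".
        person = id_to_person.get(device_id, device_id)
        journeys[person].append(touchpoint)
    return dict(journeys)


print(stitch_journeys(EVENTS, ID_TO_PERSON))
```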

Scale Of Clustered Cross-Device Identities

Vendors often tout that they have more than a billion IDs in their device graph. However, what matters from a cross-device perspective is how many of those IDs are clustered. Probabilistic and deterministic matching can only tell which IDs are linked to the same individual.

A standalone ID in the device graph does not necessarily mean that the corresponding user has only one digital ID in the universe; it may simply mean the vendor could not link it to any other ID. In that respect, standalone IDs in a device graph are about as useful as IDs that are not in the device graph at all: they provide no additional information. So although a vendor may have more IDs in its device graph, many of them may be useless for cross-device purposes.
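When comparing vendors, one simple check is to split the headline ID count into clustered and standalone IDs. This is a minimal sketch that assumes the graph is delivered as disjoint clusters (each ID appears exactly once); the function name and the sample graph are hypothetical.

```python
def graph_scale(clusters):
    """Split a device graph's ID count into clustered and standalone IDs.

    Assumes each ID appears in exactly one cluster (a fully clustered graph).
    """
    total = sum(len(c) for c in clusters)
    clustered = sum(len(c) for c in clusters if len(c) > 1)
    return {"total_ids": total, "clustered_ids": clustered, "standalone_ids": total - clustered}


# Hypothetical graph: two multi-ID clusters and three standalone IDs.
GRAPH = [["A", "B", "C"], ["D"], ["E"], ["F", "G"], ["H"]]
print(graph_scale(GRAPH))
# {'total_ids': 8, 'clustered_ids': 5, 'standalone_ids': 3}
```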

Individual And Household-Level Hierarchical Clustering

Finally, if your marketing goals require both individual and household-level granularity, it may be a good idea to ask your vendor if they can support hierarchical clustering of individual-level cross-device identity clusters into households. There are several algorithms that perform hierarchical clustering. Redesigning the algorithms to perform at big data scale is difficult but certainly achievable with currently available technologies.
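As a rough illustration of the two-level idea, the sketch below groups individual-level clusters into households whenever they share a household signal. This is a simplified connected-components grouping, not a general hierarchical clustering algorithm such as agglomerative clustering with a linkage criterion, and the household signal (e.g., a shared home IP address), the names `group_households`, `PERSONS` and `SIGNALS`, and all data are hypothetical. A production approach would more likely score household affinity than rely on a single shared signal.

```python
from collections import defaultdict


def group_households(person_signals):
    """Group individual-level clusters into households.

    person_signals maps a person (an individual-level cluster) to a set of
    hypothetical household signals, e.g. a shared home IP address. People
    linked through any chain of shared signals end up in the same household.
    """
    by_signal = defaultdict(set)
    for person, signals in person_signals.items():
        for signal in signals:
            by_signal[signal].add(person)

    seen, households = set(), []
    for start in person_signals:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # depth-first walk over people sharing signals
            person = stack.pop()
            members.append(person)
            for signal in person_signals[person]:
                for other in by_signal[signal]:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        households.append(sorted(members))
    return sorted(households)


# Hypothetical individual-level clusters and their household signals.
PERSONS = {"p1": ["A", "B", "C"], "p2": ["G", "H"], "p3": ["K"]}
SIGNALS = {"p1": {"ip-1"}, "p2": {"ip-1", "ip-2"}, "p3": {"ip-3"}}

for i, members in enumerate(group_households(SIGNALS), start=1):
    print(f"household {i}:", {p: PERSONS[p] for p in members})
```

The result is a hierarchy: each household contains individuals, and each individual contains device and browser IDs.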


I find it encouraging to see growing interest and momentum in cross-device identity solutions. With several readily available machine learning software libraries and tools, the barrier to entry is fairly low.

On the flip side, there are significant data science and engineering challenges to overcome. A comprehensive solution would also need access to online, mobile and offline identity data points. Hopefully, increased competition will drive innovation in this space.

Follow Neustar (@Neustar) and AdExchanger (@adexchanger) on Twitter.
