“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.
Today’s column is written by Marcus D. Collins, scientist at Placed.
Location is a big part of mobile advertising, which is why advertisers pay more when an ad request includes the user’s location.
But how accurate is this information?
It turns out that the location data from ad exchange bid requests is far from ideal when trying to measure real-world shopping. After analyzing billions of impressions per day from top ad exchanges, I found that less than 1% of real-world visits to businesses are captured in the location data from ad exchanges.
Scale Vs. Quality
With access to a proprietary ground-truth data set that covers one in 450 adults in the US, with about 1,000 first-party locations per user per day, I have a unique opportunity to validate the accuracy of location data coming from ad exchange bid requests. I analyzed three weeks’ worth of bid requests from leading ad exchanges: about 4 billion impressions per day across some 50 million devices.
Four billion impressions per day may sound like a wealth of data, but I found that only a fraction of that actually corresponds to real-world human locations and behavior.
Only 27% of those daily bid requests contained location data at all. Among the requests that did contain location data, 12% weren’t precise enough to tell whether they came from a particular store or business. Consider the difference between saying, “I’m downtown” vs. “I’m at Fifth and Main St.” Only the other 88% could do the latter.
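The column doesn’t say how precision was judged. One simple heuristic, shown here as a hypothetical sketch, counts the decimal places in the reported coordinates: four decimal places of latitude correspond to roughly 11 meters, enough to separate neighboring storefronts, while two decimal places (about 1.1 km) only say “downtown.” The cutoff `MIN_DECIMAL_PLACES` and the function names are assumptions for illustration.

```python
from decimal import Decimal

# Hypothetical cutoff (not from the article): 4 decimal places of latitude
# is roughly 11 m, fine enough to tell neighboring storefronts apart.
MIN_DECIMAL_PLACES = 4

def decimal_places(coord: str) -> int:
    """Digits after the decimal point in a coordinate string as received."""
    exponent = Decimal(coord).as_tuple().exponent
    return max(0, -exponent)

def is_precise(lat: str, lon: str, min_places: int = MIN_DECIMAL_PLACES) -> bool:
    """True if both coordinates carry at least `min_places` decimal places."""
    return decimal_places(lat) >= min_places and decimal_places(lon) >= min_places

requests = [
    ("37.33821", "-121.88633"),  # street level: "Fifth and Main"
    ("37.34", "-121.89"),        # neighborhood level: "downtown"
]
precise = [r for r in requests if is_precise(*r)]
print(len(precise))  # 1
```

The check works on the coordinate strings as received, because parsing them into floats discards trailing zeros. Where an exchange fills in the OpenRTB `geo.accuracy` field (estimated accuracy in meters), that is a more direct signal than counting digits.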
After filtering out imprecise locations, 90% of remaining locations were “overrepresented” and couldn’t be trusted. For instance, a single location in San Jose, Calif., accounted for 10% of all bid requests in the San Francisco Bay Area. That many people in one specific place is not plausible. Since there’s no way to tell which of those locations are from “real” people, I discarded impressions from such locations.
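A minimal sketch of an overrepresentation filter, assuming locations have already been snapped to points and grouped by region: any single point carrying more than a set share of the region’s requests is dropped entirely, since there is no way to tell which of its impressions are real. The 5% threshold `MAX_SHARE` and the helper names are hypothetical; the column doesn’t state the cutoff it used.

```python
from collections import Counter

# Hypothetical threshold: no single point should carry more than 5%
# of a region's bid requests. The article does not give its cutoff.
MAX_SHARE = 0.05

def overrepresented(points, max_share=MAX_SHARE):
    """Return the set of points whose share of the region's requests exceeds max_share."""
    counts = Counter(points)
    total = sum(counts.values())
    return {p for p, n in counts.items() if n / total > max_share}

def drop_overrepresented(points, max_share=MAX_SHARE):
    """Discard every impression at an overrepresented point."""
    bad = overrepresented(points, max_share)
    return [p for p in points if p not in bad]

# One point holds 10% of the region's requests; 90 others appear once each.
hub = (37.33821, -121.88633)
region = [hub] * 10 + [(37.0 + 0.001 * i, -122.0) for i in range(90)]
kept = drop_overrepresented(region)
print(len(kept))  # 90
```

A fixed share is blunt, since a genuinely crowded venue could trip it as well; the sketch trades those false positives for simplicity.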
Ten percent of the remaining locations were either too random or not random enough to come from humans (see the map below). Of what was left, bots accounted for roughly 67%, with individual devices generating up to several thousand impressions per hour.
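The bot rule can be sketched as a rate cap: bucket each device’s impressions by clock hour and flag any device that exceeds a ceiling in any hour. The 500-per-hour ceiling and the names here are assumptions, chosen only to sit well below the “several thousand impressions per hour” the analysis observed; the sketch does not attempt the separate randomness test.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical ceiling (not from the article): no human plausibly
# triggers more than this many ad impressions in one clock hour.
MAX_IMPRESSIONS_PER_HOUR = 500

def bot_devices(impressions, limit=MAX_IMPRESSIONS_PER_HOUR):
    """impressions: iterable of (device_id, datetime).
    Returns ids of devices exceeding `limit` impressions in any clock hour."""
    per_hour = Counter(
        (device, ts.replace(minute=0, second=0, microsecond=0))
        for device, ts in impressions
    )
    return {device for (device, _hour), n in per_hour.items() if n > limit}

start = datetime(2024, 1, 1, 9, 0)
# A person: one impression every 30 minutes over ten hours.
human = [("phone-a", start + timedelta(minutes=30 * i)) for i in range(20)]
# A bot: one impression per second for 50 minutes.
bot = [("phone-b", start + timedelta(seconds=i)) for i in range(3000)]
print(bot_devices(human + bot))  # {'phone-b'}
```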
After all these types of low-quality impressions are filtered out, 30 million reliable locations per day remain, from 5 million devices in total, each providing about 30 locations per day on about six days per month. For the rest of this analysis, these are treated as high-accuracy bid request locations.
Some publishers report nearly random locations, resulting in the pattern of background squares separated by 0.1°-wide blank stripes, while others report high-quality locations (revealing the major urban areas of Florida).
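The filtering steps in this section can be tallied as a simple funnel. This sketch multiplies the published percentages in order, reading the 12% in the precision step as the share that was too coarse and applying each percentage to whatever survived the previous step (an assumption about how the figures compound). It lands near 28 million locations per day, in the same range as the roughly 30 million reported; the gap presumably comes from rounding in the published figures. The second half checks the device-level description, assuming a 30-day month.

```python
# Shares taken from the column; each applies to what survived the previous step.
DAILY_BID_REQUESTS = 4_000_000_000

steps = [
    ("has location data", 0.27),         # only 27% carried a location
    ("precise enough", 1 - 0.12),        # 12% were too coarse
    ("not overrepresented", 1 - 0.90),   # 90% of the rest were discarded
    ("plausibly human", 1 - 0.10),       # 10% too random / not random enough
    ("not a bot", 1 - 0.67),             # bots were ~67% of what remained
]

remaining = DAILY_BID_REQUESTS
for label, keep in steps:
    remaining *= keep
    print(f"{label:<22} {remaining / 1e6:>8.1f} M per day")

print(f"share of all bid requests: {remaining / DAILY_BID_REQUESTS:.2%}")

# Cross-check: 5 million devices, ~30 locations per active day,
# ~6 active days in an assumed 30-day month.
devices_active_per_day = 5_000_000 * 6 / 30
locations_per_day = devices_active_per_day * 30
print(f"device-level estimate: {locations_per_day / 1e6:.0f} M per day")
```

Either way, what survives is well under 1% of the 4 billion daily bid requests.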
From Numbers To Insight
Thirty million locations each day is still a significant amount of data. Could this data lead to quantitative insight into real-world consumer behavior?
To find out, I compared the high-accuracy location data from ad exchanges against the ground-truth data set. That audience provides a high-frequency, persistent stream of location data straight from users’ devices, with no middleman, which avoids the problems seen with exchange-based data sources. The comparison surfaced two key points:
The low frequency of location data from exchanges makes it very difficult to differentiate between someone going into a store vs. someone just walking by. This results in unreliable or misleading inference of whether someone visited a particular business.
The overwhelming majority of real-world business visits don’t generate high-quality locations on the exchanges.
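The first point can be made concrete with a hypothetical dwell rule: call something a visit only if the device is observed inside a business’s geofence at least twice, with the observations spanning some minimum time. The rule and the 5-minute `MIN_DWELL` are assumptions for illustration, not the method used in this analysis. A persistent stream easily satisfies the rule during a long visit, while an exchange-style trail of two impressions a minute apart cannot be told apart from a walk-by.

```python
from datetime import datetime, timedelta

# Hypothetical rule and threshold; neither comes from the article.
MIN_DWELL = timedelta(minutes=5)

def infer_visit(inside_times, min_dwell=MIN_DWELL):
    """inside_times: datetimes at which the device was observed inside a
    business's geofence. True if the observed dwell reaches min_dwell."""
    if len(inside_times) < 2:
        return False  # one point cannot separate a visit from a walk-by
    return max(inside_times) - min(inside_times) >= min_dwell

start = datetime(2024, 1, 1, 18, 0)
# Persistent stream: a reading every 5 minutes for three hours.
persistent = [start + timedelta(minutes=5 * i) for i in range(37)]
# Exchange-style trail: two impressions one minute apart.
exchange = [start + timedelta(minutes=90), start + timedelta(minutes=91)]

print(infer_visit(persistent), infer_visit(exchange))  # True False
```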
Real-World Examples
Circles represent ad impressions; the large symbol indicates a visit inferred from persistent, high-frequency location measurement. Visualizations altered to ensure users’ privacy.
In the picture above, the persistent location measurements show a several-hour visit (origin marker) at a bowling alley. But all of the ad impressions with high-accuracy location (red circles) happened within one to two minutes and show the user moving away from the site. One might infer a visit to the bowling alley from those requests, but given the limitations of urban GPS and the short window in which the requests occurred, one could just as easily infer that the user was only passing by.
With only a handful of user locations, typical of ad exchanges, rather than a persistent stream, it’s difficult to draw conclusions about what users were doing at those locations.
Here, persistent location measurements show a user walking past a coffee shop to a local market. The user received ads at two locations while passing the coffee shop, while the actual visit to the market generated no ad requests with high-quality location. Throughout the analysis, as in this example, there was very little overlap between ad impressions and actual visits, indicating that most impressions are generated in transit – between visits.
The takeaway? Even when a user’s location is accurate, more context is needed to understand what the user was doing at that particular position, and without it, even high-quality location data can lead to false conclusions. Based on this analysis, I recommend asking location analytics providers how they handle these scenarios and how those scenarios affect their final results.
Additional Findings
The analysis also revealed that approximately 73% of bid requests occur in transit between two locations. This makes sense: When you’re shopping, ordering lunch or otherwise engaged with a business, you usually aren’t on your phone. For your true location to show up on an ad exchange, you’d have to unlock your phone, open an app with location permissions and have that app generate a bid request.
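Measuring how many impressions happen in transit requires knowing when visits occurred, which is what the persistent ground-truth stream supplies. A minimal sketch, assuming visits are already available as start/end intervals for the same user: count the share of impression timestamps that fall outside every visit. The function and variable names are hypothetical.

```python
from datetime import datetime

def in_transit_share(impression_times, visits):
    """Fraction of impressions outside every visit interval.
    visits: list of (start, end) datetimes from the ground-truth stream."""
    if not impression_times:
        return 0.0

    def during_visit(t):
        return any(start <= t <= end for start, end in visits)

    outside = sum(1 for t in impression_times if not during_visit(t))
    return outside / len(impression_times)

day = datetime(2024, 1, 1)
visits = [
    (day.replace(hour=10), day.replace(hour=10, minute=30)),  # market
    (day.replace(hour=12), day.replace(hour=12, minute=45)),  # lunch
]
impressions = [
    day.replace(hour=9, minute=50),   # on the way to the market
    day.replace(hour=10, minute=10),  # during the market visit
    day.replace(hour=11, minute=15),
    day.replace(hour=11, minute=40),
    day.replace(hour=12, minute=50),  # just after lunch
]
print(in_transit_share(impressions, visits))  # 0.8
```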
Ninety-one percent of business visits observed in the high-frequency location data generated no ad requests at all. For the remaining 9% of visits, not all of the reported locations were accurate. When I compared the location of a particular ad impression, as reported by the ad exchange, with that user’s location in the persistent audience data, the two often differed significantly. Less than 1% of business visits generated an ad request location within 100 meters of the user’s true location.
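The 100-meter test amounts to a distance check between each ad request’s reported location and the user’s true location during the visit. This sketch uses the haversine great-circle formula on a spherical Earth (an approximation that is more than adequate at this scale) and counts a visit as matched if any of its ad requests lands within the radius. The helper names and sample coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def visit_matched(true_loc, reported_locs, radius_m=100):
    """True if any ad request during the visit lands within radius_m."""
    return any(haversine_m(*true_loc, *r) <= radius_m for r in reported_locs)

def matched_share(visits, radius_m=100):
    """visits: list of (true_loc, [reported_loc, ...]) pairs."""
    if not visits:
        return 0.0
    hits = sum(1 for true_loc, reported in visits
               if visit_matched(true_loc, reported, radius_m))
    return hits / len(visits)

store = (37.3382, -121.8863)
visits = [
    (store, []),                      # no ad requests during the visit
    (store, []),
    (store, [(37.3482, -121.8863)]),  # about 1.1 km away: no match
    (store, [(37.3388, -121.8863)]),  # about 67 m away: a match
]
print(matched_share(visits))  # 0.25
```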
The Bottom Line
Less than 1% of the location data from ad exchanges is accurate enough to help marketers understand people’s movements in the real world. Even then, most high-quality data doesn’t actually correspond to business visits.
Inferring business visits by analyzing ad requests is extremely challenging. If your business relies on ad location data, you can increase its usefulness by understanding those challenges and learning more about how providers handle them.
Follow Placed (@Placed) and AdExchanger (@adexchanger) on Twitter.