Big Data Without Your Own Data Is A Big Mess

“Data Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Matthew Keylock, global head of data at dunnhumby.

Everywhere I go, I overhear conversations about data science, big data, algorithms, the cloud, machine learning hackathons, Hadoop…the list goes on.

But these buzzwords are often thrown around with very little understanding of what they mean and how they all fit together. The marketing world is drowning in “big data” – or at least big data hype and confusion.

It seems that most of the industry focus and investment is going into technology to manage the data, or into the “widgets” applied to the data, without much forethought or expertise focused on the quality, scalability and sustainability of the data assets themselves. But the best technology is wasted if the data itself isn’t up to scratch.

It’s the equivalent of accumulating world-class mineral mining equipment and engineering skills, building impressive processing, refining and distribution capability, only to realize what you thought were diamond-rich rocks were in fact just rocks. Bottom line: Raw materials matter.

Show Me The Granularity

Many marketers are indiscriminate about delivering the goods. As long as it looks like they are doing the right things, have the shiny tools, speak the right jargon and work with the buzzworthy digital companies, then they can seem capable. They may even be promoted or win an award.

Clearly not everyone is baffled though. Sir Martin Sorrell, commenting on other media companies’ approach to data, sniffed: “WPP has a real big-data business where we own the data. We have access to the data. We don’t buy third-party data.”

Sorrell knows that competitive advantage in our increasingly personalized, digital world won’t come from using the same bland data as everyone else. Thankfully, the more open world of data is also showing us that yesterday’s approaches are no longer good enough. You need only look at your own profile on some data-aggregator portals (such as Acxiom’s aboutthedata.com) to realize this. The data may not be dramatically wrong (although friends and I found that some very important attributes in our profiles were quite error-ridden), but it is the overwhelming dullness of the data that is shocking (if that isn’t oxymoronic).

If data were food, this information would be lumpy, cold, flavorless oatmeal. Unfortunately, you may find that your company is using this data as a proxy for you and me, and building strategies, plans and executions based on it. This, in 2013, is a tragedy.

Binary attributes of whether I have purchased or not purchased in a broad category or whether I am interested or not interested in a topic provide none of the granularity required today to engage me personally. We need to be smarter than these skin-deep views. We need to know the answers to questions like: What is most important to me? What motivated me? Which brands and UPCs do I engage? Do I only buy on promotion? Am I an impulse shopper in this category? Why are my needs and behaviors today different from what they were yesterday?
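To make the contrast concrete, here is a minimal sketch of how a binary "purchased in category" flag compares with a slightly more granular profile built from the same transactions. The transaction layout, the field names and the `category_profile` helper are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical transactions: (customer_id, category, brand, bought_on_promotion)
TRANSACTIONS = [
    ("c1", "soda", "BrandA", True),
    ("c1", "soda", "BrandA", True),
    ("c1", "soda", "BrandB", True),
    ("c2", "soda", "BrandA", False),
    ("c2", "soda", "BrandA", False),
    ("c2", "soda", "BrandA", True),
    ("c2", "soda", "BrandA", False),
]


def category_profile(transactions, customer_id, category):
    """Summarize one customer's behavior in one category.

    Illustrative only: a real profile would use far more history and
    many more attributes than these.
    """
    rows = [t for t in transactions if t[0] == customer_id and t[1] == category]
    if not rows:
        return {"purchased": False}

    brand_counts = Counter(brand for _, _, brand, _ in rows)
    top_brand, top_count = brand_counts.most_common(1)[0]
    promo_purchases = sum(1 for row in rows if row[3])

    return {
        "purchased": True,  # the binary flag: identical for c1 and c2
        "purchases": len(rows),
        "promo_share": promo_purchases / len(rows),
        "top_brand": top_brand,
        "top_brand_share": top_count / len(rows),
    }


profile_c1 = category_profile(TRANSACTIONS, "c1", "soda")
profile_c2 = category_profile(TRANSACTIONS, "c2", "soda")
```

Both customers carry the same "purchased soda" flag, yet in this toy data one buys only on promotion and switches brands while the other is a loyal, mostly full-price buyer.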

All Data Is Not Created Equal

The answers to these questions can’t come from extrapolations from small data sets, surveys or panels. They must be individual and factual and go beyond the binary “yes or no” flag to vividly describe my personal “DNA” of needs and preferences. As Sorrell observes, it is the first-party data that is key. Using the same “oatmeal” data as everyone else is not an advantage.

Ideally your first-party data includes every behavioral interaction a customer has with your business, much of which you glean from your own operational systems and those of partners and suppliers. It should include all the consumer permissions to ensure you use it appropriately and all the reference data needed to make sense of it. On top of this foundation, you can add your research data, segmentations, derived attributes, propensity scores, prospect data, contact history, response and so on.

This data must be cleaned, filtered and organized optimally. When it is, it’s a solid base on which to build your investment in technology, skills and business tools.

You should also have only one of these customer data foundations for your brand or portfolio. This may seem obvious, but so many marketers have several different data islands underpinning various channel solutions for owned, paid and earned media and perhaps agency-by-agency solutions, too. Each island is a silo with, for instance, its own version of the customer’s permissions and opt-ins. In addition to fragmenting the execution and the customer experience of the brand, this also fragments the measurement. Ultimately, these silos make it almost impossible to develop a solid and executable strategy based on a total and common view.
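As a rough illustration of the permissions problem, the sketch below pulls opt-in flags from several hypothetical channel silos into one view per customer and marks where the silos disagree. The silo names, record fields and the "most restrictive answer wins" rule are assumptions made for the example, not a recommendation from this column.

```python
from collections import defaultdict

# Hypothetical opt-in records held separately by three channel silos.
SILO_RECORDS = [
    {"silo": "email", "customer_id": "c1", "opt_in": True},
    {"silo": "paid_media", "customer_id": "c1", "opt_in": False},
    {"silo": "loyalty", "customer_id": "c1", "opt_in": True},
    {"silo": "email", "customer_id": "c2", "opt_in": True},
    {"silo": "loyalty", "customer_id": "c2", "opt_in": True},
]


def consolidate_permissions(records):
    """Build a single permission view per customer across all silos.

    Assumption: when silos disagree, the most restrictive answer wins,
    so a customer counts as opted in only if every silo says so.
    """
    flags_by_customer = defaultdict(dict)
    for record in records:
        flags_by_customer[record["customer_id"]][record["silo"]] = record["opt_in"]

    unified = {}
    for customer_id, flags in flags_by_customer.items():
        unified[customer_id] = {
            "opt_in": all(flags.values()),
            "conflict": len(set(flags.values())) > 1,
            "sources": sorted(flags),
        }
    return unified


unified = consolidate_permissions(SILO_RECORDS)
```

In this toy data, customer `c1` is opted in according to two silos and opted out according to a third, which is exactly the kind of inconsistency a single foundation is meant to surface.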

When Your Data Inheritance Sucks

Many of us may not have a heritage of access to great customer data ourselves. There are three broad categories of business when it comes to data accessibility:

1. Subscription- or account-based product and service providers, in which customer data sharing is an intrinsic part of the deal (e.g. bank, credit card, cable TV, cell phone, magazine, health club).
2. Event-based, transactional engagement that is direct to the customer but doesn’t require customer data sharing (e.g. most retailers, mass transit companies, venues/event ticket sales).
3. Indirect engagement that relies on other parties to sell directly to the customer (e.g. manufacturers, musicians, artists, studios).

In the last couple of decades, companies in the second group have developed creative solutions, usually via loyalty or frequent customer programs, to overcome these data deficits. Those in the third group may have tried warranties or competitions. Today, digital solutions enable companies in all groups to build direct engagements and increasingly rich data assets to begin to compile their own solid first-party data foundation.

“The secret of getting ahead is getting started” – Mark Twain

A plan to build a first-party data foundation is key. Sure, access to transaction data is not always easy, but much has changed in the data world. Digital channels are providing new and innovative ways to gain that access, and the momentum behind open or liquid data is helping. Excitingly, your customers can now help you gather their data if there is the right level of trust and/or value exchange. Thinking about this data openly, rather than as a closed system behind your own firewall, creates a whole new dimension of possibilities.

Ironically, some of the companies that have to work hardest to get access to data are doing the best job. For instance, some traditional retailers have built competitive advantage from data via their loyalty propositions, and more recently some CPGs have developed their own direct-to-consumer loyalty or engagement propositions.

Others, such as credit card companies and banks, are struggling to get beyond a product-centric, acquisition-oriented numbers game in which the end consumer is lost. The decades-long data heritage seems to have become an albatross as they struggle to change the systems, organization and – crucially – the data to be personalized and relevant in every interaction.

Exposure, offer and content-responsiveness data should all be derived and maintained as new first-party data for the individual customer or prospect. “Closing the loop” means understanding response at an individual level, not just the campaign level. This includes nonresponse, too. So building a first-party data asset also includes creating and deriving new data as refinements on top of the raw materials.
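A minimal sketch of what closing the loop at the individual level might look like: every exposure is written back against the customer, whether or not it drew a response. The log formats, identifiers and the `close_the_loop` helper are hypothetical.

```python
# Hypothetical logs: who was shown which offer, and which exposures drew a response.
EXPOSURES = [
    ("c1", "offer_a"),
    ("c2", "offer_a"),
    ("c3", "offer_a"),
    ("c1", "offer_b"),
]
RESPONSES = {("c1", "offer_a")}


def close_the_loop(exposures, responses):
    """Derive per-customer response history, keeping nonresponse explicit."""
    history = {}
    for customer_id, offer_id in exposures:
        record = history.setdefault(
            customer_id, {"exposed": 0, "responded": 0, "offers": {}}
        )
        responded = (customer_id, offer_id) in responses
        record["exposed"] += 1
        record["responded"] += int(responded)
        # False is stored too: not responding is itself a data point.
        record["offers"][offer_id] = responded

    for record in history.values():
        record["response_rate"] = record["responded"] / record["exposed"]
    return history


history = close_the_loop(EXPOSURES, RESPONSES)
```

A campaign-level view of this toy data would report only that one of three people responded to `offer_a`; the individual view also records that `c1` ignored `offer_b` and that `c2` and `c3` have yet to respond to anything.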

Don’t Be Had

While getting started on building this kind of data asset is vital, manufacturers still need to leverage other data to provide census-scale personalization today.

It’s true that better data enables a better potential outcome, but there are also thresholds below which the data can be misleading. This applies to targeting, to good planning and particularly to measurement. For instance, it’s possible to target me with a soda offer if you can see which brand I have chosen in just a couple of transactions. Response data can tell you whether I responded, too. However, to know whether my response inspired additional sales or volume and whether my consumption is easily expandable or not requires confidence that you are seeing all of my transactions. Having this available across all channels is also increasingly important.

The challenge of finding the right raw material is unfortunately magnified by the plethora of service providers out there trying to persuade us that their version of data or their algorithm is the best. Navigating this is tiring even if you understand the technicalities and almost impossible if you don’t. If you’re seeking external data for marketing purposes, rather than being impressed with one stat or other on the data, consider the following characteristics that ideally should all be present beyond the obvious prerequisite of consumer permission:

Scale
Consistency of source (not a hole-ridden patchwork)
Continuity of data (ideally a longitudinal view of the same customers over time, or at least a continuous or static group with very low churn)
Granular and disaggregated
Current and low latency

In the end, it’s not about the terabytes. Superficial stats on numbers of households are meaningless. The quality of the raw materials must be right. Otherwise, all you have is disappointing, very expensive oatmeal that leaves you bloated yet quite unsatisfied.

Follow Matt Keylock (@mattkeylock) and AdExchanger (@adexchanger) on Twitter.
