As marketers and advertisers try to parse real-time interactions across multiple channels, they’re finding they need big data warehouses and analytics tools. The problem is that some of these tools are startlingly slow, while others simply hit a wall when volumes are high.
To combat this, vendors selling big data tools to marketers are increasingly building their offerings around cloud-based services designed to accommodate huge data queries in a timely and cost-effective manner.
For instance, when Marketo hit capacity with the volume of data it could parse via traditional Structured Query Language (SQL), it redesigned its back end with Solr, an open-source search platform that allows users to efficiently query incredibly large datasets.
The data-management platform (DMP) Aggregate Knowledge is another example.
“At the end of the day, on the media side of the house, it’s all about normalization and presenting the data in a scenario where it’s ready to be consumed by the analyst,” said Rob Gatto, SVP of media and advertising for Aggregate Knowledge owner Neustar, an information services and analytics company. Aggregate Knowledge’s data scientists and developers use Amazon Web Services for high-volume data processing and continue to tap the cloud-based offering Redshift.
“We’re doing with Amazon for significantly less dollars what we would do with a traditional data warehouse,” Gatto added. Amazon Redshift, barely a year old, claims to charge less than a tenth of what most other data-warehousing solutions charge.
These hosted data warehouses differ from traditional on-premise systems, like Teradata Aster or IBM Netezza, which are massive, expensive configurations that in the past were adopted primarily by verticals like financial services and telcos. (That’s not to say the traditional data warehouse players aren’t investing in cloud; Teradata last fall launched the Teradata Cloud, available by subscription.)
However, the advent of open-source software and flexible frameworks, like Apache Hadoop, has given vendors catering to the marketing community the ability to provide data tools that are more tailored toward the real-time media buy, with the flexibility to roll in metrics such as “actions per thousand” among a set of unique users or segments.
Consider MapR, built on Hadoop, which breaks down petabytes of data for comScore and Rubicon Project, among others. MapR’s VP of product management described an instance in which a sizable cable company was able to run ad insertions in video on demand and alter those inserts using set-top box data on the viewer’s actions.
Moreover, with enterprise bastions like Amazon or Oracle (notably with its acquisition of BlueKai, among other recent buys) entering the media landscape, the market for high-speed data analytics and nimble querying capabilities for data scientists will continue to grow.
“Today, any analyst that’s in SQL or maybe has their own front-end SQL tool can reach into the Amazon warehouse and prepopulate, prenormalize and preaggregate data,” Gatto said. “Amazon getting into this game is really interesting because of the cost and form factor.” Some clients, additionally, are tapping business intelligence tool Tableau to normalize the data, provide a neutral look at it and prepare it for analysis in Hadoop.
Although traditional data warehouses are frequently framed as a drain on time, resources and wallets, their cloud counterparts raise common concerns of their own, chiefly security and the possibility of outages. Companies like Rackspace, which produce open cloud databases, mitigate the situation by providing a number of public, private and hybrid cloud options for enterprises with varying degrees of security needs.
In addition to Neustar, marketing technology company StrongView is turning to commercial clouds AWS and Rackspace to develop what it has called an “elastic” SaaS-based marketing platform that deviates from the server-based model and accounts for cross-channel digital marketing demands. One reason the company cited for its use of cloud database technologies is access to “virtually unlimited resources” for select periods of time.
In addition to the bandwidth, the cost-value performance benefit of Amazon Web Services “was through the roof” compared to what developers paid for other sources, which has prompted a shift of analysis out of old environments and into the Amazon cloud environment, Gatto said.
Amazon Redshift, according to Forrester Research’s Enterprise Data Warehouse Wave for Q4 2013, is the fastest-growing service in the history of AWS. Redshift, along with vendors such as Kognitio, HP, Actian and ParAccel, is innovating at the heels of incumbents like Teradata, IBM, SAP, Pivotal, Oracle and Microsoft.
Established players in the data warehouse space managed traditional data sources such as CRM, ERP and supply chain information; newer data sources such as handheld devices, sensors and set-top boxes have expanded the need for an all-encompassing and flexible infrastructure.
“Amazon Redshift is emerging as a leading, cost-effective approach not just for marketing, but also for scientific research,” commented Ray Wang, chairman and principal analyst at Constellation Research. “It’s about the large-scale queries. There aren’t too many places to crunch. The challenge with on-premise data warehouses for marketers is the dependency on legacy IT infrastructure and sometimes, the IT department.”