“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.
Today’s column is written by Jasmine Jia, associate director of data science at Blockthrough.
The term “machine learning” seems to have a magical effect as a sales buzzword. Couple that with the term “data science,” and lots of companies think they have a winning formula for attracting new clients.
Is it smoke and mirrors? Often, the answer is “yes.”
What is quite real, though, is the need for best practices in data science, and for companies to invest in and fully support talent that can apply those principles effectively.
Laying the foundation for machine learning
Machine learning success starts with hiring talent that can harness it: a team of skilled data scientists, which is very expensive. Adding to the cost is time. It takes a lot of it to build a data science team and integrate it with other teams across operations.
A successful machine learning pipeline requires data cleaning, data exploration, feature extraction, model building, model validation and more. You also need to keep maintaining and evolving that pipeline. And not only is the cost high, but companies also rarely have the patience and time to manage this process and still meet their ROI objectives.
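To make those stages concrete, here is a minimal sketch of such a pipeline using scikit-learn on a synthetic dataset. Everything here is illustrative — the library choice, the imputation strategy and the model are assumptions, not the author's actual stack:

```python
# A toy sketch of the pipeline stages named above: cleaning, feature
# extraction, model building and validation, chained so the same steps
# run identically in training and in production. Illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Data collection (synthetic stand-in) with some missing values
# that the cleaning step must handle
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X[::50, 0] = np.nan

pipe = Pipeline([
    ("clean", SimpleImputer(strategy="median")),   # data cleaning
    ("scale", StandardScaler()),                   # feature extraction
    ("model", LogisticRegression()),               # model building
])

# Model validation: a held-out split plus cross-validation
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pipe.fit(X_tr, y_tr)
print("held-out accuracy:", pipe.score(X_te, y_te))
print("cross-val accuracy:", cross_val_score(pipe, X_tr, y_tr, cv=5).mean())
```

Wrapping the steps in a single Pipeline object is one way to keep the "maintain and evolve" burden manageable: the whole chain is versioned, retrained and deployed as one unit.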
Defining best practices
With the right talent and pipeline in place, the next step is establishing best practices. This is vital. Machine learning depends on how you implement it, what problem you use it to solve, and how deeply you integrate it with your company.
To paint a picture of how things can go wrong, just think about the times imbalanced data sets led to what the media called “racist robots” and “automated racism.” Or, on a lighter note, how about those memes showing machine learning confusing blueberry muffins with Chihuahuas, or mixing up images of bagels with pics of curled-up puppies?
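The imbalanced-data failure mode is easy to reproduce: a model that always predicts the majority class can look highly accurate while being useless on the minority class. A toy sketch, with invented numbers:

```python
# Why accuracy misleads on imbalanced data: a degenerate "model" that
# always predicts the majority class scores 95% accuracy here while
# never identifying a single minority-class example.
labels = [0] * 950 + [1] * 50   # 95% majority class, 5% minority
predictions = [0] * 1000        # always predict the majority class

accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
true_positives = sum(p == 1 and t == 1
                     for p, t in zip(predictions, labels))
recall = true_positives / sum(labels)  # recall on the minority class

print(accuracy)  # 0.95 -- looks great on paper
print(recall)    # 0.0  -- catches none of the minority class
```

This is why validation practices need to look past headline accuracy at per-class metrics such as recall.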
Best practices can prevent some of these common pitfalls, but it’s essential to define them for the entirety of the data analysis process: before decisioning, during decisioning and after decisioning.
Let’s take this step by step.
Before: It is all too common for companies to update an offering by adding a feature. But often they do so before completing meaningful data collection and analysis. Nobody has taken the time and resources to answer, “Why are we adding this feature?”
Before answering that all-important question, other questions need to be addressed. Are users already exhibiting this behavior naturally? What will the potential lift be? Is it worth the expense and time to tap into your engineering resources? What is the expected impact? What would this new feature ultimately mean to the future success of this product?
You’ll need a lot of data to answer those questions. But let’s say you gathered it all and decided it was worthwhile to move ahead.
During: You’ve launched that feature. There should be an ongoing stream of data that demonstrates whether the new feature is driving impact at the network, publisher and user levels.
Are you seeing the same impact across the board? Sometimes benefits to one can hurt another. Attention must be paid. Factor analysis is key. What are the factors at play that impact the analysis? Once identified, you need to determine whether or not they are statistically significant.
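The "benefits to one can hurt another" point can be made concrete: an aggregate lift can be positive while one segment is actually harmed. A toy breakdown by publisher, with invented numbers:

```python
# Aggregate lift can hide a segment that the feature hurts.
# Tuples: (segment, sessions_with_feature, conversions_with,
#          sessions_without_feature, conversions_without).
segments = [
    ("publisher_a", 8000, 400, 2000, 90),
    ("publisher_b", 2000, 40, 8000, 200),
]

# Network-level view: pooled conversion rates with vs. without
rate_with = sum(s[2] for s in segments) / sum(s[1] for s in segments)
rate_without = sum(s[4] for s in segments) / sum(s[3] for s in segments)
print("aggregate lift:", round(rate_with - rate_without, 4))  # positive

# Publisher-level view: the same comparison per segment
for name, n_f, c_f, n_no, c_no in segments:
    print(name, "lift:", round(c_f / n_f - c_no / n_no, 4))
# publisher_b's lift is negative even though the aggregate looks good
```

This is the kind of check the per-level breakdown above exists for: slicing the same metric by network, publisher and user before declaring victory.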
After: At this point, there are even more questions to address. What exactly is the impact? If you use A/B testing, can those short-term experiments provide dependable long-term forecasts? What lessons can you learn? Whether it’s a failure or a success, how can it keep evolving? What are the new opportunities? What are the new behavioral changes you’re seeing?
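For the A/B-testing question above, the basic sanity check is whether the observed lift is distinguishable from noise at all. One standard approach is a two-proportion z-test; the counts below are made up for illustration:

```python
# Two-proportion z-test for an A/B experiment: did variant B's
# conversion rate differ from A's by more than chance? All counts
# here are invented for illustration.
from math import sqrt
from statistics import NormalDist

conv_a, n_a = 200, 10_000   # control: 2.0% conversion
conv_b, n_b = 260, 10_000   # variant: 2.6% conversion

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided

print("z:", round(z, 2), "p-value:", round(p_value, 4))
```

Even a significant short-term result only answers the "is there an effect" question; the long-term forecasting question in the paragraph above still requires longer holdout periods or follow-up experiments.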
Machine learning for the long haul
There is a lot of data and oversight required to make a machine learning program truly viable. It’s no wonder that many don’t have the wherewithal to properly execute it and reap the benefits.
Here is the kicker: the data team doesn’t make the decisions. The machine learning algorithm doesn’t make the decisions. People make decisions. You can hire a fantastic squad of data scientists, and they can build and refine a machine learning model based on gobs of data that is 100% accurate. But for it to make any sort of difference to your business, you need to develop a strong workflow around it.
The best way to do that? Make sure data science teams are deeply integrated with different teams throughout your organization.
Establish a well-grounded data science practice, and you will see that machine learning can make the magic happen.
Follow Blockthrough (@blockthrough) and AdExchanger (@adexchanger) on Twitter.