UK2 Group 2014 – on VPS.net
Big Data: Why technology’s biggest hype is still the real deal
Yelp will tell you which restaurants have great bread rolls and which don’t clean their dishes properly, thanks to feedback from diners. This is a form of Big Data too, in the sense that Yelp reviews are a mass of data that is too large, unstructured and fast-moving to analyse using traditional tools.
In between the rants from unimpressed customers is information that can help health inspectors determine which restaurants are overdue for a visit – assuming they can get to it. This problem has now been solved by New York health inspectors, who are using custom-made software to sift through Yelp reviews for references to food poisoning.
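To give a rough sense of what such sifting can look like, here is a minimal Python sketch of a keyword filter. It is not the software the New York inspectors use, whose details are not described here; the keyword list, the `flag_reviews` function and the sample reviews are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical symptom keywords; the terms used by the real system are not known.
SYMPTOM_PATTERN = re.compile(
    r"\b(food poisoning|vomit\w*|diarrh\w*|got sick|stomach cramps?)\b",
    re.IGNORECASE,
)


def flag_reviews(reviews):
    """Return the reviews that mention at least one symptom keyword."""
    return [text for text in reviews if SYMPTOM_PATTERN.search(text)]


# Made-up sample reviews, for illustration only.
sample_reviews = [
    "Great bread rolls, friendly staff.",
    "I got food poisoning after the oysters.",
    "The dishes looked dirty and the service was slow.",
    "Felt fine at dinner but was vomiting all night.",
]

flagged = flag_reviews(sample_reviews)
for review in flagged:
    print(review)
```

A match only marks a review as worth a human look. It does not establish that the restaurant caused the illness, so a filter like this narrows the pile rather than replacing the inspector.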
As most of the hundreds of thousands of reviews are irrelevant, Yelp is a simple example of how there’s often just too much information to make sense of. And it’s getting worse as Big Data keeps getting bigger: a whopping 90% of all the data in the world has been generated over the last two years, according to research from SINTEF.
Of course, Yelp reviews are far from the most reliable source for determining which restaurants have a cockroach problem, but they may provide a helping hand to underfunded health authorities. Google Flu Trends was hailed as revolutionary when it launched in 2008, using the rise and fall in real-time symptom searches to predict epidemics. The hype turned to criticism a few years later once it became clear the predictions were vastly exaggerated, and Google was accused of “Big Data hubris”. Even after the flu-prediction algorithms were tweaked last year, predictions were 30% higher than the numbers collected from doctors.
The revelation that Google’s illness predictions are untrustworthy has sparked something of a backlash against Big Data. The theory remains sound: if we capture as much data as possible, the old method of statistical sampling will become redundant. As data processing becomes faster, we can get results in real time – just sit back and let the algorithms do the work. But the problem is that not everyone who searches for pharmacies has the flu, nor does feeling sick after a meal necessarily mean it was the food.
More and more of what we do is being recorded and added to data repositories, but we may never have enough data to get unbiased, action-ready results from analytics engines. Critics should, however, keep in mind that this was never the promise. Matt Mohebbi, co-inventor of Google Flu Trends, responded to the backlash by pointing out that the service was never meant to replace standard forecasting, but instead to act as a “complementary signal”.
A few problems with the early Big Data experiments don’t mean the whole concept is a dud, as Big Data still has the power to illuminate trends hidden inside massive amounts of data noise. Better understanding through data analysis could help improve practically any industry, from agricultural yields and traffic flow to early-intervention healthcare and personalised education. But the keyword here is “help”, as Big Data will never fully replace human experience. If we want to continue to reap the very real benefits from Big Data, we need to remember it’s not magic; it’s just clever statistics.