Big Data: Finding the patterns in the noise

Aquila Magazine (kids 8-12) – May 2015

We create over 2.5 quintillion bytes of data every day – that’s a lot of text, pictures, video and social media messages. Powerful computers can analyse this “Big Data” and find new patterns, and maybe even hints about the future.

How much data exists in the world? Nobody really knows, but we are creating more and more every day. The numbers are so big they hardly make sense: over 2.5 quintillion bytes of data are added every day, according to IBM – a quintillion is 1 followed by 18 zeros. There’s a name for all this – we call it “Big Data”.

So what do we mean when we say “data”? It’s everything that comes through a computer, yes, but it includes not only all the text, but lots of other things too like voice recordings and video. So when we look at data we’re not just counting text stored by businesses, newspapers and libraries, but also social media posts, online shopping orders, internet chats, and all those videos of cats jumping into boxes.

Today’s powerful computers can be used to analyse all this data and keep track of it, even as it’s growing at an ever-increasing pace. Over 90% of all data was created in the last few years alone, as technology has become more accessible and easy to use for a lot more people. The ideal outcome of collecting all this information is that we will be able to see new patterns, which may even be able to tell us about the future. One example is Google Flu Trends, which essentially looked at who was searching the web for flu symptoms, where those people lived, and used that data to predict where the next flu outbreak was going to be.

Big Data analysis is not perfect – after all, not everyone who looks up symptoms on the internet is sick. But as we collect more types of data we should be able to be more specific in our predictions. For example, futuristic refrigerators can email you shopping lists when you’ve run out of food, and a town will soon be able to monitor which parking lots are full and lead visitors to the nearest vacant spot by sending them a text message. All this is data too, and if these details, and millions of others, are put together, we may be able to see some surprising connections.

Farmers are already able to use Big Data analysis to work out which fields need what kind of fertiliser, and there are great possibilities in the field of medicine. For instance, the Howard Hughes Medical Institute in the US is using Big Data to analyse scans of the brain: “When you record information from the brain, you don’t know the best way to get the information that you need out of it. Every data set is different,” said Mischa Ahrens, one of the researchers. Scanning the vastly complex brain creates so much information it can be overwhelming, but Big Data analysis has made it possible for researchers to test out ideas faster. This will hopefully cut down the time it takes to come up with new treatments for illnesses.

Big Data is being used for strictly fun purposes too: sports fans who like to watch the replays and study the match statistics have more and more data to play with. During the Wimbledon Championship, a service called the Slamtracker gives tennis buffs access to data collected from eight years of tournaments. This means people can look up their favourite players’ performance statistics and playing styles, and even get victory predictions based on the backgrounds of their opponents.

While there’s no doubt the opportunities are vast, one question remains: who owns Big Data? This is especially important as we give away more and more information about ourselves on the internet, often in exchange for using services like email or social media without having to pay. Companies like Facebook and Twitter assure us information belongs to users, but the companies are still using this information to sell advertising. And what about the data collected by the phone company? Even if they’re not listening to what we are saying, the time, duration and location of a call are data too. Right now, companies are often using data about people anonymously, meaning they look at how many people did something and where they were, but the names are kept out of it.

As we continue to gather data, Big Data analysts will be able to glean more and more insights that will hopefully help us create products and services to make life better. A deli can use Big Data to work out which sandwiches sell best in what weather, for example, and make sure they don’t run out. Airlines can crunch the numbers to come up with a quicker way to board airplanes, and towns can use traffic flow analysis to prevent queues.

It’s exciting to think about what we can do with our new power of Big Data analysis, and this is only the beginning. “Big Data marks the start of a major transformation,” said authors Kenneth Cukier and Viktor Mayer-Schonberger in their book ‘Big Data’. “Just as the telescope enabled us to comprehend the universe and the microscope allowed us to understand germs, the new techniques for collecting and analysing huge bodies of data will help us make sense of our world in ways we are just starting to appreciate.”


Big Data: Why technology’s biggest hype is still the real deal

UK2 Group 2014 – on

Yelp will tell you which restaurants have great bread rolls and which don’t clean their dishes properly, thanks to feedback from diners. This is a form of Big Data too, in the sense that Yelp reviews are a mass of data that is too large, unstructured and fast-moving to analyse using traditional tools.

In between the rants from unimpressed customers is information that can help health inspectors determine which restaurants are overdue for a visit – assuming they can get to it. This problem has now been solved by New York health inspectors, who are using custom-made software to sift through Yelp reviews for references to food poisoning.
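As a rough illustration of that kind of sifting, here is a minimal Python sketch. It is not the inspectors' actual software, which the article doesn't describe: the keyword list, the sample reviews and the `flag_reviews` helper are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical keyword list; the terms the real software looks for are not known.
KEYWORDS = ("food poisoning", "sick", "vomit", "diarrhea")
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def flag_reviews(reviews):
    """Return only the reviews whose text mentions one of the keywords."""
    return [r for r in reviews if PATTERN.search(r["text"])]


# Made-up sample data standing in for a much larger stream of reviews.
reviews = [
    {"restaurant": "Cafe A", "text": "Great bread rolls, friendly staff."},
    {"restaurant": "Diner B", "text": "I got food poisoning after the chicken."},
    {"restaurant": "Diner B", "text": "Felt sick all night after eating here."},
]

flagged = flag_reviews(reviews)

# Count flagged reviews per restaurant so repeat mentions stand out.
counts = Counter(r["restaurant"] for r in flagged)
print(counts.most_common())
```

A plain keyword match like this will also pick up reviews where the meal had nothing to do with the illness, so at best it produces a shortlist for a human inspector to check.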

As most of the hundreds of thousands of reviews are irrelevant, Yelp is a simple example of how there’s often just too much information to make sense of. And it’s getting worse as Big Data keeps getting bigger: a whopping 90% of all the data in the world has been generated over the last two years, according to research from SINTEF.

Of course, Yelp reviews are far from the most reliable source for determining which restaurants have a cockroach problem, but they may provide a helping hand to underfunded health authorities. Google Flu Trends was hailed as revolutionary when it launched in 2008, using the rise and fall in real-time symptom searches to predict epidemics. The hype turned to criticism a few years later once it became clear the predictions were vastly exaggerated, and Google was accused of “Big Data hubris”. Even after the flu-prediction algorithms were tweaked last year, predictions were 30% higher than the numbers collected from doctors.

The revelation that Google’s illness predictions are untrustworthy has sparked something of a backlash against Big Data. The theory remains sound: if we capture as much data as possible, the old method of statistical sampling will become redundant. As data processing becomes faster, we can get results in real time – just sit back and let the algorithms do the work. But the problem is that not everyone who searches for pharmacies has the flu, nor does feeling sick after a meal necessarily mean it was the food.

More and more of what we do is being recorded and added to data repositories, but we may never have enough data to get unbiased, action-ready results from analytics engines. Critics should, however, keep in mind that this was never the promise. Co-inventor of Google Flu Trends, Matt Mohebbi, responded to the backlash by pointing out that they never meant for the service to replace standard forecasting, but instead act as a “complementary signal”.

A few problems with the early Big Data experiments don’t mean the whole concept is a dud, as Big Data still has the power to illuminate trends hidden inside massive amounts of data noise. Better understanding through data analysis could help improve practically any industry, from better agricultural yields and traffic flow, to early-intervention healthcare and personalised education. But the keyword here is “help”, as Big Data will never fully replace human experience. If we want to continue to reap the very real benefits from Big Data we need to remember it’s not magic, it’s just clever statistics.

UK2 Group

I blogged about technology innovation, the cloud and trends in internet life for web hosting company UK2 Group. Samples:

* Welcome to the Slow Internet
* The dirty secret of wearable technology
* Is SnapChat pointing the way to the future of news?
* The disappearing internet
* The vital presence of social media ghosts
* The case for emoji in work communication
* Is technology ruining storytelling on screen?
* Big Data: Why technology’s biggest hype is still the real deal
* What happens to virtual spaces after the people have moved on?
* Why simpler is better for technology innovation
* The secret to viral videos
* Comments are dead, but we’re talking more than ever
* The internet is saving the radio star!