Why big data is crude oil – while rich data is refined, and the ultimate in BI

Is rich data just a smokescreen?

Some believe that rich data is no more valuable than big data. "The problem will be that the majority [of rich data] is in those hard-to-get seams, and that requires some serious work and effort to extract," says Jamie Turner, CTO of Postcode Anywhere, which handles over a billion queries a year. "But its sheer volume makes it valuable and important to not overlook."

Jamie Turner, CTO of Postcode Anywhere

Turner doesn't think that attribution is the answer, saying: "The greatest volume will be unstructured and hard to understand but way more valuable. It's also worth remembering that attribution done badly is even worse because you start relying on an indirect measure of things rather than the raw data."

What about the Internet of Things?

Another reason why we need rich data is that big data is about to explode. The Internet of Things will mean a plethora of devices coming online, from thermostats and scales to TVs and smart energy meters, all constantly creating 'time-series' data. The end result will be a huge pool of data that needs to be sorted, managed and used.

"Connecting these 'things' to the internet and using the data from them can provide a better service back to the customer – whether it is helping them reduce their energy spend, or stick to a diet and exercise plan," says Pfeil, who insists that the use of that data has to be clear. "Customers want to feel that their data is being used in their best interests, and that it is kept secure," he says.

It's likely that NoSQL databases – developed by the likes of Google and Facebook – will need to be used to manage the huge amounts of time-series data that IoT devices will create.

Open sources of rich data from governments can be used to make apps

What about open data from governments?

Open data is just unstructured big data. Government departments produce immense amounts of raw data to inform policy decisions – on everything from live traffic information and residential property sales to obesity and deprivation levels – and much of it is now being made public for anyone to analyse and use, perhaps to develop apps. But there's a problem.

"Data is being published by government departments and agencies, but not generally in a format that is easily discoverable or linkable," says Adam Fowler, Principal Sales Engineer at Enterprise NoSQL database platform vendor MarkLogic. "What's needed is a system that supports security and privacy requirements, and web publishing, incorporates semantic technologies for better discoverability and querying, and uses recognised standards for linked open data so that new data can be easily linked to existing data sources."

Should there be one centralised repository? Fowler thinks that the existing Data.gov.uk website should allow interactive querying of the underlying data. "For example, an open data report published on the use of homeless shelters, by borough, would need to exclude individuals' names," says Fowler. "This could help central government or charities to better allocate funds, reflecting up to date usage across the country."
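Fowler's shelter example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, field names and figures below are invented, and a real publisher would also need rules for small counts, since a borough with only one or two entries could still identify someone.

```python
from collections import Counter

# Hypothetical raw records; the fields and values are invented for illustration.
raw_records = [
    {"name": "Person A", "borough": "Camden", "nights": 3},
    {"name": "Person B", "borough": "Camden", "nights": 1},
    {"name": "Person C", "borough": "Hackney", "nights": 2},
]


def shelter_nights_by_borough(records):
    """Sum shelter nights per borough, discarding every per-person field."""
    totals = Counter()
    for record in records:
        # Only the borough and the night count are read; names never
        # reach the published report.
        totals[record["borough"]] += record["nights"]
    return dict(totals)


report = shelter_nights_by_borough(raw_records)
print(report)
```

The published report then contains borough totals only, which is the level of detail a funder would need to compare usage across the country.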

So if open data published by government departments and agencies were easily linkable and discoverable by individuals and businesses, it would have so much more value; if it's not in a format that is easily discoverable and useful then it may as well be closed data. Common, open standards have been proposed by the ODI (Open Data Institute) and the W3C, and Tim Berners-Lee, the initiator of the Linked Data project, has suggested a 5-star deployment scheme.
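The five levels of Berners-Lee's scheme build on one another: data on the web under an open licence, then machine-readable structure, then a non-proprietary format, then URIs to identify things, then links to other people's data. As a rough sketch of how a publisher might score a dataset against it (the `Dataset` class and its field names are hypothetical, not part of any official tool):

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    on_web_open_licence: bool   # 1 star: on the web, open licence, any format
    machine_readable: bool      # 2 stars: structured, e.g. a spreadsheet
    open_format: bool           # 3 stars: non-proprietary format, e.g. CSV
    uses_uris: bool             # 4 stars: URIs identify the things described
    links_to_other_data: bool   # 5 stars: linked to other data for context


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: a level only counts if all earlier ones hold."""
    criteria = [
        ds.on_web_open_licence,
        ds.machine_readable,
        ds.open_format,
        ds.uses_uris,
        ds.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


scanned_pdf = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
linked_data = Dataset(True, True, True, True, True)

print(star_rating(scanned_pdf), star_rating(csv_file), star_rating(linked_data))
```

On this scoring, a scanned PDF under an open licence earns one star, a CSV file three, and fully linked data five, which mirrors the gap Fowler describes between data that is merely published and data that is discoverable and linkable.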

Matt Pfeil, Chief Customer Officer at DataStax

Is rich data reliable?

Rich data is only as good as the personal data it uses. A recent report from Symantec called State of Privacy looked into attitudes towards data privacy across Europe, including the UK, and found a growing mistrust in how businesses and governments treat personal data. A third of people in the UK provide false data to protect themselves and over half of those surveyed (57%) are now avoiding posting personal details online altogether.

"You may be putting your faith in user data at the expense of truth," says Sian John, Chief Security Strategist EMEA, Symantec, to organisations relying on user data. "Data does not always acknowledge the human side of your customer. Too much reliance may deliver an advertising or marketing campaign with little relevance."

Is rich data a threat to privacy?

Worries about a Big Brother society are causing a breakdown in trust between individuals and companies. "We are entering a world where consent will be king, and the more that companies have to ask customers for this, the more they may be rejected," thinks Cano-Lopez.

Some think that rich data can, in time, be used as leverage. "In the future, people may choose to control information that they are creating and then monetise this back to companies – this may be in the form of lowering their bills or getting better service quality from one provider," says Pfeil.

People may want more control over their personal data, but the system is not set up for this.

"You would need a central portal where this data would be stored allowing businesses to upload consumer data and consumers access to provide cross-brand permissions," says Jason Lark, Co-Founder and MD of Celerity, adding: "Think how tough it is for many small businesses to record their data, while many businesses are still working on bringing their own customer data together."

Merging all of this data into one portal would be, says Lark, a Herculean task. Beyond logistics, there are social implications, too. "Does it involve empowering individuals or nationalising data?" asks Lark. "Are we depriving individuals of their data, or companies of their property? We need to really think about these issues."