Wherever you look, there is no shortage of statistics or analysis pointing to the global explosion in data growth. According to CSC Insights, data production is expected to be 44 times greater in 2020 than it was in 2009, with business data volumes doubling globally every 1.2 years.
However, the problem in making the most of this increasingly valuable asset is not the larger volume of data but the complexity of getting the most value from it. Most of this growth comes from new forms of data – such as social media content, images, video and sensor data – often generically categorised as 'unstructured' data, because they don't follow the neat row-and-column format typically used for storing and analysing data.
Additionally, the full value of these complex sources can only be realised by applying new, unfamiliar types of analysis.
Not surprisingly, companies are reacting to these dramatic changes in order to take advantage of this tremendous opportunity for business improvement. As a result, big data is moving decisively to the top of the boardroom agenda.
However, given the complexity of the topic, the action taken is often haphazard, without a clear direction or strategy, resulting in lost opportunities and slow realisation of potential benefits.
A recent Teradata poll of European companies found that almost half (47%) are already running big data projects or plan to do so within the next two years. Momentum is also growing through governmental support: for example, the European Commission is funding a Big Data Public Private Forum (BIG) designed to engage all stakeholders in advancing the big data debate.
In the US, larger firms have advanced even more rapidly. In 2009 there were only a small number of big data projects, worth just $100 million, yet today more than 90 per cent of Fortune 500 companies have some type of big data initiative underway.
Given that the growth in data is predominantly driven by new 'unstructured' data sources, there is also a significant impact on the methods employed to store and analyse this asset. This is mirrored by the growing interest in new storage frameworks, especially open-source solutions such as Hadoop.
Hadoop – moving beyond experimentation
As a first step in big data, many businesses have embarked on an exploration of Hadoop, attracted by the concept of downloading free open-source software on low-cost commodity servers to improve their ability to effectively analyse data within the business.
Yet this approach is not without risk. First, starting with the solution means looking through the wrong end of the telescope. Instead, the organisation should first consider the business problems to be addressed and then outline an appropriate response.
Second, any development should be subject to rigorous and continuous analysis as to whether it is working and fit for purpose as the best solution to the problem.
Having said that, Hadoop does offer a number of unique benefits to the business. As a large distributed file system, it allows the organisation to acquire and store large volumes of semi-structured and unstructured data cost-effectively. As a result, it is increasingly being perceived as a highly efficient long-term data storage platform.
Hadoop is also an efficient way of sequentially processing files. This is especially valuable for pre-processing tasks such as preparing web logs for loading into a data warehouse.
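To make the web log example concrete, the sketch below shows the kind of line-by-line preparation step described here. It is an illustration only, not a description of any particular Hadoop deployment: it assumes the logs are in the Apache-style Common Log Format, which the article does not specify, and the names `LOG_PATTERN`, `parse_line` and `prepare_rows` are hypothetical. Month names are parsed with the default English locale.

```python
import re
from datetime import datetime

# Assumption: input lines follow the Apache-style Common Log Format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Turn one raw log line into a flat row, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    size = match.group("size")
    return (
        match.group("host"),
        timestamp.isoformat(),
        match.group("method"),
        match.group("path"),
        int(match.group("status")),
        0 if size == "-" else int(size),  # "-" means no response body
    )


def prepare_rows(lines):
    """Read lines one at a time, in order, and yield tab-separated rows."""
    for line in lines:
        row = parse_line(line)
        if row is not None:  # malformed lines are skipped, not repaired
            yield "\t".join(str(field) for field in row)
```

Each line is handled independently and in sequence, which is why this kind of work suits a batch tool: a function like `prepare_rows` could be fed from standard input by a streaming job, with the tab-separated output then bulk-loaded into warehouse tables.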
However, as a batch-processing tool, Hadoop is less efficient than a traditional data warehouse at handling queries that require data from different files, and it can support only a small number of user queries at any given time.
So where does that leave us? Those businesses implementing Hadoop typically find it quick and easy to store massive volumes of different data types and do much of the initial data manipulation and preparation required. However, they quickly recognise the limitations of running analytics in this environment – the truth is that there is no single silver bullet for the wide variety of analytics needed today.