Using Hadoop in big data analysis

As the quantity of data collected by businesses continues to expand, new forms of data management are emerging to identify commercial opportunities, and big data analysis is becoming a core business function.

It's well understood that data has value, but extracting that value is proving to be difficult. A survey by technology services firm Avanade showed that 85% of respondents reported obstacles in managing and analysing data. These included being overwhelmed by the sheer volume of data, security concerns and not having enough dedicated staff for the analysis. Also, 63% of stakeholders felt their company needed to develop new skills to turn data into business insights.

Business case

We're now at the point where, when business and IT managers look at upgrading their data management and analysis systems, they are asking whether Hadoop is the answer.

The key to deploying the framework successfully is to understand clearly the goals for the installation. IT managers need to be vigilant that Hadoop does not become yet another highly complex system that yields few real insights, and that means understanding its ecosystem.

For instance, most Hadoop installations use the Flume framework to collect and channel the streams of log and event data that feed the cluster. Sqoop, a tool for transferring bulk data between Hadoop and structured datastores, is used to connect Hadoop with standard SQL databases. This makes it easier to query large data silos using familiar tools; a typical import command is sketched below.
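To give a sense of what that looks like in practice, a Sqoop import might resemble the command below; the database address, table name, credentials and target directory are placeholders rather than details of any particular deployment.

    sqoop import \
        --connect jdbc:mysql://dbserver/sales \
        --table orders \
        --username analyst -P \
        --target-dir /data/warehouse/orders \
        --num-mappers 4

The command copies the relational table into the Hadoop file system using several parallel map tasks, after which the data can be processed alongside the other data sets held in the cluster.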

In addition, ZooKeeper provides centralised coordination, covering configuration, naming and synchronisation, for the distributed services that run across a cluster, rather than managing the data itself. These tools are all freely available as open source.
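As a rough illustration of the coordination ZooKeeper provides, the short Java sketch below uses its standard client API to publish a cluster-wide setting as a znode that any connected process can read; the ensemble address, znode path and value are invented for the example.

    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class SharedConfigExample {
        public static void main(String[] args) throws Exception {
            // Block until the client has actually connected to the ensemble.
            CountDownLatch connected = new CountDownLatch(1);
            ZooKeeper zk = new ZooKeeper("zk-host:2181", 10000, event -> {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            });
            connected.await();

            // Publish a cluster-wide setting as a znode (assumes it does not already exist).
            zk.create("/ingest-batch-size",
                    "5000".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT);

            // Any process connected to the same ensemble can now read the value back.
            byte[] value = zk.getData("/ingest-batch-size", false, null);
            System.out.println(new String(value));

            zk.close();
        }
    }

The same mechanism underpins components such as HBase and the high-availability features of YARN, which rely on ZooKeeper for leader election and shared state.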

Digital IQs

PwC's fourth annual Digital IQ survey found that companies need, more than ever, to make the technology they employ work harder.

"Raising a firm's Digital IQ means improving the way it leverages digital technologies and channels to meet customer needs," said John Sviokla, principal at PwC.

"The core of the ecosystem for innovation has moved from inside the firm to out in the marketplace. Customer and employee expectations are being shaped by this new, dynamic and exciting environment—if you miss this trend you will be increasingly irrelevant to the market."

In an age of big data, Hadoop is becoming a key technology that can deliver real value to its users, but it is not a panacea. There are security issues to consider, and setting up and maintaining a Hadoop environment requires specific skills.

There are alternative systems, but none takes the same holistic approach as Hadoop, which emerged from the integration of a group of open source big data analysis projects.

Dell's white paper, Hadoop Enterprise Readiness, provides a good snapshot of how important the technology is to businesses that need robust data analysis.

"In short, leveraging big data analytics in the enterprise presents both benefits and challenges," it says. "From the holistic perspective, big data analytics enable businesses to build processes that encompass a variety of value streams (customers, business partners, internal operations, etc.).

"The technology offers a much broader set of data management and analysis capabilities and data consumption models. For example, the ability to consolidate data at scale and increase visibility into it has been a desire of the business community for years. Technologies like Hadoop finally make it possible."