How to navigate the new age of data integration

Most companies now understand that big data has value, yet many still struggle to see past the market buzz surrounding the term. This is especially true for nuances such as integration and optimization.

Many organizations know they should be doing something about it but aren't sure what. You obviously want your big data project, which typically requires a good amount of integration, to deliver the most value and the best performance at the lowest total cost. But how do you get there?

This question becomes even more confusing because the data integration landscape is changing. Inexpensive and powerful platform alternatives are now available, so even yesterday's best data integration practices need to be re-evaluated for big data types, processing and systems. A few aspects of the ever-changing landscape:

What to do?

So how do companies approach data integration in the face of these issues and changes? In a nutshell, they need to do their due diligence. Data integration is a process that can't be rushed, and one that raises a lot of questions.

A few questions are usually top of mind these days: How should I offload to Hadoop? What exactly do I offload? And, more importantly, how do I plan it? The issue is not whether to offload some of the extract, transform and load (ETL) processes, since offloading is part of achieving the lowest total cost of ownership (TCO). Instead, organizations need to examine the environment they have, because every environment is different, and weigh all viable alternatives for reducing the cost of the data integration process.

This analysis and assessment should follow a fact-based approach to identifying, tuning and moving data integration processes, improving efficiency and meeting business requirements while recapturing valuable system resources for high-impact business analytics.

This process helps decide whether to modify ETL code, re-architect ETL processes, extend the architecture with systems such as Hadoop, or some combination of these. In many cases, organizations need help identifying the best data integration solution, not only for today but for the future, because it is all about using the right piece of the data ecosystem for the right job.

For instance, many companies want to expand their environment with Hadoop in order to offload ETL processes. That was the case in a recent client engagement. But once the client was encouraged to do their due diligence and assess the ETL code and analytic environment, the assessment also revealed that a significant amount of the ETL code was inefficient. As a result, we recommended that the company modify that ETL code in addition to offloading some of the lower-value ETL work. This dual-pronged approach freed up capacity for additional analytics, and it shows why you can't just pick a solution without doing your homework.
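To make the dual-pronged triage a little more concrete, here is a minimal sketch of how an assessment might flag each ETL job for tuning, offloading, or both. Everything in it is a hypothetical illustration, not the method used in the engagement: the `EtlJob` fields, the "measured vs. well-tuned baseline" CPU comparison, the 2x inefficiency threshold and the "high"/"low" value labels are all assumptions standing in for whatever facts a real assessment would gather.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EtlJob:
    name: str
    cpu_seconds: float           # hypothetical: measured warehouse CPU use per run
    baseline_cpu_seconds: float  # hypothetical: estimated CPU use of a well-tuned equivalent
    business_value: str          # hypothetical label from the assessment: "high" or "low"


def recommend(job: EtlJob, inefficiency_ratio: float = 2.0) -> list[str]:
    """Return illustrative actions for one job: 'tune', 'offload', or 'keep'."""
    actions = []
    # Flag code that uses far more resources than a tuned version would.
    if job.cpu_seconds > inefficiency_ratio * job.baseline_cpu_seconds:
        actions.append("tune")
    # Treat lower-value work as a candidate to move off the warehouse (e.g. to Hadoop).
    if job.business_value == "low":
        actions.append("offload")
    return actions or ["keep"]


def freed_cpu(jobs: list[EtlJob], inefficiency_ratio: float = 2.0) -> float:
    """Estimate warehouse CPU seconds recaptured per run by tuning and offloading."""
    total = 0.0
    for job in jobs:
        actions = recommend(job, inefficiency_ratio)
        if "offload" in actions:
            total += job.cpu_seconds  # the whole job leaves the warehouse
        elif "tune" in actions:
            total += job.cpu_seconds - job.baseline_cpu_seconds  # only the waste is recovered
    return total


if __name__ == "__main__":
    jobs = [
        EtlJob("daily_sales_load", 500, 100, "high"),  # inefficient, high value
        EtlJob("log_staging", 300, 250, "low"),        # efficient enough, low value
        EtlJob("customer_dim", 120, 100, "high"),      # fine as it is
    ]
    for job in jobs:
        print(job.name, recommend(job))
    print(freed_cpu(jobs))  # 700.0
```

The point of separating `recommend` from `freed_cpu` is that the two questions from the example stay distinct: whether code is wasteful and whether the work belongs on the platform at all. A job can be both, in which case offloading it recaptures its entire footprint.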

The lesson: data integration optimization is important, but it isn't easy. Do all the legwork to make sure you're getting the most bang for your buck.

David R. Schiller is CCP at Teradata Products and Services Marketing
