How to avoid costly data errors in the enterprise

Using data quality to manage risk

The Top Causes of Big Data Errors in an Enterprise

Data errors can occur at any moment in the enterprise, and in a world where business value lies in this very data, the everyday risk for companies is huge.

To find out the most common causes of data errors, and how we can protect ourselves from these mishaps, we fired some questions at Jeffery Brown, product manager at data specialist Infogix.

TechRadar Pro: What are the top causes of data errors in an enterprise?

Jeffery Brown: The causes for data errors within an enterprise are truly endless, but here are some of the top ones:

  • Source data (external data supplied by a third party)
  • Manual entry of data (missing or inaccurate data entry)
  • Conversion/aggregation/translation of data through systems (ETL systems)
  • Data processing (data lost as it is processed through the business)
  • Data cleansing (data that is altered or "changed" anywhere other than at the source)
  • Overwhelming data storage (enormous data warehouses that suffer from "junk drawer syndrome")
  • Data parsing (portions of data being sectioned off into data marts)

TRP: How do you reduce the risk of data errors?

JB: There are three components to reducing the risk of data errors: focus on people, process, and technology. By ensuring that people understand the data, what guidelines to follow, and how to improve data quality, organizations can greatly reduce risk.

Through a proper process, companies can implement the appropriate data governance initiatives and framework, which creates structure and accountability to data.

Lastly, technology is the tool to help realize this reduction in data errors. I like to think of it as a manufacturing line – people do the work, the process ensures the line is running efficiently, and technology is the structure that allows for scalability.

TRP: What's an example of a high-profile data error in recent years that made headlines?

JB: Some of the biggest "data quality" offenders over the past year have been associated with third-party data providers. This is when data is sold to companies to associate existing customers with specific attributes, or for use in marketing.

One major culprit is the so-called "data broker", who sells customer data to companies without the customer's knowledge.

Several large retailers have recently received high-profile negative attention for accidentally mailing marketing promotions that included extremely sensitive information on the letter.

TRP: Data errors can obviously hit financial gains, but what other types of things can be affected?

JB: When data errors are publicly exposed, they affect not only the company's customer relationships but also tarnish the brand's image for a long while.

This is something that is hard to measure, because image is not a tangible asset. Realistically, data errors can also affect a company's perceived value, as well as deter future customers from doing business with you.

TRP: What impact do increasingly diverse data input sources - like mobile - have on data collection for enterprises?

JB: Diverse data sources have made data collection efforts extremely complex and time consuming. Companies have had to react by throwing additional resources at the problem and spending countless cycles trying to ensure the quality of the data being collected.

With the increase in input sources, the call for mechanisms and processes to manage these vast sources is more important than ever.

TRP: How do automated controls enable data errors to be flagged? Can they help companies anticipate error or are they solely reactive?

JB: Automated controls help identify errors for data-in-motion and data-at-rest by utilizing defined business rules to check against. The data errors and results can be sent to key stakeholders and reported on, so that bad data isn't propagated throughout downstream systems.

By identifying and tracking data errors, companies can establish trending graphs and conduct analysis which allows them to catch errors that might be escalating from a certain source or process step.

This allows companies to home in on poor vendor data, late or incorrect files, or even duplicate data submissions. These reactions help companies keep bad data out of their processing and reporting systems and anticipate future harm.
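The kind of automated control Brown describes can be sketched as a simple rule-based check against defined business rules. This is a minimal illustration only, not Infogix's actual product; the field names and rules below are hypothetical examples.

```python
# Minimal sketch of an automated data quality control: each record is
# checked against defined business rules, and violations are flagged
# before the data propagates to downstream systems.
# Field names and rules are hypothetical examples.

def check_record(record, rules):
    """Return the names of any business rules the record violates."""
    return [name for name, rule in rules.items() if not rule(record)]

RULES = {
    "amount_positive": lambda r: r.get("amount", 0) > 0,
    "customer_id_present": lambda r: bool(r.get("customer_id")),
    "not_duplicate": lambda r: not r.get("is_duplicate", False),
}

records = [
    {"customer_id": "C001", "amount": 120.0},   # clean record
    {"customer_id": "", "amount": -5.0},        # violates two rules
]

# Collect violations per record so they can be reported to stakeholders
# and tracked over time for trending analysis.
flagged = {}
for i, rec in enumerate(records):
    errors = check_record(rec, RULES)
    if errors:
        flagged[i] = errors

print(flagged)  # {1: ['amount_positive', 'customer_id_present']}
```

Accumulating these per-record results over time is what enables the trending graphs Brown mentions: a rising violation count for one rule points at an escalating problem in a specific source or process step.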

TRP: What factors and at which touch-points is data most vulnerable to error?

JB: Data is pretty much vulnerable to errors all the way from cradle to grave. From the time data is entered or even captured, it is subject to being incomplete, inaccurate, or unreliable.

As the data is aggregated, transferred, cleansed, reported on, and stored, the opportunities for data errors increase. This is why automated controls are critical to ensure that a company's data is of the highest quality and integrity.

As systems become more and more disparate, the handoff (transfer) points increase, creating more chances for data to fall off or become corrupted.