Why metadata matters: stories from the real world

Image Credit: Alexskopje / Shutterstock

These days, data makes the world go round, maybe even more than money does. With data presented accurately and attractively, companies can attract investors, satisfy regulators, increase sales, save money, and operate as lean, mean, efficient machines.

That, of course, presupposes that they know where their data is and can summon the right data for each purpose; the data needed for an annual report is going to be different from the data needed for a marketing campaign aimed at medium-sized customers. But getting at the requisite data isn't always simple: in most organizations, locating and accessing the appropriate data is far more complicated than pressing a few buttons. Data can be mislabeled, stored in the wrong place, or otherwise disorganized.

Because of these metadata issues, data that should be highly valuable to the organization is worth far less. Reports may be missing data simply because it is hard to find, which can lead to confusion, missed deadlines, and even regulatory penalties. The key to preventing this kind of problem is to shore up metadata, ensuring data transparency and data lineage. Systems that give organizations control of their data can prevent those losses and help organizations thrive.

Metadata, the data about data, can consist of dozens, hundreds, or thousands of tags, used across every place data is stored: departmental databases, customer information, organization-wide databases, corporate documents, ETL tools, analysis tools, stores of social media data, reporting tools, and more. All of these are sources of data, and to be automatically searchable, they all need consistent metadata tags. That means everyone in the organization who collects and records data needs to follow the same conventions on which metadata tags to use.
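To make the idea of "consistent conventions" concrete, here is a minimal Python sketch that audits each data source's tags against an agreed vocabulary. The vocabulary (`APPROVED_TAGS`), the source names, and the function `find_nonconforming_tags` are hypothetical, invented for illustration; a real organization would define its own controlled vocabulary.

```python
# A hypothetical controlled vocabulary that everyone agrees to use (assumption).
APPROVED_TAGS = {"customer_id", "expenses", "revenue", "department", "fiscal_year"}


def find_nonconforming_tags(sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, for each source, the tags that fall outside APPROVED_TAGS.

    Sources whose tags all conform are left out of the result.
    """
    report = {}
    for source_name, tags in sources.items():
        unknown = tags - APPROVED_TAGS  # set difference: tags nobody agreed on
        if unknown:
            report[source_name] = unknown
    return report


# Hypothetical tag inventories for three data sources.
sources = {
    "finance_db": {"customer_id", "expenses", "fiscal_year"},
    "sales_db": {"customer_id", "revenue", "Outlay"},
    "crm_export": {"CustID", "department"},
}

print(find_nonconforming_tags(sources))
# {'sales_db': {'Outlay'}, 'crm_export': {'CustID'}}
```

Every tag the audit flags is a place where an automated search for the agreed-upon tag would silently miss data.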

Controlling data

The inability to control data, which is often due to poor metadata management, is quite common. According to one study, 85% of companies have taken significant steps to become data-driven, but only 37% say they have succeeded. And being in control of data is crucial for some of the most important tasks businesses need to carry out.

How do the 63% that haven't succeeded cope? With their business intelligence teams, of course. Consider an annual revenue report for the whole organization whose total doesn't match the sum of the individual departments' results. The organization's BI team would go through the databases and try to match up the corresponding information. It's in that largely manual examination that the BI team will discover, and hopefully resolve, the metadata inaccuracies. But doing that can take a long time and cost the organization a good chunk of money.
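The mismatch the BI team is chasing can be illustrated with a small Python sketch. It assumes, purely for illustration, that each department's figure is stored as a record with a `tag` field and that one department filed its revenue under a different tag; the record layout and the helper names `reconcile` and `unmatched_records` are hypothetical.

```python
def reconcile(company_total: int, records: list[dict], tag: str) -> int:
    """Return the gap between the reported company-wide total and the sum
    of department values filed under `tag`. A non-zero gap means some
    figures were filed under a different tag or are missing altogether."""
    department_sum = sum(r["value"] for r in records if r["tag"] == tag)
    return company_total - department_sum


def unmatched_records(records: list[dict], tag: str) -> list[dict]:
    """Return the records a BI analyst would have to inspect by hand:
    everything not filed under the expected tag."""
    return [r for r in records if r["tag"] != tag]


# Hypothetical department figures; "services" used a different tag.
records = [
    {"department": "sales", "tag": "revenue", "value": 600},
    {"department": "support", "tag": "revenue", "value": 150},
    {"department": "services", "tag": "Income", "value": 250},
]

print(reconcile(1000, records, "revenue"))  # 250
print([r["department"] for r in unmatched_records(records, "revenue")])  # ['services']
```

The code can only point at the gap; deciding that "Income" and "revenue" mean the same thing is still the manual step the paragraph describes.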

To solve these and other metadata inaccuracies, the organization needs an automated system that provides the tools to resolve them. The ideal system would be able to recognize, from the data itself, which metadata category it belongs to. For example, it might find a metadata tag called “Expenses” with its corresponding data in one database, a tag called “Outlay” in a second, and a tag called “Costs” in a third. The automated system would be intelligent enough to realize that the numbers in all three refer to the same thing, despite the different tags.
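A minimal Python sketch of the end result follows. Note that it swaps in a much simpler technique than the one the paragraph envisions: instead of recognizing categories from the data itself, it relies on a hand-maintained synonym table. The table contents, the database names, and the functions `normalize_tag` and `combine_by_concept` are hypothetical.

```python
from collections import defaultdict

# Hypothetical synonym table: source-specific tag (lowercased) -> canonical concept.
CANONICAL_TAGS = {
    "expenses": "expenses",
    "outlay": "expenses",
    "costs": "expenses",
}


def normalize_tag(tag: str) -> str:
    """Map a raw tag to its canonical concept; unknown tags pass through lowercased."""
    key = tag.strip().lower()
    return CANONICAL_TAGS.get(key, key)


def combine_by_concept(databases: dict[str, list[tuple[str, int]]]) -> dict[str, int]:
    """Sum amounts across all databases, grouping rows by canonical concept."""
    totals: dict[str, int] = defaultdict(int)
    for rows in databases.values():
        for tag, amount in rows:
            totals[normalize_tag(tag)] += amount
    return dict(totals)


# Three differently tagged databases, mirroring the Expenses/Outlay/Costs example.
databases = {
    "db1": [("Expenses", 1200)],
    "db2": [("Outlay", 300)],
    "db3": [("Costs", 500), ("Revenue", 4000)],
}

print(combine_by_concept(databases))  # {'expenses': 2000, 'revenue': 4000}
```

A synonym table is easy to reason about, but it has to be updated by hand every time a new source or tag appears, which is exactly the burden an automated, data-driven approach aims to remove.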

Image Credit: Pixabay


The case for accurate metadata control  

The business case for this is obvious: time saved, both for BI teams, who now have automated tools that help them do their job more accurately and efficiently, and for every part of the organization that depends on data (i.e., all of them), which gains more efficient operations. But besides making the business hum along, resolving metadata issues can help solve regulatory problems, which are now very much in the spotlight with the advent of GDPR, California's CCPA, new HIPAA regulations, and others that are likely to come down the pike.

The GDPR, for example, requires organizations to be able to locate the data they hold on individuals so that they can comply with the rules on the “right to be forgotten.” Failing even to demonstrate this ability is grounds for a fine under the rules. And without accurate metadata control, where information is tagged differently in different data sources, rounding up all the data on a single individual (one of perhaps millions in the organization's databases) is going to be nigh-impossible.
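The Python sketch below shows why consistent metadata matters for such a request. It assumes, hypothetically, that every source stores a person's email address but under a different field name, and that a per-source mapping records which field to read. The source names, field names, and `find_subject_records` are illustrative inventions; a real data-subject search would typically match on more than one identifier.

```python
# Hypothetical mapping: which field holds a person's email address in each source.
EMAIL_FIELD_BY_SOURCE = {
    "crm": "email",
    "billing": "EmailAddress",
    "support_tickets": "contact_email",
}


def find_subject_records(sources: dict[str, list[dict]], email: str):
    """Collect every record that matches `email`, across all sources.

    Returns (matches, unmapped): matches is a list of (source, record) pairs;
    unmapped lists the sources with no known email field, which therefore
    could not be searched and must be reviewed by hand.
    """
    target = email.strip().lower()
    matches: list[tuple[str, dict]] = []
    unmapped: list[str] = []
    for source_name, rows in sources.items():
        field = EMAIL_FIELD_BY_SOURCE.get(source_name)
        if field is None:
            unmapped.append(source_name)
            continue
        for row in rows:
            value = row.get(field)
            if isinstance(value, str) and value.strip().lower() == target:
                matches.append((source_name, row))
    return matches, unmapped


# Hypothetical data spread across four sources.
sources = {
    "crm": [{"email": "ana@example.com", "name": "Ana"}, {"email": "bo@example.com", "name": "Bo"}],
    "billing": [{"EmailAddress": "ANA@example.com", "invoice": 17}],
    "support_tickets": [{"contact_email": "ana@example.com", "ticket": 512}],
    "marketing_list": [{"addr": "ana@example.com"}],
}

matches, unmapped = find_subject_records(sources, "ana@example.com")
print([source for source, _ in matches])  # ['crm', 'billing', 'support_tickets']
print(unmapped)  # ['marketing_list']
```

Reporting the unmapped sources, rather than silently skipping them, is the important design choice here: a search that quietly missed `marketing_list` would leave the organization unable to show it had found everything.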

Data is burgeoning. Over 2.5 quintillion bytes of data are already produced worldwide each day, and by 2020 there will be 5,200 GB of data for every person on Earth. The IoT revolution, which is just beginning, promises to increase those figures exponentially. To expect that all organizations, devices, systems, and databases, even within a single organization, will conform to a single metadata standard is probably wishful thinking. Automated metadata resolution has to be part of the data revolution for any organization that plans on using data (i.e., all of them).

Amnon Drori, CEO and Co-Founder of Octopai 


Amnon Drori is the Co-Founder and CEO of Octopai, the first centralized metadata search engine for BI. He has over 20 years of leadership experience in technology companies. Before co-founding Octopai, he led sales efforts at companies such as Panaya (acquired by Infosys), Zend Technologies (acquired by Rogue Wave Software), ModusNovo and Alvarion, and also served as Chief Revenue Officer at CoolaData, a big data behavioral analytics platform. Amnon studied Management and Computer Science at the Open University of Tel Aviv.