Why leveraging metadata is vital to getting more out of big data

Metadata can provide a broad range of quantifiable benefits

Companies in virtually every industry are working feverishly to capitalise on big data, and harnessing its power across various lines of business has become the Holy Grail for information technologists. As big data repositories continue to scale both in number and size, so does the game-changing potential for organisations to effectively tap into and manage them.

Whether the ultimate goal is to enhance business strategies and processes, or to gain a competitive advantage in sales and marketing activities, big data can certainly be a catalyst for big results. But big data by itself does not represent actionable business intelligence. In order to yield tangible outcomes, big data must be easily searched, retrieved, analysed and consumed across the enterprise for a variety of applications – a process that can be considerably streamlined and enhanced by using metadata.

Metadata, often referred to as "data about the data", can help companies better manage and ultimately harness the vast information resources that typically reside in multiple systems in order to reach organisational goals. Simply put, a metadata-driven approach to enterprise information management can help organisations achieve high returns on initiatives that depend on fast access to precise content held in large and diverse big data repositories.

Why metadata matters

Metadata comprises the attributes, properties and tags that describe and classify information. It can capture virtually any distinguishing characteristic of an information asset (type of information asset, author, date created, workflow state, and so forth). Once defined, metadata helps expose the value and purpose of the content, and it becomes an effective tool for organising and quickly locating information.
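As a rough illustration of the attributes listed above, a metadata record might be sketched as follows. The class name, field names and sample values are hypothetical and chosen only for this example; real systems define their own schemas.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DocumentMetadata:
    """Hypothetical metadata record for one information asset."""
    doc_type: str          # type of information asset, e.g. "proposal"
    author: str
    created: date
    workflow_state: str    # e.g. "draft", "in review", "approved"
    tags: set[str] = field(default_factory=set)


# Illustrative values only; none of them come from a real system.
proposal = DocumentMetadata(
    doc_type="proposal",
    author="J. Smith",
    created=date(2014, 3, 1),
    workflow_state="draft",
    tags={"customer-x"},
)
```

Because each attribute is a named field rather than free text, the record can be filtered or grouped by document type, author or workflow state without parsing the content itself.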

In looking at the value of metadata to get more out of big data, it helps to quickly review its evolution. When applications that leveraged metadata for classifying and organising information first emerged, metadata was mainly used to add keywords to the content.

The need for keyword metadata is diminishing because indexing technologies and text analytics tools have evolved substantially during the last few years. While adding keywords from the (text) content to the document metadata is now largely redundant work, adding descriptive metadata that does not directly exist within the content plays an important role in managing information more effectively. For example, while text analytics tools may determine that a proposal pertains to Customer X, they cannot identify whether or not the customer ultimately accepted the proposal.

A status attribute that records this outcome serves as critical business intelligence, helping sales reps pinpoint which proposals translated to successful sales results. When metadata is tied to search algorithms, users can generate highly precise results. This is particularly beneficial in big data scenarios, where standalone keyword-driven results may include an abundance of less relevant information. By leveraging metadata, users can quickly locate the right document, despite the vast amount of content residing within their repositories.
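The difference between keyword-only search and metadata-assisted search can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular search product: the `search` function, the metadata keys (`type`, `customer`, `status`) and the sample documents are all hypothetical.

```python
# Hypothetical sample repository: each document has text plus metadata.
documents = [
    {"text": "Proposal for Customer X: managed hosting",
     "metadata": {"type": "proposal", "customer": "Customer X", "status": "accepted"}},
    {"text": "Proposal for Customer X: support contract",
     "metadata": {"type": "proposal", "customer": "Customer X", "status": "rejected"}},
    {"text": "Meeting notes mentioning the Customer X proposal",
     "metadata": {"type": "notes", "customer": "Customer X", "status": None}},
]


def search(docs, keyword, **required):
    """Return documents whose text contains the keyword and whose
    metadata matches every required attribute exactly."""
    keyword = keyword.lower()
    return [
        doc for doc in docs
        if keyword in doc["text"].lower()
        and all(doc["metadata"].get(key) == value for key, value in required.items())
    ]


# Keyword alone matches every document that mentions the customer.
keyword_only = search(documents, "customer x")

# Adding metadata filters narrows the results to accepted proposals.
accepted = search(documents, "customer x", type="proposal", status="accepted")
```

Here the keyword matches all three documents, while the metadata filter isolates the one proposal the customer accepted, a fact that appears nowhere in the document text.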

Netflix, one of the most successful services for entertainment enthusiasts, offers insight into the power of metadata. Netflix employs teams to curate a comprehensive set of metadata for each title in its database. With all this information, Netflix can identify programming preferences for its viewers based on viewership history.

Similar developments are occurring in enterprise systems: metadata is being applied to help search algorithms better understand users' past behaviour and their connection to the files in their organisation's repositories, applications and databases.

And because metadata exists for all structured data throughout an organisation, such as information that resides in CRM, ERP and other database systems, it can serve as the bridge that connects this structured data with the unstructured content (Microsoft Office documents, PDFs, media files) it relates to. Managing both structured data and unstructured content in one system allows users to gain better insights into data assets.
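One way to picture this bridge is a shared identifier that appears both in a structured record and in the metadata of related files. The sketch below assumes a hypothetical `customer_id` key; the account records, file paths and function name are invented for illustration.

```python
from collections import defaultdict

# Hypothetical structured records, as might be exported from a CRM system.
crm_accounts = [
    {"customer_id": "C-001", "name": "Customer X"},
    {"customer_id": "C-002", "name": "Customer Y"},
]

# Hypothetical unstructured files, each tagged with the same identifier.
files = [
    {"path": "proposals/x-hosting.docx",
     "metadata": {"customer_id": "C-001", "type": "proposal"}},
    {"path": "contracts/x-signed.pdf",
     "metadata": {"customer_id": "C-001", "type": "contract"}},
    {"path": "media/y-demo.mp4",
     "metadata": {"customer_id": "C-002", "type": "video"}},
]


def link_files_to_accounts(accounts, tagged_files):
    """Group file paths under the account whose customer_id they share."""
    paths_by_customer = defaultdict(list)
    for item in tagged_files:
        paths_by_customer[item["metadata"]["customer_id"]].append(item["path"])
    return {
        account["name"]: paths_by_customer.get(account["customer_id"], [])
        for account in accounts
    }


linked = link_files_to_accounts(crm_accounts, files)
```

The join relies only on the metadata, so it works the same way for Office documents, PDFs and media files, regardless of what the files themselves contain.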