Date: 2024-11-15

It’s almost become a cliché to open an article about data by acknowledging the sheer volume that modern economies create. With the increasing prominence of AI, demand for all of this data is only going to keep growing.

Thanks to the rise of business cloud storage platforms, organizations now have a much more convenient way to store these huge quantities of data, but that has come at a cost: this data is decentralized like never before. For a modern data consumer, that means it’s not just tough to find and access the data they’re looking for; they can’t even be sure it exists at all.

Collecting, cataloguing and governing the large amounts of data stored by an organization is now paramount to getting the most out of it. In their rush to make data both discoverable and usable, though, businesses are increasingly adopting a one-size-fits-all approach that relies on a data catalog alone. Whilst this approach is being praised as a silver bullet for businesses, could it actually be holding them back?

Bart Koek, Field CTO, Immuta.

Uniting your data

To overcome the challenges associated with large data lakes, which increasingly resemble data oceans, many data professionals have adopted data catalogs to help bring some order to their large quantities of data. But what exactly are data catalogs, and why might they not be the sole solution to the challenge of uniting an organization's data?

Data catalogs provide the frameworks and interfaces to collect and manage metadata: information on the lineage, reliability, quality and sensitivity of data and data products. Put more simply, they allow users to see all of the information they need about the data an organization stores. That is an incredibly powerful tool for, say, the data engineers who are building their organization's data products. Think of the data catalog as a large inventory list at a builder's yard, allowing data product builders to find all of the right materials for their next project at a glance.
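To make the inventory-list idea concrete, here is a minimal sketch of what a catalog entry and a lookup might look like. It is an illustration only: the `CatalogEntry` class, its field names, the `find_entries` helper and the sample datasets are all hypothetical, and a real catalog product would track far richer metadata.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about one dataset. All field names are illustrative assumptions."""
    name: str
    lineage: list[str] = field(default_factory=list)  # upstream datasets this one is built from
    quality_score: float = 0.0                        # 0.0 (unknown or poor) to 1.0 (trusted)
    sensitivity: str = "internal"                     # e.g. "public", "internal", "restricted"


def find_entries(catalog, keyword, min_quality=0.0):
    """Return the names of entries that match the keyword and meet the quality bar."""
    return [
        entry.name
        for entry in catalog
        if keyword.lower() in entry.name.lower() and entry.quality_score >= min_quality
    ]


# Hypothetical inventory: raw, intermediate and sensitive data all sit side by side.
catalog = [
    CatalogEntry("raw_orders", quality_score=0.4),
    CatalogEntry("orders_cleaned", lineage=["raw_orders"], quality_score=0.9),
    CatalogEntry("customer_pii", quality_score=0.8, sensitivity="restricted"),
]

print(find_entries(catalog, "orders", min_quality=0.5))
```

Note that the lookup surfaces every dataset in the inventory, including raw and intermediate ones, which suits a builder choosing materials more than a consumer looking for a finished product.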

Data catalogs are, therefore, incredibly important to an organization serious about getting the most from its data. They are not, though, a one-size-fits-all solution to the challenge of making data discoverable, governable and accessible.

The data marketplace

As data consumers become an increasingly important part of a modern data strategy, businesses need to look beyond a catalog-only approach to their data tools, and begin considering an internal data marketplace. But what exactly is a data marketplace, and what can it offer data consumers that they can’t achieve with a data catalog alone?

When talking about marketplaces, many data professionals tend to think of selling data externally. However, there are just as many benefits to businesses using internal marketplaces to solve issues around discoverability, accessibility and governance. These internal marketplaces deliver data products within a business, helping to drive corporate missions and data-driven initiatives. Some organizations use the phrase "data exchange" interchangeably with "internal data marketplace", but by and large both refer to the same thing: a platform to make data products more readily available to data consumers.

When we talk about data marketplaces increasing data product availability, this doesn’t just mean discovery. Data product delivery, the process of provisioning access so that consumers can actually use the data in their data platforms or BI tools, is key to a successful marketplace. Without proper provisioning, a data marketplace would be little more than a catalog, and only half complete. It would be a bit like using the App Store to locate the perfect app, only to find that instead of being able to download it, you have to raise a ticket with the app creator and wait a week to use it.
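The difference between discovery and delivery can be sketched in a few lines of code. The example below is a simplified illustration under stated assumptions: the `Marketplace` and `DataProduct` classes, the role-based policy and the sample product are all hypothetical, and real platforms enforce access in the underlying data platform rather than in an in-memory set.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A published data product. The role-based policy is an illustrative assumption."""
    name: str
    description: str
    allowed_roles: set[str]
    granted_users: set[str] = field(default_factory=set)


class Marketplace:
    """Hypothetical internal marketplace covering publish, find and access."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        # Only finished products are published, not intermediate data.
        self._products[product.name] = product

    def find(self, keyword):
        # Discovery: search product names and descriptions.
        kw = keyword.lower()
        return [
            p.name
            for p in self._products.values()
            if kw in p.name.lower() or kw in p.description.lower()
        ]

    def request_access(self, user, role, product_name):
        # Delivery: access is provisioned immediately when the policy allows it,
        # rather than waiting on a manually handled ticket.
        product = self._products[product_name]
        if role in product.allowed_roles:
            product.granted_users.add(user)
            return True
        return False


market = Marketplace()
market.publish(DataProduct("sales_summary", "Weekly sales by region", {"analyst"}))

print(market.find("sales"))
print(market.request_access("ana", "analyst", "sales_summary"))
```

In this sketch, a marketplace that stopped at `find` would behave like a catalog; it is `request_access` that closes the gap between locating a product and being able to use it.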

The key difference between a catalog and a marketplace is that they serve two distinct functions: catalogs are great tools for the data engineers building your data products, whilst marketplaces are the best way to give data consumers the data products they need.

Blending catalogs and marketplaces

So, between a catalog and a marketplace, which one is right for a modern business? The answer is both.

A catalog-only strategy fails because it tries to serve two types of users at once. The catalog should be allowed to focus on serving the builders, leaving space for a marketplace to empower data consumers. A catalog-only approach is bad for the consumers, who are left inadequately served, and it can fail the builders, too.

Relying on a data catalog alone means that, whilst data engineers are able to identify the right data for their products, they can also be left with large amounts of additional work. When consumers use a data catalog directly, it is much harder for them to find what they are looking for, and they are exposed to all of the intermediate data used to build the final data products. That places a large amount of accountability and pressure on the teams owning each data product.

Complementing a data catalog with an end-to-end data marketplace that offers the ability to publish, find and access products delivers the best of both worlds: two purpose-built solutions, one for each group of data users. Data engineers can use catalogs without the need to document everything in the data platform, freeing them up to focus on the products that they own and publish. Consumers, meanwhile, are empowered with a marketplace to find and learn about data products, and they benefit from automated access and governance provisions.

There’s no one-size-fits-all approach to gathering your data and making it useful to everyone in your business. By combining data catalogs and internal data marketplaces, though, organizations can benefit from two powerful, complementary tools and drive maximum value from the data they own.

