The technology industry moves fast. So it can be surprising that IT and digital leaders still talk about “big data” – a term first coined in the early 1990s. The term described datasets too large and complex for traditional software to handle. Specialized tools emerged to deal with this new category of data, characterized by its overwhelming variety, enormous volume and high velocity.
In reality, the threshold for what counts as big data has risen massively. And while the original challenge associated with big data – storage – has largely been solved, new problems have emerged to take its place. Today it is the challenge of governing and optimizing data that keeps IT leaders awake at night.
The right data at the right time
Every piece of data is different. Where in the past organizations may have focused solely on transactional data, today they are collecting data in a variety of formats (e.g. audio, video) and from a range of sources (e.g. IoT devices, social media). Many are routinely collecting and managing datasets far larger than those previously considered “big” data. At the same time, this huge volume of data is residing all over the distributed enterprise – from on-premises servers to the public cloud and increasingly out to the network edge.
As digital transformation continues to power the post-pandemic economy, there’s only one direction of travel. According to IDC, the total volume of global data is expected to exceed 180 zettabytes by 2025 (for context, this is the equivalent of 540 billion 4K movies or 135 trillion MP3 songs). This is clearly a huge number, but look under the covers and 90% of it is replicated or retained data. While much of this duplication is driven by data compliance requirements, organizations do need to focus more on how they store and categorize their data.
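The headline equivalences above can be sanity-checked with a quick back-of-envelope calculation. This sketch assumes decimal prefixes (1 ZB = 10^21 bytes) and simply derives the per-item sizes the comparison implies:

```python
ZB = 10**21  # decimal zettabyte in bytes

total_bytes = 180 * ZB  # IDC's projected global data volume for 2025

# Implied average size of each item in the comparison
movie_gb = total_bytes / (540 * 10**9) / 10**9   # 540 billion 4K movies
song_gb = total_bytes / (135 * 10**12) / 10**9   # 135 trillion MP3 songs

print(f"Implied size per 4K movie: {movie_gb:.0f} GB")   # ~333 GB
print(f"Implied size per MP3 song: {song_gb:.2f} GB")    # ~1.33 GB
```

Under these assumptions the figures imply roughly 333 GB per 4K movie and over a gigabyte per song, so such comparisons are best read as illustrative rather than precise.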
While it obviously depends on the use case, most data becomes stale and loses value after about a year. In a world where organizations want to become more event-driven and real-time, the right data needs to be in the right hands faster than ever before.
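The one-year staleness rule of thumb translates naturally into a retention filter. A minimal sketch, assuming a hypothetical record shape with a `created_at` timestamp and a 365-day window:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical one-year retention window; the right cutoff depends on the use case.
CUTOFF = datetime.now(timezone.utc) - timedelta(days=365)

def is_fresh(record: dict) -> bool:
    """Return True if the record falls inside the retention window."""
    return record["created_at"] >= CUTOFF

records = [
    {"id": 1, "created_at": datetime.now(timezone.utc) - timedelta(days=30)},
    {"id": 2, "created_at": datetime.now(timezone.utc) - timedelta(days=800)},
]

fresh = [r for r in records if is_fresh(r)]  # keeps only record 1
```

In practice a rule like this would be enforced by the platform’s lifecycle policies rather than application code, but the logic is the same.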
Chris Royles is Field CTO for EMEA at Cloudera.
Secure and actionable insight
With organizations having access to so many data sources today, it is vital they develop a data strategy that guarantees useful and actionable insights. Adopting a modern data architecture is essential to extracting business value, enabling organizations to connect different types of data across silos, whether on-premises or across multiple clouds, without needing to copy or move data.
To create such a data fabric and to start unlocking value from their data, organizations need to look for several key elements in a technology platform. Any platform must operate friction-free on-premises, across public clouds and at the edge, so that workloads and data can flow freely without rewriting or refactoring. Services must be portable across different infrastructures without the need for redevelopment. Finally, a platform must handle all data types – structured, semi-structured and unstructured; real-time, streaming and batch.
Let’s not forget security and governance, as they remain a major roadblock to data initiatives. Many early projects had little to no security by default, relying instead on the wider network and systems in place. Today this is no longer acceptable: data needs to be encrypted everywhere, in motion and at rest, with authentication a must. By creating a modern architecture underpinned by a data lakehouse, organizations can optimize their data and democratize the sharing of information, while maintaining the highest levels of data security and governance. For organizations operating in highly regulated industries this is particularly essential, because they must continually demonstrate that their storage, sharing and usage of data is compliant.
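On the “encrypted in motion” point, modern language runtimes make the secure configuration the default rather than an add-on. A minimal sketch using Python’s standard-library `ssl` module to build a client context that refuses unauthenticated or legacy-protocol connections:

```python
import ssl

# create_default_context() enables certificate validation and hostname
# checking out of the box - the "secure by default" posture early big data
# projects lacked.
ctx = ssl.create_default_context()

# Additionally reject legacy protocol versions.
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True: peer certs are required
print(ctx.check_hostname)                    # True: hostnames are verified
```

Encryption at rest and fine-grained authorization sit at the storage and platform layers (disk/object-store encryption, policy engines) rather than in application code, but the principle is the same: security on by default, not bolted on afterwards.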
A data-led future
Research shows that 90% of business leaders feel their organization would experience more growth opportunities if it could manage its data more effectively. At a time when all organizations want to move faster, they need to stay ahead of the data curve. Those capable of harnessing their data in a rapid, cost-effective manner – no matter where it is located – will find they have a significant competitive advantage.