In 2012 a Harvard Business Review article famously proclaimed, “Data Scientist: The Sexiest Job of the 21st Century”. Nine years later, Data Scientists are still in high demand and top the lists of “best jobs” consistently each year.
However, it is not just a job. Over the past several years, efforts have been underway to elevate the Data Scientist role to a profession. What is the differentiator? A profession is a disciplined group of individuals who adhere to ethical standards, have specialized knowledge and skills, and apply them to solve problems using data.
Maureen Norton is Global Data Scientist Profession Leader at IBM.
But how much data actually exists? It’s a question that we sometimes see raised in the media, as different organizations using different methodologies try to estimate how much data humanity has now produced. PwC, for example, has suggested that the ‘accumulated digital universe of data’ now amounts to 44ZB, and has grown ten times larger since 2013; while these estimates vary widely, they are always astronomical, and always growing rapidly.
It’s hard to conjure an intuitive way of conceptualizing what a number like that actually means. 44 zettabytes is 44 sextillion bytes. Or 44 billion trillion bytes. Or about 4 times as many bytes as there are grains of sand. Or about 44 times as many bytes as there are stars. Or about 4,400 times as many bytes as there are insects.
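For readers who want to check the arithmetic, here is a minimal sketch in Python. It assumes the decimal (SI) definition of a zettabyte, 10^21 bytes, and the short-scale meaning of "sextillion". The comparison counts are not independent estimates: they are simply back-calculated from the ratios quoted above, and the names `zettabytes_to_bytes`, `ratios` and `implied_counts` are illustrative only.

```python
# Assumption: decimal (SI) prefix, so 1 zettabyte = 10**21 bytes.
BYTES_PER_ZETTABYTE = 10**21


def zettabytes_to_bytes(zb: int) -> int:
    """Convert a whole number of zettabytes to bytes."""
    return zb * BYTES_PER_ZETTABYTE


total_bytes = zettabytes_to_bytes(44)

# "44 sextillion" and "44 billion trillion" describe the same number
# on the short scale: 44 x 10**9 x 10**12 = 44 x 10**21.
assert total_bytes == 44 * 10**9 * 10**12

# Ratios quoted in the text (bytes per grain of sand, star, insect).
ratios = {"grains of sand": 4, "stars": 44, "insects": 4_400}

# Counts implied by those ratios; derived from the text, not measured.
implied_counts = {name: total_bytes // ratio for name, ratio in ratios.items()}

for name, count in implied_counts.items():
    print(f"{name}: roughly {count:.1e}")
```

The point of the exercise is less the exact figures than the orders of magnitude: each comparison still leaves the byte count ahead by a wide margin.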
Understanding the data challenge
Perhaps more importantly, 44ZB is one estimate of the outer limit of the scale we face when we choose to use data to solve challenges. Of course, no real-world project will come even close to this magnitude in the data it uses, but it’s useful to recognize that ‘big data’ initiatives are always highly selective about what data they actually use.
That accumulated universe of digital data includes information on almost anything we can imagine: it is personal communication between friends and family, the output of major science experiments like the Large Hadron Collider, decades of intricate financial transaction records, our entertainment, our shopping, and our work.
Pulling all of the relevant data, and only the relevant data, from that vast universe is the difference between success and failure. Indeed, Gartner has predicted that, through 2022, just ‘20% of analytic insights will deliver business outcomes’ as organizations struggle to apply the right processes to the right information. Connecting data sources together and making them available for analysis is something that software engineers and other professionals in the sector are well equipped to do; to know what data is needed and what to do with it, however, you need a more specialized skillset.
A new science
“Data Scientist” is still a relatively new job title. While the term has become widely understood and used since it was coined a little over a decade ago, data science is still very much an emerging discipline undergoing rapid growth and change. Although Glassdoor named Data Scientist the second-best job in America this year – and it has been in the top five since 2016 – the range of skills and purposes the role can involve is vast and still evolving.
Given the sheer variety of roles that a data scientist can play, it would be understandable if some start to see the term as simply a buzzword. Treating data science as a serious and distinct specialism, however, is about more than just maximizing the chances of a good return on investment from big data projects – though that’s certainly a key motivator. As more of our infrastructure and institutions become data-led, we will need more professionals who know how to ensure that data is being used ethically, manage the full lifecycle of data for positive long-term impacts, and provide the leadership that teams and organizations need to interact with data effectively.
As with any highly specialized role, it can be difficult for non-experts to accurately assess how qualified and experienced Data Scientists are. One solution to this challenge is to standardize and certify the skillset this career demands. The Open Group, for example, has created Open CDS, a certification developed in collaboration with IBM and other industry partners, to establish an independent, experience-based benchmark for data science roles. Most importantly, these standards are vendor-neutral and methodology-based.
A big, bright data future
Building a stronger shared understanding of what a Data Scientist is and does is not, of course, the same as standardizing what they achieve for businesses. In a world of unimaginably vast data resources, those outcomes will always be diverse, and we will continue to find surprising new applications for these skills and tools.
It is, however, an important step towards solving some of the biggest challenges that the technology sector as a whole currently faces. Digital transformation and the application of artificial intelligence are now, in many regards, well-advanced areas – but as any CTO will tell you, operationalizing these initiatives is still far from easy. Many transformation projects fail, AI strategies are still often experimental, and businesses are still learning how best to work with the new world of data.
The role of Data Scientist is, in some ways, at an odd moment in its history: on the one hand, it is now an indispensable element of how business operates, and on the other we are still only scratching the surface of everything it can achieve. That makes this an important moment to carefully foster the next generation of talent in this area, creating a community of dedicated professionals who are ready for the task ahead.