# I confess, I'm scared of the next generation of supercomputers

Earlier this year, a Japanese supercomputer built on Arm-based Fujitsu A64FX processors snatched the crown of world’s fastest machine, blowing incumbent leader IBM Summit out of the water.

Fugaku, as the machine is known, achieved 415.5 petaFLOPS by the popular High Performance Linpack (HPL) benchmark, which is almost three times the score of the IBM machine (148.5 petaFLOPS).

It also topped the rankings for Graph 500, HPL-AI and HPCH workloads - a feat never before achieved in the world of high performance computing (HPC).

Modern supercomputers are now edging ever-closer to the landmark figure of one exaFLOPS (equal to 1,000 petaFLOPS), commonly described as the exascale barrier. In fact, Fugaku itself can already achieve one exaFLOPS, but only in lower precision modes.

The consensus among the experts we spoke to is that a single machine will breach the exascale barrier within the next 6 - 24 months, unlocking a wealth of possibilities in the fields of medical research, climate forecasting, cybersecurity and more.

But what is an exaFLOPS? And what will it mean to break the exascale milestone, pursued doggedly for more than a decade?

## The exascale barrier

To understand what it means to achieve exascale computing, it’s important to first understand what is meant by FLOPS, which stands for floating point operations per second.

A floating point operation is any mathematical calculation (i.e. addition, subtraction, multiplication or division) that involves a number containing a decimal (e.g. 3.0 - a floating point number), as opposed to a number without a decimal (e.g. 3 - a binary integer). Calculations involving decimals are typically more complex and therefore take longer to solve.

An exascale computer can perform 10^18 (one quintillion/100,000,000,000,000,000) of these mathematical calculations every second.

For context, to equal the number of calculations an exascale computer can process in a single second, an individual would have to perform one sum every second for 31,688,765,000 years.

The PC I’m using right now, meanwhile, is able to reach 147 billion FLOPS (or 0.00000014723 exaFLOPS), outperforming the fastest supercomputer of 1993, the Intel Paragon (143.4 billion FLOPS).

This both underscores how far computing has come in the last three decades and puts into perspective the extreme performance levels attained by the leading supercomputers today.

The key to building a machine capable of reaching one exaFLOPS is optimization at the processing, storage and software layers.

The hardware must be small and powerful enough to pack together and reach the necessary speeds, the storage capacious and fast enough to serve up the data and the software scalable and programmable enough to make full use of the hardware.

For example, there comes a point at which adding more processors to a supercomputer will no longer affect its speed, because the application is not sufficiently optimized. The only way governments and private businesses will realize a full return on HPC hardware investment is through an equivalent investment in software.

Organizations such as the Exascale Computing Project (EPC) the ExCALIBUR programme are interested in solving precisely this problem. Those involved claim a renewed focus on algorithm and application development is required in order to harness the full power and scope of exascale.

Achieving the delicate balance between software and hardware, in an energy efficient manner and avoiding an impractically low mean time between failures (MTBF) score (the time that elapses before a system breaks down under strain) is the challenge facing the HPC industry.

“15 years ago as we started the discussion on exascale, we hypothesized that it would need to be done in 20 mega-watts (MW); later that was changed to 40 MW. With Fugaku, we see that we are about halfway to a 64-bit exaFLOPS at the 40 MW power envelope, which shows that an exaFLOPS is in reach today,” explained Brent Gorda, Senior Director HPC at UK-based chip manufacturer Arm.

“We could hit an exaFLOPS now with sufficient funding to build and run a system. [But] the size of the system is likely to be such that MTBF is measured in single digit number-of-days based on today’s technologies and the number of components necessary to reach these levels of performance.”

## What other factors are at play?

When it comes to building a machine capable of breaching the exascale barrier, there are a number of other factors at play, beyond technological feasibility. An exascale computer can only come into being once an equilibrium has been reached at the intersection of technology, economics and politics.

“One could in theory build an exascale system today by packing in enough CPUs, GPUs and DPUs. But what about economic viability?” said Gilad Shainer of NVIDIA Mellanox, the firm behind the Infiniband technology (the fabric that links the various hardware components) found in seven of the ten fastest supercomputers.

“Improvements in computing technologies, silicon processing, more efficient use of power and so on all help to increase efficiency and make exascale computing an economic objective – as opposed to a sort of sporting achievement.”

According to Paul Calleja, who heads up computing research at the University of Cambridge and is working with Dell on the Open Exascale Lab, Fugaku is an excellent example of what is theoretically possible today, but is also impractical by virtually any other metric.

“If you look back at Japanese supercomputers, historically there’s only ever been one of them made. They have beautifully exquisite architectures, but they’re so stupidly expensive and proprietary that no one else could afford one,” he told TechRadar Pro.

“[Japanese organizations] like these really large technology demonstrators, which are very useful in industry because it shows the direction of travel and pushes advancements, but those kinds of advancements are very expensive and not sustainable, scalable or replicable.”

So, in this sense, there are two separate exascale landmarks; the theoretical barrier, which will likely be met first by a machine of Fugaku’s ilk (a “technological demonstrator”), and the practical barrier, which will see exascale computing deployed en masse.

Geopolitical factors will also play a role in how quickly the exascale barrier is breached. Researchers and engineers might focus exclusively on the technological feat, but the institutions and governments funding HPC research are likely motivated by different considerations.

“Exascale computing is not just about reaching theoretical targets, it is about creating the ability to tackle problems that have been previously intractable,” said Andy Grant, Vice President HPC & Big Data at IT services firm Atos, influential in the fields of HPC and quantum computing.

“Those that are developing exascale technologies are not doing it merely to have the fastest supercomputer in the world, but to maintain international competitiveness, security and defence.”

“In Japan, their new machine is roughly 2.8x more powerful than the now-second place system. In broad terms, that will enable Japanese researchers to address problems that are 2.8x more complex. In the context of international competitiveness, that creates a significant advantage.”

In years gone by, rival nations fought it out in the trenches or competed to see who could place the first human on the moon. But computing may well become the frontier at which the next arms race takes place; supremacy in the field of HPC might prove just as politically important as military strength.

## What can we do with exascale computing?

Once exascale computers become an established resource - available for businesses, scientists and academics to draw upon - a wealth of possibilities will be unlocked across a wide variety of sectors.

HPC could prove revelatory in the fields of clinical medicine and genomics, for example, which require vast amounts of compute power to conduct molecular modelling, simulate interactions between compounds and sequence genomes.

In fact, IBM Summit and a host of other modern supercomputers are being used to identify chemical compounds that could contribute to the fight against coronavirus. The Covid-19 High Performance Computing Consortium assembled 16 supercomputers, accounting for an aggregate of 330 petaFLOPS - but imagine how much more quickly research could be conducted using a fleet of machines capable of reaching 1,000 petaFLOPS on their own.

Artificial intelligence, meanwhile, is another cross-disciplinary domain that will be transformed with the arrival of exascale computing. The ability to analyze ever-larger datasets will improve the ability of AI models to make accurate forecasts (contingent on the quality of data fed into the system) that could be applied to virtually any industry, from cybersecurity to e-commerce, manufacturing, logistics, banking, education and many more.

As explained by Rashid Mansoor, CTO at UK supercomputing startup Hadean, the value of supercomputing lies in the ability to make an accurate projection (of any variety).

“The primary purpose of a supercomputer is to compute some real-world phenomenon to provide a prediction. The prediction could be the way proteins interact, the way a disease spreads through the population, how air moves over an aerofoil or electromagnetic fields interact with a spacecraft during re-entry,” he told TechRadar Pro.

“Raw performance such as the HPL benchmark simply indicates that we can model bigger and more complex systems to a greater degree of accuracy. One thing that the history of computing has shown us is that the demand for computing power is insatiable.”

Other commonly cited areas that will benefit significantly from the arrival of exascale include brain mapping, weather and climate forecasting, product design and astronomy, but it’s also likely that brand new use cases will emerge as well.

“The desired workloads and the technology to perform them form a virtuous circle. The faster and more performant the computers, the more complex problems we can solve and the faster the discovery of new problems,” explained Shainer.

“What we can be sure of is that we will see the continuous needs or ever growing demands for more performance capabilities in order to solve the unsolvable. Once this is solved, we will find the new unsolvable.”

By all accounts, the exascale barrier will likely fall within the next two years, but the HPC industry will then turn its attention to the next objective, because the work is never done.

Some might point to quantum computers, which approach problem solving in an entirely different way to classical machines (exploiting symmetries to speed up processing), allowing for far greater scale. However, there are also problems to which quantum computing cannot be applied.

“Mid-term (10 year) prospects for quantum computing are starting to shape up, as are other technologies. These will be more specialized where a quantum computer will very likely show up as an application accelerator for problems that relate to logistics first. They won’t completely replace the need for current architectures for IT/data processing,” explained Gorda.

As Mansoor puts it, “on certain problems even a small quantum computer can be exponentially faster than all of the classical computing power on earth combined. Yet on other problems, a quantum computer could be slower than a pocket calculator.”

The next logical landmark for traditional computing, then, would be one zettaFLOPS, equal to 1,000 exaFLOPS or 1,000,000 petaFLOPS.

Chinese researchers predicted in 2018 that the first zettascale system will come online in 2035, paving the way for “new computing paradigms”. The paper itself reads like science fiction, at least for the layman:

“To realize these metrics, micro-architectures will evolve to consist of more diverse and heterogeneous components. Many forms of specialized accelerators are likely to co-exist to boost HPC in a joint effort. Enabled by new interconnect materials such as photonic crystal, fully optical interconnecting systems may come into use.”

Assuming one exaFLOPS is reached by 2022, 14 years will have elapsed between the creation of the first petascale and first exascale systems. The first terascale machine, meanwhile, was constructed in 1996, 12 years before the petascale barrier was breached.

If this pattern were to continue, the Chinese researchers’ estimate would look relatively sensible, but there are firm question marks over the validity of zettascale projections.

While experts are confident in their predicted exascale timelines, none would venture a guess at when zettascale might arrive without prefacing their estimate with a long list of caveats.

“Is that an interesting subject? Because to be honest with you, it’s so not obtainable. To imagine how we could go 1000x beyond [one exaFLOPS] is not a conversation anyone could have, unless they’re just making it up,” said Calleja, asked about the concept of zettascale.

Others were more willing to theorize, but equally reticent to guess at a specific timeline. According to Grant, the way zettascale machines process information will be unlike any supercomputer in existence today.

“[Zettascale systems] will be data-centric, meaning components will move to the data rather than the other way around, as data volumes are likely to be so large that moving data will be too expensive. Regardless, predicting what they might look like is all guesswork for now,” he said.

It is also possible that the decentralized model might be the fastest route to achieving zettascale, with millions of less powerful devices working in unison to form a collective supercomputer more powerful than any single machine (as put into practice by the SETI Institute).

As noted by Saurabh Vij, CEO of distributed supercomputing firm Q Blocks, decentralized systems address a number of problems facing the HPC industry today, namely surrounding building and maintenance costs. They are also accessible to a much wider range of users and therefore democratize access to supercomputing resources in a way that is not otherwise possible.

“There are benefits to a centralized architecture, but the cost and maintenance barrier overshadows them. [Centralized systems] also alienate a large base of customer groups that could benefit,” he said.

“We think a better way is to connect distributed nodes together in a reliable and secure manner. It wouldn’t be too aggressive to say that, 5 years from now, your smartphone could be part of a giant distributed supercomputer, making money for you while you sleep by solving computational problems for industry,” he added.

However, incentivizing network nodes to remain active for a long period is challenging and a high rate of turnover can lead to reliability issues. Network latency and capacity problems would also need to be addressed before distributed supercomputing can rise to prominence.

Ultimately, the difficulty in making firm predictions about zettascale lies in the massive chasm that separates present day workloads and HPC architectures from those that might exist in the future. From a contemporary perspective, it’s fruitless to imagine what might be made possible by a computer so powerful.

We might imagine zettascale machines will be used to process workloads similar to those tackled by modern supercomputers, only more quickly. But it’s possible - even likely - the arrival of zettascale computing will open doors that do not and cannot exist today, so extraordinary is the leap.

## Replicating the human brain

In a future in which computers are 2,000+ times as fast as the most powerful machine today, philosophical and ethical debate surrounding the intelligence of man versus machine are bound to be played out in greater detail - and with greater consequence.

It is impossible to directly compare the workings of a human brain with that of a computer - i.e. to assign a FLOPS value to the human mind. However, it is not insensible to ask how many FLOPS must be achieved before a machine reaches a level of performance that might be loosely comparable to the brain.

Back in 2013, scientists used the K supercomputer to conduct a neuronal network simulation using open source simulation software NEST. The team simulated a network made up of 1.73 billion nerve cells connected by 10.4 trillion synapses.

While ginormous, the simulation represented only 1% of the human brain’s neuronal network and took 40 minutes to replicate 1 second’s worth of neuronal network activity.

However, the K computer reached a maximum computational power of only 10 petaFLOPS. A basic extrapolation (ignoring inevitable complexities), then, would suggest Fugaku could simulate circa 40% of the human brain, while a zettascale computer would be capable of performing a full simulation many times over.

Digital neuromorphic hardware (supercomputers created specifically to simulate the human brain) like SpiNNaker 1 and 2 will also continue to develop in the post-exascale future. Instead of sending information from point A to B, these machines will be designed to replicate the parallel communication architecture of the brain, sending information simultaneously to many different locations.

Modern iterations are already used to help neuroscientists better understand the mysteries of the brain and future versions, aided by advances in artificial intelligence, will inevitably be used to construct a faithful and fully-functional replica.

The ethical debates that will arise with the arrival of such a machine - surrounding the perception of consciousness, the definition of thought and what an artificial uber-brain could or should be used for - are manifold and could take generations to unpick.

The inability to foresee what a zettascale computer might be capable of is also an inability to plan for the moral quandaries that might come hand-in-hand.

Whether a future supercomputer might be powerful enough to simulate human-like thought is not in question, but whether researchers should aspire to bringing an artificial brain into existence is a subject worthy of discussion.