As the volume of data produced by internet activity, digital devices and IoT sensors continues to expand at an aggressive rate, businesses are running out of time to solve a critical problem: where to put it all.
According to a recent IDC report, the quantity of data created within the next five years will be more than double the amount generated since digital storage came into use.
Although less than 2% of the 64.2 ZB (64.2 billion TB) created last year was stored for the long term (the rest was either overwritten or temporarily cached), global data storage needs are still outpacing the expansion of total capacity.
While hard disk drives (HDDs) and solid state drives (SSDs) do an excellent job of holding and supplying the quantities of data our everyday devices need to function, neither is well suited to storing information en masse and for long durations.
When it comes to archival storage, Linear Tape-Open (LTO) magnetic tape rules the roost, with the lowest cost per capacity of any technology. The current generation of tape, LTO-8, has a native capacity of 12TB and can be purchased for as little as $75 (or $6.25/TB).
However, while cost-effective, tape has its weaknesses too: data can only be accessed serially, making it hard to locate specific files, and companies also need to migrate to fresh tape on a semi-regular basis to avoid data loss.
To try and solve the looming data crisis, researchers are hunting for new ultra-dense and ultra-durable storage technologies. A few different candidates have emerged, but one concept looks particularly promising: deoxyribonucleic acid, better known as DNA.
What is DNA storage and how does it work?
DNA, the foundational material of living organisms, comprises four molecular building blocks: adenine (A), guanine (G), cytosine (C) and thymine (T). These compounds connect in pairs (A-T & G-C) to form the rungs of the famous double helix ladder.
This structure can be utilized as an extremely dense and durable form of data storage, by converting binary 1s and 0s into the four-lettered genetic alphabet. A single gram of DNA has been found to be capable of storing 215 PB (220,000 TB) of data.
“DNA data storage is the process of encoding and decoding binary data onto and from synthesized strands of DNA,” explained a spokesperson for the DNA Data Storage Alliance (DDSA), founded last year by Microsoft, Western Digital, Twist Bioscience and Illumina.
“To store data in DNA, the original digital data is encoded, then written (synthesized using chemical/biological processes) and stored. When the stored data is needed again, the DNA molecules are sequenced to reveal each individual A, C, G or T in order and remapped from DNA bases back to 1s and 0s.”
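The encode and decode steps the spokesperson describes can be sketched in a few lines of Python. This is a minimal illustration only: the two-bits-per-base mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions made for the example, not a scheme attributed to the DDSA. Practical encodings are likely to add constraints and redundancy that are left out here.

```python
# Hypothetical mapping chosen for illustration: two bits per DNA base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits at a time."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Map each base back to its two bits and regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Each byte becomes four bases under this mapping, so the two-character message above is written as an eight-base strand and read back unchanged.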
DNA outperforms current archival storage technologies in almost every category. A recent paper estimates that 9TB of encoded DNA can be squeezed into just 1mm^3 of space, meaning the volume of a single LTO cassette would hold 2 million TB of data, roughly 167,000 times the capacity of an LTO-8 tape.
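The arithmetic behind that comparison can be reproduced with a short Python sketch. The density and LTO-8 capacity are the figures quoted above; the cartridge dimensions (102 x 105.4 x 21.5 mm) are an assumption added for this example and do not come from the paper, so the result should be read as a rough check rather than an exact derivation.

```python
# Assumed nominal LTO cartridge dimensions in millimetres (not from the article).
CARTRIDGE_MM = (102.0, 105.4, 21.5)
DNA_TB_PER_MM3 = 9    # density estimate quoted in the article
LTO8_NATIVE_TB = 12   # LTO-8 native capacity quoted in the article


def cartridge_volume_mm3(dims=CARTRIDGE_MM) -> float:
    """Volume of a box-shaped cartridge in cubic millimetres."""
    length, width, height = dims
    return length * width * height


def dna_capacity_tb(volume_mm3: float) -> float:
    """Capacity in TB if the whole volume were filled with encoded DNA."""
    return volume_mm3 * DNA_TB_PER_MM3


volume = cartridge_volume_mm3()
capacity = dna_capacity_tb(volume)
ratio = capacity / LTO8_NATIVE_TB
print(f"Cartridge volume: {volume:,.0f} mm^3")
print(f"DNA capacity in that volume: {capacity / 1e6:.2f} million TB")
print(f"Multiple of one LTO-8 tape: {ratio:,.0f}x")
```

With these assumed dimensions the capacity comes out slightly above 2 million TB, in line with the rounded figures quoted above.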
In a real-world scenario, DNA could be used to store the whole of YouTube (which is thought to host roughly 400,000 TB of new video each year) in a small refrigerator, as opposed to acres and acres of data centers.
Unlike magnetic tape, which needs to be replaced every decade or two depending on usage, DNA can last for thousands of years in the right conditions. This means the total cost of ownership (TCO) has the potential to be extremely low.
DNA is also biodegradable and easily replicable, and consumes little power beyond the energy required to maintain the necessary climate, making it extremely environmentally friendly.
However, there are still numerous reasons DNA is yet to make tape storage obsolete. The technology is still in its infancy, with kinks to be ironed out at almost every stage of the process, from encoding to synthesis to sequencing.
According to Turguy Goker, Director of Advance Development, LTO at storage company Quantum, it is too early to “place any bets on this horse yet”.
“DNA storage is swimming in some choppy waters right now and it will take a few years before it can navigate safely towards commercial shores,” he explained.
Dense and durable, but slow and expensive
As promising as the early signs may be, there are still a number of hurdles to vault before DNA can begin to put a dent in the world’s storage capacity problem. The main issues concern cost and speed.
To prevent degradation, DNA requires a very specific climate, which can be both difficult and costly to maintain. Specifically, DNA either needs to be kept at exceedingly low temperatures or exposed to carefully controlled airflow.
Using current techniques, the process of writing data to DNA is also extremely time-consuming when compared with incumbent technologies. Until this can be improved, DNA storage will remain unusable at scale.
“DNA writing is a chemical process and is inherently much, much slower than the digital electronics we’re currently accustomed to using,” explained Goker. “Without overcoming this barrier, writing to DNA-based storage is analogous to emptying a swimming pool using a drinking straw.”
Reading data stored in DNA poses challenges too, with a high likelihood that errors are introduced during the sequencing process. For this reason, the DDSA expects the earliest adopters of the technology to use it for write once, read never (WORN) or write once, read seldom if ever (WORSE) use cases (e.g. storing certain data types to fulfil regulatory requirements).
Aside from the technological issues, the lack of common standards needs to be addressed, to ensure DNA storage technologies will be interoperable both with one another and with legacy technologies.
However, with DNA storage attracting both attention and investment from governments, storage incumbents and tech giants alike, work is underway to find solutions to these problems.
For example, the US Office of the Director of National Intelligence launched the Molecular Information Storage (MIST) program last year, with the stated goal of developing DNA technologies capable of writing 1TB and reading 10TB within 24 hours, at a cost of less than $1,000.
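To put the MIST targets in more familiar terms, the sketch below converts them into sustained transfer rates. It assumes decimal terabytes (10^12 bytes) and a perfectly steady rate over the full 24 hours; both are simplifications made for this example rather than details taken from the program itself.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12  # decimal terabyte (assumption)


def sustained_mb_per_s(terabytes: float, seconds: int = SECONDS_PER_DAY) -> float:
    """Average rate in MB/s needed to move `terabytes` within `seconds`."""
    return terabytes * BYTES_PER_TB / seconds / 10**6


write_rate = sustained_mb_per_s(1)   # MIST write goal: 1TB per 24 hours
read_rate = sustained_mb_per_s(10)   # MIST read goal: 10TB per 24 hours
print(f"Write target: {write_rate:.1f} MB/s")
print(f"Read target: {read_rate:.1f} MB/s")
```

On those assumptions, the write goal works out to roughly 11.6 MB/s and the read goal to roughly 115.7 MB/s.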
Separately, Twist Bioscience has developed a method of increasing DNA synthesis yield by a factor of 1,000 by using a silicon platform that miniaturizes the chemistry required.
According to the DDSA, concerns about data accuracy will be allayed by scripts capable of correcting sequencing issues, and the organization also believes there remains time to establish specifications that will prevent fragmentation across the industry.
“Unlike synthesis for healthcare, which must be perfect, DNA storage can tolerate errors due to the correction algorithms typically used in storage today. DNA storage pioneers are already working on encoding and error correction algorithm improvements that will mitigate this risk and recover the data accurately,” a spokesperson explained.
“As the methods and tools for commercially viable DNA data storage become better understood and more widely available, the Alliance will consider the creation of specific specifications and standards (e.g. encoding, physical interfaces, retention, file systems) to promote the emergence of interoperable DNA data storage-based solutions that complement existing storage hierarchies.”
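The error tolerance described above can be illustrated with one of the simplest possible correction strategies: sequencing the same strand several times and taking a majority vote at each position. This is a hedged sketch only. The `consensus` function is hypothetical, it handles substitution errors in equal-length reads and nothing else, and it is not presented as the algorithm any DDSA member actually uses.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority vote per position across equal-length reads of one strand."""
    return "".join(
        Counter(column).most_common(1)[0][0]  # most frequent base in this column
        for column in zip(*reads)
    )


# Three reads of the same strand; the second contains one substitution error.
reads = ["CAGACGGC", "CAGTCGGC", "CAGACGGC"]
print(consensus(reads))  # CAGACGGC
```

Because two of the three reads agree at every position, the single misread base is outvoted and the original strand is recovered.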
Is this the end for tape?
Although the arrival of DNA storage will pose questions about the enduring usefulness of magnetic tape, there are those who believe the writing is not on the wall just yet.
For example, asked whether it felt DNA would put its tape storage products under threat, IBM gestured towards improvements in the density of tape, which is also tried and true in a commercial context.
“As data volumes continue to soar around the globe, tape technology remains the solution of choice for enterprise data retention, protection and resiliency for on-premises and hybrid cloud environments,” said Andy Walls, CTO and Chief Architect at IBM’s flash storage division.
“It's also the most environmentally friendly storage tech available, consuming zero power and lasting for decades. And because we continue to advance the density of the tape, today a single cartridge from IBM (that's smaller than a VHS cassette) can hold an incredible 60TB of compressed data. These are some of the qualities that make tape the go-to solution for the biggest hyperscalers who rely on it for inexpensive, reliable archival storage.”
At the end of last year, IBM also announced it had broken the world record for areal density on a prototype tape made of strontium ferrite (SrFe), developed by Fujifilm. The pair achieved a record 317 GB/in^2, which translates to 580 TB per cartridge, showing that tape has a way to go before reaching its maximum density.
Although the attributes of DNA storage are most comparable to tape, Quantum believes DNA is more likely to slot into existing setups than replace the incumbent technology entirely.
“Tape shows no signs of disappearing any time soon, especially for on-premise long-term archival purposes,” Goker told us. “It’s the most economical form of storage per megabyte, it can store large amounts of data per cartridge, and it requires very low running costs. It is also one of the safest storage mediums out there as data is stored offline and can also serve a role as an active archive, a key and important function for hyperscalers.”
“Instead of looking at both storage options as competing, we should look at their complementary nature when working in tandem. DNA will complement tape in the future by coexisting as a tiered system within hyperscale data centers. DNA is unlikely to replace magnetic tape for the next several years, but will occupy a tier below it, for write once read rarely use cases. A perfect mix for big data archival scenarios.”
However, while tape is unlikely to be usurped in the short term, lodged as it is at the heart of enterprise storage systems, there is little sense that the decades-old technology will be able to withstand the tsunami of data on the horizon, irrespective of R&D.
Although tape capacity has tended to almost double with every LTO generation, far outstripping SSD and HDD capacity growth, even this exponential rate of expansion cannot outpace the volume of data being produced.
The next frontier for data storage
If analysts are to be believed, the data storage crisis will come to a head within the next half decade. If storage technologies do not catch up in time, the consequences could be manifold.
For example, the inability to store a sufficient amount of data will mean businesses are less well equipped to recover from disruption, whether triggered by cyberattack or changing socioeconomic conditions. The full value of analytics will remain untapped (and unknown), because companies will have to work with incomplete datasets.
From a consumer perspective, it’s possible that social media platforms, email companies and others could begin to delete older data and posts, to make room for the ever-flowing river of fresh content. Google, for example, recently announced it will start to delete data attached to its Gmail, Drive and Photos services from accounts inactive for two years or more.
However, with its unique set of properties and characteristics, DNA is perhaps the most likely savior.
According to Luis Ceze, an expert in DNA storage at the University of Washington, it will take between eight and ten years for DNA to be adopted in large-scale commercial contexts. Other specialists we consulted concurred with this assessment.
However, Ceze also told us that research trends are “favorable” and that “boutique markets for smaller data needs are already viable today”. There is hope, then, that the race against the clock can still be won and data calamity averted.