"Push them to the limit": MIT researchers almost double SSD performance 'for free' but only for data centers

A data center with racks of servers and lots of lights glowing
(Image credit: Getty Images)

  • Sandook software coordinates many SSDs to avoid slowdowns from garbage collection
  • Two-tier control system reroutes workloads across pooled drives in real time
  • Performance gains approach theoretical limits but depend on large clustered storage environments

Researchers at MIT and Tufts University have built a storage management system called Sandook that pushes pooled SSDs closer to their theoretical limits. The project targets a long-standing issue inside large storage clusters, where identical drives rarely perform in identical ways.

Solid-state drives slow down for a number of reasons, including internal garbage collection cycles and the slower nature of write operations compared with reads. Those slowdowns can ripple across workloads when multiple applications share the same storage pool.

Rather than leaving each SSD to handle performance issues on its own, the system splits control duties across two coordinated layers that manage activity across the full drive pool.


Unleashing the potential of data center SSDs

As Blocks & Files reports, a central controller collects performance telemetry from each SSD and revises scheduling decisions about five times per second.

Local agents inside storage servers pass along performance signals and congestion warnings as workloads change.

When a drive begins housekeeping duties such as garbage collection, the system lowers its priority and transfers traffic to healthier drives in the pool. That rerouting happens without requiring changes to applications accessing the storage.
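In broad strokes, the rerouting described above can be sketched as a scheduler that deprioritizes any drive busy with garbage collection and steers new requests toward the least-loaded healthy drive. The names and structure below are purely illustrative, not Sandook's actual implementation:

```python
# Hypothetical sketch: route I/O away from drives doing garbage collection.
# Drive, PoolScheduler, and the congestion signal are invented for illustration.
from dataclasses import dataclass

@dataclass
class Drive:
    name: str
    in_gc: bool = False     # currently running garbage collection?
    queue_depth: int = 0    # outstanding requests (simple congestion signal)

class PoolScheduler:
    def __init__(self, drives):
        self.drives = drives

    def pick_drive(self):
        # Prefer drives not in GC; among those, pick the least loaded.
        # If every drive happens to be collecting, fall back to the full pool.
        healthy = [d for d in self.drives if not d.in_gc] or self.drives
        return min(healthy, key=lambda d: d.queue_depth)

    def submit(self, request_id):
        target = self.pick_drive()
        target.queue_depth += 1
        return target.name

pool = PoolScheduler([Drive("ssd0"), Drive("ssd1", in_gc=True), Drive("ssd2")])
print(pool.submit("req-1"))  # lands on a drive that is not busy with GC
```

Because the decision sits in a layer above the drives, applications see only the pool, which matches the article's point that no application changes are needed.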

The method builds on techniques already used in enterprise storage, including block replication for reads and log-structured writes that can land on any available device.
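Those two building blocks can be illustrated with a toy store (all names here are made up): reads can be served from any replica of a block, and log-structured writes simply append to whichever devices currently have the most room, so placement can sidestep a busy drive.

```python
# Toy illustration of replicated reads plus log-structured writes.
# ReplicatedStore is an invented example, not an enterprise storage API.
import random

class ReplicatedStore:
    def __init__(self, devices, replicas=2):
        self.devices = {d: [] for d in devices}  # device -> append-only log
        self.locations = {}                      # block id -> devices holding it
        self.replicas = replicas

    def write(self, block_id, data):
        # Log-structured write: append copies to the least-full devices,
        # so data can land on any available drive in the pool.
        targets = sorted(self.devices, key=lambda d: len(self.devices[d]))
        targets = targets[:self.replicas]
        for d in targets:
            self.devices[d].append((block_id, data))
        self.locations[block_id] = targets

    def read(self, block_id):
        # Replicated read: any copy will do, so a slow drive can be skipped.
        device = random.choice(self.locations[block_id])
        for bid, data in self.devices[device]:
            if bid == block_id:
                return data

store = ReplicatedStore(["ssd0", "ssd1", "ssd2"])
store.write("blk-7", b"hello")
print(store.read("blk-7"))
```

Here the replica choice is random; a coordinator like the one the article describes would instead pick the copy on the drive reporting the best health.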

Trials included database processing, neural network training, large-scale image compression, and latency-critical storage services. Across those tests, the system reportedly delivered between 30 and 82 percent higher raw input/output throughput than earlier approaches that targeted single bottlenecks.

Across pooled workloads, application performance gains ranged from 12 to 94 percent, with latency reductions reaching up to 88 percent. In some cases, storage throughput reached roughly 1.7x previous levels.

The gains come entirely from software, which means commodity off-the-shelf SSDs remain unchanged. The CPU and memory overhead of monitoring dozens of drives per server was described as minimal.

The research paper, titled “Unleashing The Potential of Datacenter SSDs by Taming Performance Variability,” is freely available online.

Despite the headline numbers, this isn’t something most consumers could run at home. The design depends on large groups of SSDs working together, along with Linux-based infrastructure and enterprise networking setups common in data centers.

That pooling effect is where most of the performance improvement comes from. Without spare drives to shift workloads onto, a single-drive system would see little benefit.

Blocks & Files notes the work will be discussed at the USENIX NSDI 2026 event in May, where the researchers plan to show how coordinated scheduling helps solve unpredictable SSD behavior across large clusters.



Wayne Williams
Editor

Wayne Williams is a freelancer writing news for TechRadar Pro. He has been writing about computers, technology, and the web for 30 years. In that time he wrote for most of the UK’s PC magazines, and launched, edited and published a number of them too.
