The future of storage according to Phison

At TechRadar Pro, we're lucky enough to have access to some of the brightest minds in the world of technology, letting us peer into the future to identify the next big trends.

Storage company Phison Electronics may be a lesser-known manufacturer to some, often operating behind the scenes, but it has become an essential cog in its ecosystem. The company produces controllers that go into millions of solid state drives and devices worldwide and, as such, has a big say in shaping the future of storage.

We spoke to Sebastien Jean, Senior Director Systems Architecture at Phison, on the future of SSDs, storage class memory, MRAM, large capacity solid state drives and more.

How far are we from seeing PLC memory chips rolled out? Are flash controllers ready for penta-level cells? What sort of challenges have you experienced when dealing with PLC?

PLC memory chips are technically feasible today in bench experiments. The challenge in making them commercially viable comes from ensuring their performance characteristics are useful for storage applications. NAND vendors have to balance several inversely related features, including read speed, write speed, program/erase cycling, data retention and manufacturing yield. To see how these play out, we only have to look at QLC.

In theory QLC should provide a 33% cost reduction due to the higher bit density. Unfortunately the cost saving is only around 10% due to the challenges around factory yield. A secondary factor affecting price comes from cycling. Today the most advanced QLC can provide 1200 cycles. The next nearest competitor can only provide 600 cycles. As such the market leader has less pressure to reduce prices. Realistically, we aren’t likely to see PLC NAND for another five years given the struggles we are seeing today in maturing QLC.
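The bit-density argument above can be checked with a little arithmetic. The sketch below is only an idealized model: it assumes the baseline for QLC is TLC (3 bits per cell) and that the cost of a cell does not change when more bits are stored in it, neither of which is stated in the interview. The function names are illustrative. Under those assumptions, 33% more bits per cell works out to roughly 25% lower cost per bit, and the step from QLC to PLC gains less again.

```python
def density_gain(old_bits: int, new_bits: int) -> float:
    """Fractional increase in bits stored per cell."""
    return new_bits / old_bits - 1


def ideal_cost_per_bit_reduction(old_bits: int, new_bits: int) -> float:
    """Fractional drop in cost per bit, assuming cost per cell is unchanged."""
    return 1 - old_bits / new_bits


TLC, QLC, PLC = 3, 4, 5  # bits per cell

for name, old, new in [("TLC -> QLC", TLC, QLC), ("QLC -> PLC", QLC, PLC)]:
    print(
        f"{name}: {density_gain(old, new):.1%} more bits per cell, "
        f"{ideal_cost_per_bit_reduction(old, new):.1%} ideal cost-per-bit reduction"
    )
```

The shrinking return per added bit is one reason yield and cycling losses weigh so heavily on the real-world saving.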

Most controllers have a flexible backend NAND sequencer, so changes in addressing or command format are not likely to cause problems. The areas that tend to get more complicated as NAND density increases are: ECC length, scrambler seeding, the number of active XOR buffers and error recovery. The first three are relatively simple to address, but also tend to increase the ASIC size. The last one can be difficult, but it is managed through firmware, which allows for many iterations after the ASIC has taped out. NAND vendors do a good job of sharing this information in a timely manner to ensure storage ASICs are ready when the NAND is ready. New storage controllers are needed every one or two years due to increasing speed, capacity or NAND requirements. As such the upcoming changes for PLC are not a significant concern. We just have to plan for them.

The biggest challenge for PLC will come when we try to find a role for this NAND. It will likely have characteristics that are more limiting than QLC. The cycling count will be lower, while program and erase time will be longer. Initially QLC was only suitable for archival applications, but has since moved up to cold storage or read intensive hot applications. Thankfully, QLC is improving with every generation. Given that PLC magnifies the complexities of managing QLC, it is likely that PLC will follow the same evolution as QLC, but on a much slower path.

RAM-based storage is an order of magnitude faster than the fastest SSDs, something that will be even more apparent with DDR5. Is this a market Phison is considering exploring?

SSDs will be even faster with PCIe Gen5 and later Gen6. NAND IO speeds are also increasing. Two years ago, each bus could only support 1.2 GB/s. Now they run at 1.6 GB/s and next year they will run at 2.4 GB/s. There is a clear path to 4.8 GB/s in the ONFI specification assuming the ecosystem doesn’t switch from a parallel protocol to a serial protocol. If NAND switches to a PCR protocol, there is no reason why it can’t match DDR or PCIe speeds. SSD performance scales easily with NAND.
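To illustrate how controller design scales with NAND bus speed, the sketch below estimates how many NAND channels are needed before the combined bus speed covers the host link. The per-channel speeds are the ones quoted above. The host link figures are rounded approximations for PCIe Gen4 x4 and Gen5 x4 and are an assumption of this example, as is the function name. The model looks at bus speed only and ignores NAND array read and program times.

```python
# Approximate host link bandwidth in MB/s (assumed round figures, not from the interview).
HOST_LINK_MBPS = {"PCIe Gen4 x4": 8000, "PCIe Gen5 x4": 16000}

# Per-channel NAND bus speeds quoted in the interview, in MB/s.
NAND_BUS_MBPS = [1200, 1600, 2400, 4800]


def channels_to_saturate(host_mbps: int, channel_mbps: int) -> int:
    """Smallest number of NAND channels whose combined bus speed covers the host link."""
    return -(-host_mbps // channel_mbps)  # integer ceiling division


for link, host_mbps in HOST_LINK_MBPS.items():
    for bus_mbps in NAND_BUS_MBPS:
        count = channels_to_saturate(host_mbps, bus_mbps)
        print(f"{link}: {count} channels at {bus_mbps} MB/s per channel")
```

Working in integer MB/s avoids floating-point rounding in the ceiling division.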

DDR based storage has been around for over 20 years, but it has never caught on. It mostly comes down to cost and performance in comparison to the existing alternative solutions. Looking at performance, the absolute fastest interface to data for the CPU is the internal DDR bus. The ATTO solution places storage behind a network interface and a block storage protocol, which greatly increases the individual command latency. Then there is the matter of cost. DDR is currently going for approximately $5/GB whereas NAND costs around $0.15/GB. Today, most applications can meet their storage needs with a pool of SSDs. If a single 8TB SSD can service a command in 1 msec, then 24 SSDs can provide an average completion time of about 41.7 µsec. Comparing the cost of each solution at 192 TB, the SSD based solution is $30K, whereas the DDR based solution exceeds $980K. Looking at the problem from the other angle, if we set the budget at $30K, the SSD based solution provides 192TB of storage, but the DDR based solution only provides 6 TB. Finally there is the fact that DDR is volatile. So the DDR storage solution is either a limited volatile high speed cache that is slower than DRAM or has a backup of some kind, which further increases the solution cost. There is a market for very fast DDR based storage that fits in a small form factor, but it is small. If tomorrow DDR based storage suddenly became dominant, Phison already has the in-house technology to produce this kind of drive.
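The pool arithmetic in this answer can be reproduced in a few lines. The sketch below is a simple model, not a benchmark: it assumes commands spread evenly across all drives, so the average completion interval is the single-drive service time divided by the drive count, and it assumes 1 TB = 1024 GB, which is the convention that reproduces the quoted dollar figures. The function names are illustrative.

```python
GB_PER_TB = 1024  # assumption: binary units, which match the quoted totals

NAND_USD_PER_GB = 0.15  # price quoted in the interview
DDR_USD_PER_GB = 5.00   # price quoted in the interview


def pooled_completion_time_us(single_drive_ms: float, drives: int) -> float:
    """Average time between completions when load is spread evenly over the pool."""
    return single_drive_ms * 1000 / drives


def cost_usd(capacity_tb: float, usd_per_gb: float) -> float:
    """Raw media cost for a given capacity."""
    return capacity_tb * GB_PER_TB * usd_per_gb


def capacity_tb_for_budget(budget_usd: float, usd_per_gb: float) -> float:
    """Capacity that a fixed budget buys."""
    return budget_usd / usd_per_gb / GB_PER_TB


# 24 drives, each servicing a command in 1 ms: 1000 / 24, about 41.7 microseconds
print(f"Pooled completion time: {pooled_completion_time_us(1, 24):.1f} us")
print(f"192 TB on NAND: ${cost_usd(192, NAND_USD_PER_GB):,.0f}")
print(f"192 TB on DDR:  ${cost_usd(192, DDR_USD_PER_GB):,.0f}")
print(f"DDR capacity for $30K: {capacity_tb_for_budget(30_000, DDR_USD_PER_GB):.1f} TB")
```

Note that the pooled figure describes throughput across the pool; each individual command still takes the full single-drive service time.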

Ultra large SSDs (20TB or more) are expected to replace hard disk drives in the data center in the near future, as the pros vastly outweigh the only con (price). As an industry insider, what direction of travel are you seeing from your partners and what are your thoughts?

Phison is currently developing solutions that exceed 64TB. We see a divergence in the roles played by SSDs and HDDs. As SSD cost goes down, flash based storage will take on more and more of the Hot storage tier. HDDs will progressively move to the Nearline and Cold tiers. The increasing speed of PCIe Gen4, Gen5 and Gen6 allows the SSD to take on a new role that the HDD cannot touch. The SSD now augments the LLC (L4) cache found in CPUs today. We’re already seeing that evolution in modern gaming, where the SSD is being actively used as a just-in-time texture cache. In that role, the SSD is augmenting DRAM at a lower cost per GB. Textures are fetched and dropped by the higher level tiers as needed. This allows for a vast improvement in graphics without increasing the system DDR cost.

Enterprise and HPC applications are currently asking for 16-32 TB, with a forecasted need for 256 TB SSDs in the next 2-3 years. Though the next generation of HDDs is expected to reach 80-100 TB, the SSD can already beat that density using 1 Tb TLC. NAND density is projected to go up to 2 Tb per die in the next few years and there is a clear path to 4 Tb. Magnetic media is not likely to be able to keep up, given how long the current set of density improving technologies has taken to develop.
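To put the die densities above in perspective, the sketch below counts how many NAND dies a drive of a given capacity would need. It is a rough estimate only: it treats 1 TB as 8 Tb and ignores over-provisioning, spare blocks and parity, so real drives would need more. The function name is illustrative.

```python
import math

BITS_PER_BYTE = 8


def dies_needed(capacity_tb: int, die_tbit: int) -> int:
    """Minimum number of dies of `die_tbit` terabits to hold `capacity_tb` terabytes."""
    return math.ceil(capacity_tb * BITS_PER_BYTE / die_tbit)


for die_tbit in (1, 2, 4):
    print(f"256 TB with {die_tbit} Tb dies: {dies_needed(256, die_tbit)} dies")
```

Each doubling of die density halves the die count, which is what makes the higher capacity points practical to package.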

When do you think NVMe will supersede SATA when it comes to SSD storage and when, roughly, do you estimate PCIe x4 will become the dominant controller technology?

In many respects, both of those changes have already come about on the client side. The SATA interface persisted because value configurations focused on HDDs, and HDDs were only available with a SATA interface, though there is no technical reason preventing a 600 MB/s drive from adopting a Gen3 x2 PCIe interface. The price of SSDs has come down to the point where over 75% of laptops were equipped with an SSD in 2019/2020. The advantages in weight, battery life and reduced mechanical warranty issues outweigh any saving based on the interface price difference. The value tier will likely stay on PCIe Gen3x4 for a few more years, but the mainstream and premium tiers are broadly adopting Gen4x4.

The enterprise space now has more PCIe SSD sales than SATA, but at this point SATA is likely to stay around for another 4-8 years. The enterprise has a typical 4 year refresh cycle and there is already a very large install base for SATA. Organizations that needed the faster speed have already switched to Gen3 NVMe. Over time SATA and SCSI based equipment will become less common. The enterprise Gen3 install base is expected to start a large migration to Gen4 this year, but the migration will be gradual. We expect another 4 years of solid sales of Gen3 in this space. That is why we refreshed our popular E12 controller with the new FX Controller, which has the highest IOPS/Watt on the market.

Where does Phison sit when it comes to storage class memory (given it is well documented that PCIe SSD with your controller is actually faster than someone else's SCM product)?

SSDs were easy to integrate into PCs and data center storage because they are 100% compatible with existing infrastructure. This applies to the server chassis, PC cases, laptops, BIOS, OS and applications. Initial deployments could not take full advantage of the SSD characteristics, but users did see an immediate benefit when switching over due to lower power, faster sequential speed and higher robustness.

On the other hand, SCM is typically implemented on the DDR bus as an NVDIMM. Existing applications can’t take advantage of the non-volatile aspect without significant changes, because they are designed to treat DDR as volatile. In fact SCM tends to be a little slower than DDR. This knocks SCM off the easy adoption path. Placing the SCM behind an NVMe interface addresses that problem, but current SSDs can already saturate the PCIe bus. The only benefit to using SCM as storage is that it has a lower individual command latency. It turns out that there are very few applications that need this particular capability. As such you end up with an SSD that is significantly more expensive and provides no real benefit for most applications. We do believe SCM has a place in the SSD, but it is not as primary storage.

A lot of the up-and-coming challengers use Phison controllers (e.g. Sabrent, GigaSSD etc). How do you help them to differentiate their products?

Phison acts as an on-demand engineering service for our partners. Each company has a different idea of what aspects to prioritize on their SSD. We configure our product to align with their requirements. Some customers focus on price, others want low power and others still go after the upper end of performance. This is a win-win, because Phison can focus on engineering, while our customers can focus on selling the drives. This division of labor lowers Phison’s business risk by spreading development cost across many sales organizations. Our partners lower their overall risk by only paying for the engineering service they are using, without the ongoing operational expense of maintaining large engineering teams. If they offer a product that does not sell, they can quickly adapt by ordering a different configuration.

For those who want the biggest SSD ever, how difficult would it be to have an eight-SSD enclosure that fits in a 3.5-inch bay?

Physically installing the SSD into an enclosure is quite easy. A 3.5” bay could be designed as a mini storage chassis that can hold 8-16 M.2 SSD, though power and cooling requirements would be challenging. Most of the existing solutions top out at 4 slots. The current M.2 max density is 8TB, so this 4 slot solution can reach 32 TB of raw storage, though it would be safer to implement a RAID 5 which reduces the addressable storage to 24 TB. The only real complication with an 8x M.2 bay is finding one.
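The capacity figures above follow from the standard RAID 5 rule that one drive's worth of space is given over to parity. The sketch below applies that rule to the 4-slot case and to a hypothetical 8-slot enclosure; the function name is illustrative and the 8-slot configuration is an extrapolation, not a product described in the interview.

```python
def raid5_usable_tb(drives: int, drive_tb: float) -> float:
    """Usable capacity of a RAID 5 set: one drive's worth of space holds parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drives - 1) * drive_tb


DRIVE_TB = 8  # current M.2 maximum quoted in the interview

for slots in (4, 8):
    raw = slots * DRIVE_TB
    usable = raid5_usable_tb(slots, DRIVE_TB)
    print(f"{slots} x {DRIVE_TB} TB: {raw} TB raw, {usable} TB usable in RAID 5")
```

The parity overhead shrinks as a fraction of raw capacity when more drives are added, although a wider set also exposes more drives to failure during a rebuild.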

What are Phison's plans when it comes to Computational Storage? Is it in your pipeline to offer on-the-fly encryption/compression/dedup?

Phison already offers on-the-fly encryption on our Opal and FIPS 140-2 SSD products. This works because it is a capability that can operate on data that is already going to the SSD. Compression is easy to accommodate on the SSD and aligns with the streaming model concept, but it provides limited benefit given that most bulk data (photos, video or music) is already fully compressed. There are large data sets that can benefit from compression, but the use-case is relatively uncommon, so it tends to be delegated to dedicated server appliances.

The case for dedupe breaks the streaming model for several reasons:

1) It requires a huge amount of memory to track the hashes for each sector.

2) SSDs are already fully tasked in datacenter environments, so any work spent searching is taken away from host IO.
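The memory point in the first item can be sized with a quick estimate. The sketch below assumes one hash per 4 KiB sector and a 32-byte digest (the size of a SHA-256 hash); both are assumptions chosen for illustration, not details given in the interview, and the function name is hypothetical. Real dedupe indexes add further overhead for lookup structures.

```python
SECTOR_BYTES = 4096  # assumed dedupe granularity
HASH_BYTES = 32      # assumed digest size (e.g. SHA-256)


def dedupe_index_bytes(capacity_bytes: int) -> int:
    """Bytes needed to hold one hash for every sector on the drive."""
    sectors = capacity_bytes // SECTOR_BYTES
    return sectors * HASH_BYTES


capacity = 8 * 10**12  # an 8 TB drive
index = dedupe_index_bytes(capacity)
print(f"Sectors to track: {capacity // SECTOR_BYTES:,}")
print(f"Hash index size:  {index / 10**9:.1f} GB")
```

Under these assumptions the hash index alone is many times larger than the DRAM a typical SSD carries for its own mapping tables.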

The only real benefit in having the SSD perform the search is a slight reduction in PCIe bus transfer time and a reduced load on the host CPU. Conversely, the SSD has to go up in cost due to higher computational requirements and additional DRAM. Its active power also necessarily has to go up. The dedupe problem is better solved using spare system resources, particularly overnight when people are sleeping, instead of adding 10-20% to the cost of the SSD.

One type of computational hybrid device exists today and is very successful: the Smart NIC. It combines a high speed NIC (typ. 10 GB/s) with a powerful CPU or FPGA. Though this combination works for the NIC, it does not work as well for storage. The reason is fairly straightforward. The Smart part of the NIC is processing data that is already passing through the NIC to the host. The Smart NIC works well when it can process data as it streams through or when the Smart NIC is capable of servicing a request by directly accessing resources within the chassis.

The typical value proposition for Computational Storage is presented as follows: the SSD is closer to the data, it frees up bus bandwidth and it offloads the host CPU. At face value Computational Storage appears to be an easy sell, but it hasn’t turned out that way.

First, the SSD today is already using 100% of its resources and power budget to service its primary function. In many cases, high density enterprise SSDs have to limit performance to avoid exceeding their power or cooling budget. Second, SSDs typically use small CPU cores that are nowhere near what the host CPU or a GPU can do. Third, this experiment was already tried before Computational Storage was a buzzword. One company attempted to combine a GPU and SSD, but the solution ended up degrading both technologies. To meet the GPU requirements, the SSD had to run very fast, which added significant heat load to the GPU. The GPU is much hotter than an SSD and created substantial retention stress on the NAND. Lastly, an SSD is a consumable item that has a finite write bandwidth, whereas a GPU can run indefinitely until it becomes obsolete.

Taking a different approach, we could add a more powerful CPU directly on the SSD. Then we run into the RAM problem. Today most enterprise SSDs maintain a 1000:1 NAND to DDR ratio. The SSD only needs to pull a few bytes for every 4K LBA translation, so the DDR bandwidth is relatively low. This means SSDs can use slower grade DRAM, which lowers the entire module cost. Adding a larger guest CPU to the SSD along with more DDR for applications decreases the power available for the SSD’s primary role of providing IO to the main host. It also increases the SSD cost, but does not provide a proportional gain in compute power.
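The roughly 1000:1 ratio is what a flat logical-to-physical mapping table produces. The sketch below assumes one 4-byte table entry per 4 KiB logical block, a common textbook layout that is used here purely as an illustration; the interview says only "a few bytes", and the function names are hypothetical.

```python
LBA_BYTES = 4096  # 4K mapping granularity mentioned in the interview
ENTRY_BYTES = 4   # assumed size of one mapping-table entry


def nand_to_dram_ratio(lba_bytes: int = LBA_BYTES, entry_bytes: int = ENTRY_BYTES) -> float:
    """How many bytes of NAND each byte of mapping DRAM covers."""
    return lba_bytes / entry_bytes


def mapping_table_bytes(nand_bytes: int) -> int:
    """DRAM needed to map every logical block on the drive."""
    return nand_bytes // LBA_BYTES * ENTRY_BYTES


print(f"NAND to DRAM ratio: {nand_to_dram_ratio():.0f}:1")
for capacity_tb in (16, 32):
    table = mapping_table_bytes(capacity_tb * 10**12)
    print(f"{capacity_tb} TB drive: {table / 10**9:.1f} GB of mapping DRAM")
```

Because each host command touches only one small entry, the table needs capacity far more than it needs bandwidth, which is consistent with the use of slower grade DRAM.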

Then there is the problem of how storage is deployed today. Data is usually aggregated into multi-unit RAID sets, so no single SSD will ever see the full data set. We could change the way storage is used, ensuring each SSD always sees complete data elements, and use full replication to ensure redundancy. This is not likely to take hold because this model does a poor job of sharing storage bandwidth if one SSD contains more of the data that is currently needed. RAID stripes address this problem by staggering the accesses so that each subsequent client starts shortly after the current client. We could extend the model where each SSD has a full copy of a data set by implementing replication across multiple units, but then we have to add a lookup and load share mechanism. Duplication also has a much higher storage footprint than simple RAID5 or RAID6. Simply put, the way we use storage today is cost effective, easy to deploy and works well for most scenarios. Completely changing the storage infrastructure for what amounts to adding a few server CPUs is hard to justify.

Despite the downsides of general purpose Computational Storage, there are specific cases where it does make sense. It occurs when the storage use-case mirrors the winning case for the Smart NIC. That is to say that the SSD only has to process the data once as it moves through the device. We can associate encryption and compression with computational storage, but that’s a stretch. It is more accurate to define these two use-cases as in-line or streaming data processing using a very simple algorithm.

Working with one of our customers, Phison has found a Computational Storage application that is well suited to the SSD. It does not require a large amount of memory or CPU power and does not interfere with the primary purpose of the SSD, which is storage IO. We are developing a security product that uses machine learning to look for signs the data is being attacked. It can identify ransomware and other unauthorized activities with no measurable impact on the SSD performance.

You were the first enterprise SSD manufacturer to announce development of a controller with MRAM interface. What's the update on that revolutionary technology?

Enterprise ASICs have a longer development cycle than client ASICs. Phison’s Next Generation High-end Enterprise controller is now in the engineering sample stage and we expect product rollout in 2H 2021. Once the mainstream solution is in mass production, we will start enabling MRAM. We expect to announce the MRAM based solution in Q2 or Q3 2022.

Désiré has been musing and writing about technology during a career spanning four decades. He dabbled in website builders and web hosting when DHTML and frames were in vogue and started narrating about the impact of technology on society just before the start of the Y2K hysteria at the turn of the last millennium.