Engineers at Meta, Facebook’s parent company, have revealed how they have been able to free up memory using a software solution called Transparent Memory Offloading (TMO).
It is now part of the Linux kernel and, in a nutshell, automatically offloads data to other storage tiers (e.g. Samsung’s CXL memory expander) that are less costly and more power efficient than DRAM.
The savings are significant: TMO has been running on millions of Facebook servers for more than a year, saving almost a third of the memory on each server in the best cases. While savings of that order would be modest across dozens or even hundreds of servers, at Facebook’s immense scale they add up quickly.
Analysis: Facebook's gargantuan appetite for RAM
The world’s largest social network has nearly three billion monthly active users and millions of servers spread across 21 locations worldwide. Assuming Facebook has at least two million servers (its blog quoted “millions of servers” as early as July 2018, and the real number is likely to be far higher) and that each carries 128GB of RAM on average, that amounts to 256 million GB (or 256PB) of RAM which, at an average cost of $4 per GB (DDR4 ECC RAM), is about $1 billion worth of memory.
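The back-of-the-envelope estimate above can be reproduced with a short Python sketch. The function name and its parameters are illustrative only; the inputs (two million servers, 128GB per server, $4 per GB) are the article's assumptions, not figures published by Meta.

```python
def fleet_memory_estimate(servers, gb_per_server, usd_per_gb):
    """Return (total GB, total PB, total cost in USD) for a server fleet.

    All inputs are assumptions taken from the estimate in the text.
    """
    total_gb = servers * gb_per_server
    total_pb = total_gb / 1_000_000  # decimal units: 1 PB = 1,000,000 GB
    cost_usd = total_gb * usd_per_gb
    return total_gb, total_pb, cost_usd


total_gb, total_pb, cost_usd = fleet_memory_estimate(
    servers=2_000_000,   # assumed lower bound on fleet size
    gb_per_server=128,   # assumed average RAM per server
    usd_per_gb=4,        # assumed DDR4 ECC price per GB
)
print(f"{total_gb:,} GB = {total_pb:,.0f} PB, about ${cost_usd / 1e9:.2f} billion")
# prints: 256,000,000 GB = 256 PB, about $1.02 billion
```

Changing any one input scales the result linearly, so a fleet twice as large under the same assumptions would carry roughly $2 billion worth of memory.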
Numbers presented by the team that worked on TMO showed that the cost of memory accounts for a third of Meta’s server bill of materials, with compressed RAM and SSD accounting for less than 11%. More worryingly, the cost burden of RAM (as a percentage of total infrastructure cost) has more than doubled since Facebook launched its first generation of servers (it's currently on the fourth).
Adopting TMO does come with some drawbacks, most notably a degradation in performance. But the gains in power and memory savings far outweigh the disadvantages, and future iterations combined with hardware improvements (e.g. faster SSDs or CXL devices) will offer further mitigation.