Nvidia's next big graphics chips explained

Nvidia Kepler and Maxwell: what you need to know

Kepler and Maxwell explained

Although he told TechRadar that the GPU Technology Conference isn't about "speeds and feeds", Nvidia CEO Jen-Hsun Huang also announced the first-ever roadmap for future Nvidia products.

This isn't a roadmap for graphics cards, but for the GPU architectures that power both gaming graphics and the parallel programming that GTC (Nvidia's GPU Technology Conference) concentrates on.

Coming in the second half of 2011, Kepler is the replacement for the current 40nm Fermi GPU. It will be Nvidia's first 28nm chip, and Huang expects it to deliver "a big step up": three to four times the performance per Watt of Fermi, from the current peak of 1.5 to as much as 6 double-precision gigaflops per Watt.

He's quoting performance per Watt rather than simply performance because, just like CPUs, GPUs have to deal with power issues: "In the future we know there is a power wall, so performance per Watt equals performance.

"With parallel computing, transistors are free but power is not. If we are conscientious about the use of performance per Watt, we will continue to expand performance with the number of transistors we add."


It's not just packing in more transistors by going to a smaller nanometer scale, Huang says: "It comes from innovation as well as architectural efficiency."
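As a rough illustration of why Huang treats performance per Watt as performance, the sketch below holds the power budget fixed and applies the two efficiency figures quoted above. It assumes those figures are double-precision gigaflops per Watt, and the 200 W budget is a hypothetical value chosen only for illustration, not a specification of any Nvidia product.

```python
# Efficiency figures quoted in the article, assumed here to be
# double-precision gigaflops per Watt.
FERMI_GFLOPS_PER_WATT = 1.5
KEPLER_GFLOPS_PER_WATT = 6.0  # the "up to" figure

# Hypothetical power budget, chosen only for illustration.
POWER_BUDGET_WATTS = 200.0


def peak_gflops(gflops_per_watt: float, watts: float) -> float:
    """Peak throughput available at a fixed power budget."""
    return gflops_per_watt * watts


fermi = peak_gflops(FERMI_GFLOPS_PER_WATT, POWER_BUDGET_WATTS)
kepler = peak_gflops(KEPLER_GFLOPS_PER_WATT, POWER_BUDGET_WATTS)

print(f"Efficiency gain: {KEPLER_GFLOPS_PER_WATT / FERMI_GFLOPS_PER_WATT:.1f}x")
print(f"Fermi at {POWER_BUDGET_WATTS:.0f} W: {fermi:.0f} GFLOPS")
print(f"Kepler at {POWER_BUDGET_WATTS:.0f} W: {kepler:.0f} GFLOPS")
```

With the power budget held constant, the throughput ratio is the same as the efficiency ratio, whatever budget is chosen.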

The new GPUs will support traditional CPU features like virtual memory, scheduling and pre-emption to make it easier for programmers to work with both GPU and CPU resources and to make it easier for operating systems to take advantage of them.

"These are vital to the era where you have multiple apps [running on the GPU]," he told us. "In the future you'll be able to mix and match."

ROADMAP: How Kepler and Maxwell compare to previous Nvidia GPU processors for parallel computing

We asked Ian Buck, Nvidia's Software Director of GPU Computing and the man Huang credits as the inventor of CUDA, for an example of how this might work.

"Now there are two types of processor in the system; these are simplifications to make it easier to move code onto the GPU," he explained, "and we hope to do more of that in the future. We have a memory management unit and we can have the GPU read directly out of CPU-side memory. It's not much of a stretch to go the other way or to allow the programmer to load data back and forth. We think that's a natural direction and way to do OS integration."

Mid-life kickers and mobile parallel computing

The next architecture will be based on the 22nm Maxwell GPU, coming in 2013. "The improvement from Tesla [the chip Fermi replaced last year] to Maxwell should be about 40 times," Huang told TechRadar. "Fermi is about four times faster, relative to Tesla, so it's ten to 12 times faster from Fermi to Maxwell."
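Huang's arithmetic can be checked in a few lines. This is only a sketch of the quoted ratios: the variable names are ours, and the second calculation simply works backwards from the upper "12 times" figure on the assumption that Fermi's four-times gain over Tesla stays fixed.

```python
# Speedup ratios quoted by Huang, both measured relative to Tesla.
TESLA_TO_MAXWELL = 40.0
TESLA_TO_FERMI = 4.0

# Speedups multiply along a chain of generations, so dividing the
# end-to-end ratio by the first step leaves the remaining step.
fermi_to_maxwell = TESLA_TO_MAXWELL / TESLA_TO_FERMI

# Working backwards from the upper "12 times" figure gives the
# Tesla-to-Maxwell ratio it would imply at the same Fermi speedup.
implied_upper_total = 12.0 * TESLA_TO_FERMI

print(f"Fermi to Maxwell: {fermi_to_maxwell:.0f}x")
print(f"Tesla to Maxwell implied by 12x: {implied_upper_total:.0f}x")
```

The "about 40 times" figure corresponds to the lower end of the quoted range; the upper end would imply a Tesla-to-Maxwell gain closer to 48 times.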

Between each new GPU and the next will come what Huang calls a "mid-life kicker", an update that improves performance by enhancing the existing microarchitecture. The first will add more cores and faster memory to Fermi next year, and a similar update to Kepler will come in 2012.

The new GPUs will obviously show up first in Nvidia's high-end graphics cards, but "from the moment it goes into the architecture in the high end products, in three months we should have it in the entire family," promises Huang.

Faster rollout

He says that will happen faster than it did for Fermi, which had delays getting into production because the wires in the fabric that interconnects the 256 processors were so closely coupled that they interfered with each other, and the fabric had to be redesigned.

"We found a major breakdown between the models and tools, and reality!" He blames the design flaw on the fact that it was the responsibility of two different engineering groups: "the engineers who understand physics and the engineers who understand architecture sat in two different organisations."

With that management issue fixed, he says that although there will be further problems ("we don't know what we don't know") they'll be spotted sooner. "The design is progressing very rapidly; we have hundreds of engineers working on it and by the time we're done we will have invested a couple of billion dollars."

LOST PERFORMANCE: Issues of heat and power consumption mean that CPUs aren't getting faster the way they used to; we're losing out on the performance benefit we have come to expect and enjoy, says Huang

Kepler and Maxwell won't immediately go into Nvidia's Tegra mobile processors, although the parallel architecture will eventually end up there.

"Tegra 3 is almost done," Huang told us. "Tegra 4 is being built and every year there will be a new Tegra. I am highly enthusiastic about taking parallel computing to mobile devices. We know if one thing would benefit more than any device from the most energy efficient performance, it's the mobile computer."

Why did Nvidia finally start talking about the future? It's not to compete with Intel's or AMD's announcements, he claims, and he didn't pick GTC to announce the roadmap only because it's Nvidia's big public event. It's because the software developers who need to know about future Nvidia chips are the ones using them for parallel computing rather than games, and the performance they can expect dictates the features they can develop.

"They need to know what they should expect in one or two years' time. Should they expect performance to increase by 50% or one times or two times? That's important information when it takes them two or three years to develop software. If I were developing today I would target a GPU that's four times faster than today."
