The GeForce GTX 980 uses the same GPU architecture that made the inaugural Maxwell cards such a success at the low end, and utilises all that efficient silicon engineering to jam a vast number of CUDA cores into this relatively small GPU die.
The big change from Kepler to the latest Maxwell architecture is in the way the cores are distributed across the GPU and how they're accessed. The streaming multiprocessor (SM) underpinned the old Fermi GPU tech and evolved through the Kepler architecture and its SMX update.
The SMX units of the Kepler generation have subsequently been replaced by SMM units for Maxwell, allowing for greater parallelism by apportioning more control logic inside each of the new streaming multiprocessors.
In Kepler's SMX design, each unit held a total of 192 CUDA cores looked after by control logic shared across the lot. The new SMMs, though, divide themselves up into four sections of 32 CUDA cores, with discrete instruction buffers, warp schedulers and dispatch units for each block.
That separation within the SMM allows for greater efficiency and boosts overall processing speed. The SMMs are also a bit smaller, which allows Nvidia to squeeze more units into smaller GPUs. The GM204 GPU of the GeForce GTX 980 is 398mm^2, whereas the GK110 of the GTX 780 Ti is a much chunkier 561mm^2, and yet Nvidia has squeezed sixteen SMM units into the GTX 980 while it could only manage fifteen SMX units in the GTX 780 Ti.
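To put a rough number on that density, here's a quick back-of-the-envelope sketch using only the die sizes and unit counts quoted above. It divides the whole die by the number of SM units, which is a crude assumption on our part: the die also holds memory controllers, ROPs and cache, so this isn't the true size of an individual SMM or SMX. The function name is ours, purely for illustration.

```python
# Die areas and SM unit counts as quoted in the text above.
GPUS = {
    "GM204 (GTX 980)": {"die_mm2": 398, "sm_units": 16},
    "GK110 (GTX 780 Ti)": {"die_mm2": 561, "sm_units": 15},
}


def die_area_per_sm(die_mm2, sm_units):
    """Whole-die area divided by SM count.

    A crude density figure only: the die also contains memory
    controllers, ROPs and cache, not just SM units.
    """
    return die_mm2 / sm_units


for name, spec in GPUS.items():
    area = die_area_per_sm(spec["die_mm2"], spec["sm_units"])
    print(f"{name}: {area:.1f} mm^2 of die per SM unit")
```

By this rough measure the GM204 needs about two-thirds of the die area per SM unit that the GK110 does, though each SMM also carries fewer CUDA cores than an SMX.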
Each of the new Maxwell streaming multiprocessors has fewer CUDA cores - 128 in the SMM vs. 192 in the older SMX units - but because of the more efficient layout Nvidia estimates it can get around 35% extra performance from each of the CUDA cores in its new GPU architecture.
So, with those sixteen SMM units, the new GeForce GTX 980 comes with a healthy 2,048 CUDA cores, and because of the improved efficiency of those cores they can compete in performance terms with the full 2,880 cores of the Kepler-generation GK110 GPU in the GTX 780 Ti.
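The arithmetic behind those core counts is simple enough to sketch out. The snippet below uses only the figures quoted above, plus Nvidia's 35% per-core estimate, to work out a "Kepler-equivalent" core count for the GTX 980. That last figure is our own illustrative construct, not an Nvidia metric, and it ignores clock speeds, memory and everything else that affects real-world performance.

```python
CORES_PER_SMM = 4 * 32         # four 32-core partitions per Maxwell SMM
CORES_PER_SMX = 192            # Kepler SMX
MAXWELL_PER_CORE_GAIN = 0.35   # Nvidia's estimate, as quoted above


def total_cores(sm_units, cores_per_sm):
    """Total CUDA cores on a GPU with the given SM layout."""
    return sm_units * cores_per_sm


gtx_980_cores = total_cores(16, CORES_PER_SMM)
gtx_780_ti_cores = total_cores(15, CORES_PER_SMX)

# Hypothetical "Kepler-equivalent" figure: scale the Maxwell core count
# by Nvidia's claimed per-core gain. Clock speeds are ignored here.
kepler_equivalent = gtx_980_cores * (1 + MAXWELL_PER_CORE_GAIN)

print(f"GTX 980:    {gtx_980_cores} CUDA cores")
print(f"GTX 780 Ti: {gtx_780_ti_cores} CUDA cores")
print(f"GTX 980 in Kepler-equivalent cores: {kepler_equivalent:.0f}")
```

On those numbers alone the GTX 980 lands within a few percent of the GTX 780 Ti's core count, which squares with the claim that the two can compete.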
But it's not just a story of CUDA cores. The new Maxwell GM204 GPU also comes with a ROP (Raster Operation Unit) count of sixty-four - an extra sixteen over the very top of the Kepler architecture. That also helps to make up for Maxwell's slightly slower memory interface.
The overall memory architecture has changed and you'll notice that, while the GTX 980 might have an extra 1GB in the frame buffer compared with the GTX 780 Ti, it is running on a narrower 256-bit memory bus. Normally that would have us rather concerned about the high-resolution performance of the new card, especially given that Nvidia is talking up the 4K power of its new GPU at every opportunity.
But Nvidia has done some clever things with the setup of Maxwell's memory.
For a start, the GM204 has 2MB of L2 cache inside it, a boost of 512KB over the 1.5MB of the full-fat GK110 GPU. That gives the GPU a bit more memory performance on the chip itself before spitting data out into the frame buffer.
Nvidia has also done some interesting things with memory compression algorithms - along a similar line to what AMD has done with its latest R9 285 graphics card. It has used a selection of lossless compression techniques to offset the need for more expensive, higher-bandwidth memory controllers.
The result of its compression algorithms is that the Maxwell GPU can reduce the number of bytes that need to be fetched from memory per frame. Nvidia's calculations have it at 25% fewer bytes needed per frame compared with Kepler. In terms of bandwidth, then, Nvidia claims its memory is effectively running at a speedy 9.3Gbps as opposed to the stated 7Gbps of the GDDR5 memory it's using.
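That 9.3Gbps figure follows directly from the 25% saving: if every frame needs only three quarters of the bytes, the same memory behaves as though it were running a third faster. Here's a minimal sketch of the sums, using only the numbers quoted above. The function names are ours, and treating the saving as a flat 25% across all workloads is a simplifying assumption.

```python
def raw_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width times per-pin rate, over 8 bits."""
    return bus_width_bits * data_rate_gbps / 8


def effective_data_rate_gbps(data_rate_gbps, bytes_saved_fraction):
    """Data rate uncompressed memory would need to move the same frames."""
    return data_rate_gbps / (1 - bytes_saved_fraction)


BUS_WIDTH_BITS = 256   # GTX 980 memory bus
DATA_RATE_GBPS = 7     # stated memory speed
BYTES_SAVED = 0.25     # Nvidia's claimed saving over Kepler

raw = raw_bandwidth_gbs(BUS_WIDTH_BITS, DATA_RATE_GBPS)
effective_rate = effective_data_rate_gbps(DATA_RATE_GBPS, BYTES_SAVED)
effective = raw_bandwidth_gbs(BUS_WIDTH_BITS, effective_rate)

print(f"Raw bandwidth:       {raw:.0f} GB/s")
print(f"Effective data rate: {effective_rate:.1f} Gbps")
print(f"Effective bandwidth: {effective:.1f} GB/s")
```

The effective data rate comes out at roughly 9.3Gbps, matching Nvidia's claim.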
Apples vs. Apples and Oranges
There's something else worth mentioning when we're comparing this new Maxwell implementation with the outgoing Kepler chips. If we're being entirely fair to the new GTX 980 we really ought not to be comparing it to the GTX 780 Ti. The GM204 sits in the same position in the Maxwell GPU generation as the GK104 GPU did in Kepler.
The GK110 GPU, of the GTX Titans and the GTX 780 and GTX 780 Ti cards, was a full-fat, professional-class GPU, used primarily in the Quadro workstation cards which came after the first flush of the Kepler architecture.
And you can bet there'll be a GM210 GPU waiting in the wings when Nvidia wants another performance boost. In those terms, the generational jump from GK104 to GM204 is massive.
With just 1,536 CUDA cores, the top spin of that GPU - lately used in the GTX 770 - makes for a graphics card which is a lot slower than either the GTX 980 or the lower-caste GTX 970. And that's despite the GTX 980 and GTX 770 both running at the same temperature level and maximum platform power draw.
But we're going to keep referencing the GTX 780 Ti in comparison with the new Nvidia GeForce GTX 980 because they both represent the top consumer SKU of their respective graphics card generations.
Actually, we're mostly going to be referencing those two cards for the simple fact that, despite being essentially a lower class of GPU silicon, the GTX 980 can outperform the GTX 780 Ti across our benchmarking suite.