Why trace rays in cases where rasterisation is simply better and faster? In short, use RT for the features that it does best.

New Nvidia GPUs?

TechRadar: With that in mind, is Nvidia doing any specific work to optimise future architectures for ray-tracing? Do you think chips optimised for "hybrid" rendering would look substantially different?

David Kirk: As I said, GPUs can do this now. It is certainly possible that we could provide special hardware that would make RT better or faster, but I think that today's hardware is pretty good.

The combination of current APIs and CUDA allows developers to write any program they want, anyway. Some programs are faster and more efficient than others, though, and I expect we will work to optimise the hardware to run these better. RT is certainly one such program, but there are many others.

I think that chips optimised for hybrid rendering will look substantially the same as GPUs do now. They would have hardware for accelerating special features in the APIs, such as texture, rasterisation, and programmable shaders, and they would have a general purpose interface for running parallel C programs, like CUDA. We'll continue to expand CUDA to make it better for a larger class of programming problems, but I don't see any need for substantial changes yet.

TechRadar: Regarding CUDA, we discussed the possibility that it might be adopted by other vendors of graphics hardware, and you suggested that Nvidia positively welcomes this. What's in it for Nvidia to have CUDA supported on competing hardware? How would this actually work: would licenses need to be acquired or paid for?

David Kirk: I don't have any comment about licensing - interested parties should enquire! I'm simply saying that in much the same way as C can be compiled for many architectures, whether x86 or PowerPC, CUDA is just parallel C and can be compiled for other parallel or serial architectures.

Broader adoption has the advantage that CUDA code can run in more places, so the investment of writing your application in CUDA becomes more valuable if it runs on other architectures. Write once, run anywhere, perhaps many times.

CUDA already runs on multi-core CPUs in our emulation mode, for debugging, and this could become a higher-performance solution. Not nearly as high-performance as a GPU that is optimised for CUDA (of course!), but faster than most parallel CPU code runs today.