How to harness the power of the GPU

GPU compute moves from the periphery to become a key player in meeting performance demands

Graphics processing units (GPUs) have traditionally been used to draw pixels on screens. But recently, the GPU has been found to be ideal for compute-intensive tasks such as CAD, CAM, 3D modelling, image manipulation, matrix manipulation, Fourier transforms and Monte Carlo simulations.

GPU accelerators are an easy way to harness the considerable floating point performance present in modern GPUs. These accelerators connect to the system via the PCI-Express bus, allowing users to run multiple accelerators within a single system.
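As a rough illustration of running several accelerators in one box, the C sketch below uses the standard OpenCL host API to enumerate the GPUs visible to the first platform (a minimal sketch; the single-platform assumption and the 16-device cap are illustrative choices, not from the article):

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platform;
        cl_uint num_gpus = 0;

        /* Take the first OpenCL platform and count its GPUs --
           on a multi-accelerator system this reports each card. */
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &num_gpus);
        printf("GPUs available: %u\n", num_gpus);

        cl_device_id devices[16];   /* illustrative upper bound */
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU,
                       num_gpus < 16 ? num_gpus : 16, devices, NULL);
        for (cl_uint i = 0; i < num_gpus && i < 16; ++i) {
            char name[256];
            clGetDeviceInfo(devices[i], CL_DEVICE_NAME,
                            sizeof(name), name, NULL);
            printf("  device %u: %s\n", i, name);
        }
        return 0;
    }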

GPU accelerators have become very popular with the high-performance computing (HPC) community, providing the majority of floating point performance in many clusters. And this trend is likely to grow.

Accelerators will continue to play an important role in providing the optimal levels of floating point performance within a given footprint. However, in the future a significant percentage of the compute power once found in discrete accelerators will end up on the same piece of silicon as the CPU, making the GPU an integral part of the total compute power available to the user.

Accelerated Processing Units

Accelerated Processing Units (APUs) combine the CPU with a programmable GPU that offers significant floating point performance to boost the speed of compute-intensive workloads. With the GPU on the same piece of silicon as the CPU, some of the costs associated with PCI-Express based accelerators, such as bandwidth and power utilisation, are mitigated.

The APU gives rise to the notion of heterogeneous compute, where the GPU and CPU work in harmony. The APU is the physical manifestation of heterogeneous compute; however, combining the CPU and GPU silicon into one package is just the first stage of a true heterogeneous processor.

There needs to be an architectural framework, spanning both software and hardware, that allows an APU to be defined as a single entity rather than a separate CPU and GPU. This is where Heterogeneous System Architecture (HSA) and the HSA Foundation come into play.

Heterogeneous System Architecture (HSA)

The HSA Foundation was founded to deliver new, improved user experiences through advances in computing architecture that bring improvements across four key vectors: power efficiency, performance, programmability and broad portability across computing devices.

HSA, with widespread industry support from companies such as AMD, ARM, Imagination Technologies, Qualcomm, Texas Instruments and many others, not only elevates the GPU to a first-class citizen in the world of general-purpose compute, but also brings a number of benefits for developers who want to harness its power.

For example, in the past the GPU needed the CPU to feed it jobs, adding an extra step to the process. Programmers had to explicitly dictate which parts of the code would be offloaded from the CPU to the GPU, which meant they needed to know the underlying processors and which workloads were best suited to each. This takes time and advanced knowledge of complex processor architectures.
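What that explicit offload looks like in practice: the sketch below uses the OpenCL 1.x host API, where the programmer has already decided this work belongs on the GPU, stages the data across the PCI-Express bus, launches the kernel and copies the results back (the "scale" kernel name, the element count n and the earlier creation of context, queue and program are assumed for brevity):

    cl_int err;
    cl_mem buf = clCreateBuffer(context, CL_MEM_READ_WRITE,
                                n * sizeof(float), NULL, &err);

    /* 1. Copy the input across the PCI-Express bus to device memory. */
    clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, n * sizeof(float),
                         host_data, 0, NULL, NULL);

    /* 2. Launch the kernel the programmer chose to offload to the GPU. */
    cl_kernel k = clCreateKernel(program, "scale", &err);
    clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
    size_t global_size = n;
    clEnqueueNDRangeKernel(queue, k, 1, NULL, &global_size, NULL,
                           0, NULL, NULL);

    /* 3. Copy the results back to host memory. */
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, n * sizeof(float),
                        host_data, 0, NULL, NULL);

Every one of those steps is driven by the CPU, and choosing what goes into the kernel is entirely the programmer's call.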

To alleviate this burden on developers, HSA introduced "heterogeneous queuing". This technology means the GPU can initiate its own work rather than relying on the CPU to dispatch every job.
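HSA exposes heterogeneous queuing through its runtime, but OpenCL 2.0's device-side enqueue gives a comparable flavour of it in code. In the sketch below (an illustrative kernel, not from the article; it assumes the host created a default on-device queue), the GPU launches its own follow-up kernel without a round trip through the CPU:

    /* OpenCL C 2.0: a kernel enqueues the next stage itself. */
    kernel void stage_one(global float *data, int n)
    {
        data[get_global_id(0)] += 1.0f;

        /* One work-item queues the second stage directly from the GPU,
           to run once this kernel has finished. */
        if (get_global_id(0) == 0) {
            enqueue_kernel(get_default_queue(),
                           CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                           ndrange_1D(n),
                           ^{ data[get_global_id(0)] *= 2.0f; });
        }
    }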

Such technology is the first step toward a truly integrated APU, in which workloads will automatically execute on the processor that provides the best performance and energy efficiency characteristics.

Making life easier for devs

GPU accelerators provide another example. In the past, these have been limited by the amount of on-board memory the chip can access, forcing developers to design and implement complex scheduling algorithms in order to keep the GPUs fed with data.
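In practice that meant manual staging loops along these lines (a simplified C sketch; CHUNK, dev_buf and the blocking transfers are illustrative, and real implementations overlap copies with compute using events and double buffering):

    /* Stream a dataset larger than device memory through a
       fixed-size device buffer, one chunk at a time. */
    for (size_t off = 0; off < total; off += CHUNK) {
        size_t count = (total - off < CHUNK) ? (total - off) : CHUNK;

        clEnqueueWriteBuffer(queue, dev_buf, CL_TRUE, 0,
                             count * sizeof(float), host_data + off,
                             0, NULL, NULL);
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &count, NULL,
                               0, NULL, NULL);
        clEnqueueReadBuffer(queue, dev_buf, CL_TRUE, 0,
                            count * sizeof(float), host_data + off,
                            0, NULL, NULL);
    }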

With HSA-supporting APUs such as AMD's "Kaveri", the GPU has direct access to system memory. That gives it easy access to gigabytes of memory and removes the need for developers to create complex memory-queuing algorithms.
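OpenCL 2.0's shared virtual memory shows the difference on such hardware (a minimal sketch assuming a device with fine-grained SVM support, which Kaveri-class APUs provide; the kernel and surrounding names are illustrative):

    /* Host and GPU share one allocation: no staging, no copies. */
    float *shared = (float *)clSVMAlloc(context,
                                        CL_MEM_READ_WRITE |
                                        CL_MEM_SVM_FINE_GRAIN_BUFFER,
                                        n * sizeof(float), 0);

    for (size_t i = 0; i < n; ++i)   /* host writes the data in place */
        shared[i] = (float)i;

    clSetKernelArgSVMPointer(kernel, 0, shared);
    size_t global_size = n;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL,
                           0, NULL, NULL);
    clFinish(queue);

    printf("%f\n", shared[0]);       /* host reads the result directly */
    clSVMFree(context, shared);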

Silicon-level technologies will be a significant driving force in moving the GPU from the sidelines to an integral part of total compute performance, but they won't be the only force. Helping developers harness the power of the GPU through software and tools is just as critical.

Dedicated GPU programming languages such as OpenCL give developers the most direct route to the GPU's compute power. However, many developers lack the resources to learn a new language and port existing code to OpenCL.
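For a sense of what the language looks like, here is a minimal OpenCL C kernel (the classic vector-add example, not taken from the article): each work-item computes one output element, and the host decides how many work-items to launch.

    /* OpenCL C: one work-item per output element. */
    kernel void vec_add(global const float *a,
                        global const float *b,
                        global float *c)
    {
        size_t i = get_global_id(0);
        c[i] = a[i] + b[i];
    }

The kernel itself is simple; the porting cost lies in restructuring existing code around it and in the host-side setup shown earlier.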

This is why considerable effort is being put into bringing GPU and HSA support to existing high-level languages. One such effort is Project Sumatra, which has been working with the OpenJDK project to simplify access to the GPU for compute, with the aim of letting constructs such as Java 8's parallel streams execute on the GPU without GPU-specific code.

Heterogeneous compute architectures such as HSA will give developers easier access to the world of GPU compute, and their experiences will in turn drive further adoption of the technology.

By combining silicon, architectural frameworks and software, the days when the CPU or the GPU is singled out as a distinct compute device are nearing an end. Heterogeneous computation will blur the distinction between the two, giving developers easy access to supercomputer levels of compute power to enable the next generation of user experiences.

  • Lawrence Latif is technical communications manager at AMD. He has extensive experience of enterprise IT, networking, system administration, software infrastructure and data analytics.