Intelligent manycore processors for data centers and embedded systems

Kalray was founded in 2008 as a spin-off from the CEA (the French Department of Energy lab), one of the largest research labs in Europe, capitalizing on over 60 years of research and development in new processor architectures. The fabless semiconductor company is a pioneer in developing a new generation of extreme-computing, low-power and low-latency microprocessors. To better understand how Kalray’s processors will power data centres, autonomous vehicles, healthcare equipment and robots, we sat down with Randy Skelley, vice president of its data centre unit, to find out more.

GPUs moved away from VLIW because of power and performance. Why are you going against the grain?

VLIW architectures are well recognized for compute-intensive applications such as signal processing, video processing or AI.

VLIW is the only architecture that meets the requirements of Kalray’s target markets (data center and embedded), combining the ease of programming of a general-purpose CPU, the performance of a DSP and time predictability.

These features also motivated Xilinx to use a VLIW architecture to address the same markets with its ACAP platform.

The MPPA architecture offers many advantages over GPUs:

  • A better ratio in terms of performance, price and power consumption
  • The ability to build complex systems based on a single processor whereas GPUs need to be combined with other resources (CPU, FPGA)
  • The ease of development: MPPA can run standard code, OS, tools and libraries

Can you expand on your extreme-computing, low-power and low-latency expectations? More specifically, what TFLOP count at what power on what workload?

Our upcoming Coolidge processor, based on Kalray’s Massively Parallel Processor Array (MPPA®) architecture, is a breakthrough in the industry, with performance that is 3X-8X that of its nearest rivals. Kalray’s next-generation chips will deliver up to 25 TOPS / 6 TFLOPS. This performance scales perfectly with the number of chips used in the application, thanks to the unique MPPA architecture.

The Coolidge processor will have very low power consumption (5W-15W), which allows integration into confined systems as well as massive deployments in datacenters. It will also have low latency, which allows data to be analysed on the fly with a high level of integrated safety. For storage applications, the MPPA processors achieve a latency of 3 µs for processing a 4KB IO, using a low-latency RDMA stack.
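To put the quoted figures side by side, here is a rough back-of-the-envelope sketch. It only rearranges the numbers given above (25 TOPS / 6 TFLOPS peak, 5W-15W, 3 µs per 4KB IO). It assumes, purely for illustration, that peak throughput could be reached anywhere in the power envelope and that IOs are processed one at a time; neither assumption comes from Kalray, and the function names are hypothetical.

```python
# Figures quoted in the interview (peak values, not measurements).
PEAK_TOPS = 25.0           # tera-operations per second
PEAK_TFLOPS = 6.0          # tera floating-point operations per second
POWER_W = (5.0, 15.0)      # quoted power envelope in watts
IO_LATENCY_S = 3e-6        # 3 microseconds per IO
IO_SIZE_BYTES = 4 * 1024   # 4KB IO


def per_watt_range(peak, power_range):
    """Return (worst, best) efficiency if `peak` were reached at either
    end of the power envelope. Illustrative assumption only."""
    low_w, high_w = power_range
    return peak / high_w, peak / low_w


def serial_io_rate(latency_s, io_size_bytes):
    """IOs per second and bytes per second implied by the latency when
    only one IO is in flight at a time (no pipelining assumed)."""
    iops = 1.0 / latency_s
    return iops, iops * io_size_bytes


tops_w = per_watt_range(PEAK_TOPS, POWER_W)
tflops_w = per_watt_range(PEAK_TFLOPS, POWER_W)
iops, bytes_per_s = serial_io_rate(IO_LATENCY_S, IO_SIZE_BYTES)

print(f"TOPS/W:   {tops_w[0]:.2f} to {tops_w[1]:.2f}")
print(f"TFLOPS/W: {tflops_w[0]:.2f} to {tflops_w[1]:.2f}")
print(f"Serial 4KB IOs: {iops:,.0f} IOPS, {bytes_per_s / 1e9:.2f} GB/s")
```

Under these assumptions the quoted figures correspond to roughly 1.7 to 5 TOPS per watt, and to about 333,000 4KB IOs per second for a single serial request stream. Real workloads would fall somewhere else depending on utilization and on how many IOs are in flight.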

Are you going to sell the chips or just the IP à la ARM? Do you have any licensing plans?

Our economic model is based on selling processors first and foremost, but also complete solutions developed around these processors: boards, software and hardware development platforms and tools, especially for artificial intelligence, to help clients develop their own products. Nevertheless, selling technology licenses as part of strategic partnerships could be considered in the future.

You mentioned full programmability when referring to Kalray’s Massively Parallel Processor Array (MPPA) architecture. Is that an extension of the FPGA paradigm or something totally new?

Kalray’s solutions are based on open software development environments. This is a significant asset compared to competitors’ solutions, which use proprietary languages. Proprietary languages raise several issues: the need for programmers with that specific skillset, and the resulting lack of freedom and flexibility. The MPPA technology gives clients full programmability based on the standard C/C++ development environment and standard operating systems, either open source or from Kalray’s third-party partners.

What do you see as the biggest area of growth for Kalray and what are your biggest challenges?

We are focused on two priority markets where the need for real-time performance and programmability can be addressed with one single processor: the data center market, namely intelligent storage; and the market of next-generation vehicles, i.e. intelligent cars. These are high-potential growth markets that have experienced a technological breakthrough and are increasingly being infused with intelligence. They will be worth over €1 billion each in 2021.

Our biggest challenge is to pursue our technological roadmap and commercial deployment all at once, but our IPO has given us the means to do that.

Tell us more about your next generation Coolidge processors. How can you achieve these levels of performance and what prevents the competition from emulating what you're doing?

For the third-generation MPPA processor, the outcome of nearly 10 years of R&D, we have opted for an 80-core basic chip (as opposed to 288 at present). A very interesting aspect of this processor architecture is that several dies can be assembled alongside one another in one package to increase overall performance: Kalray can fit several dies in the same package, providing from 80 to 160 cores per processor, and adapt power consumption to meet any market requirements. This is our main competitive advantage. Whereas our competitors are upgrading and retrofitting their products to address the needs of intelligent systems and artificial intelligence, our architecture was specifically designed for that: to complete up to 80 critical tasks in parallel.

Randy Skelley, VP of the Data Center Unit at Kalray

Randy has over 30 years of sales and executive management experience in the data storage, networking, and security industries. He has held various global and regional OEM, Cloud/xSP, and Channel Sales leadership positions with both large and small US-based companies, including QLogic, Quantum, Chelsio, ATTO, and Clearpath Networks. He is a seasoned sales and management veteran in multiple channels of the data storage, networking, automotive, and telecommunications markets.