Nvidia offer a glimpse into the future with a multi-chip GPU sporting 32,768 CUDA cores

Nvidia Multi-chip Module Diagram

Nvidia researchers see multi-chip GPU designs as the future of high-performance graphics cards. Could we be looking at a new breed of multi-GPU GeForce cards?

Nvidia have released a joint research paper, with Arizona State University, the University of Texas, and the Barcelona Supercomputing Center, looking into how they’d stuff multiple discrete GPU modules into the same chip package.

With the transistor count on modern-day GPUs climbing constantly, Nvidia reckon the performance of traditional graphics processors will plateau if manufacturers continue to use the current single-chip designs. It’s getting more complex and more expensive to fit a greater number of transistors onto a single die, leaving Nvidia with the dilemma of how to keep increasing graphics performance in line with demand.

Nvidia Tesla Volta V100 GPU

Stuffing more transistors into ever smaller spaces has increased the complexity of the chips and also reduced the die yield through manufacturing faults, meaning GPU production is an increasingly costly process. The cost of researching and manufacturing the lithography shrinks needed to bring the transistor scale down is continually growing too. So what’s a massive GPU manufacturer to do?
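
To see why bigger dies hurt yield, here is a minimal sketch using the textbook Poisson yield model, in which the chance of a die having no defects falls off exponentially with its area. The defect density and die areas below are hypothetical values picked for illustration; none of them come from Nvidia or the research paper.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-defects_per_cm2 * area_cm2)


# Hypothetical inputs, chosen only to illustrate the trend.
DEFECTS_PER_CM2 = 0.1        # assumed defect density
MONOLITHIC_AREA_MM2 = 800.0  # one assumed large single die
MODULE_AREA_MM2 = 200.0      # one of four assumed smaller modules

monolithic_yield = poisson_yield(MONOLITHIC_AREA_MM2, DEFECTS_PER_CM2)
module_yield = poisson_yield(MODULE_AREA_MM2, DEFECTS_PER_CM2)

# Silicon area that has to be manufactured, on average, per working GPU.
silicon_per_good_monolithic = MONOLITHIC_AREA_MM2 / monolithic_yield
silicon_per_good_mcm = 4 * MODULE_AREA_MM2 / module_yield

print(f"Monolithic die yield: {monolithic_yield:.1%}")
print(f"Small module yield:   {module_yield:.1%}")
print(f"Silicon per good monolithic GPU: {silicon_per_good_monolithic:.0f} mm^2")
print(f"Silicon per good four-module GPU: {silicon_per_good_mcm:.0f} mm^2")
```

The sketch ignores packaging and interconnect costs, so it only shows the direction of the effect: small dies waste less silicon per working part than one huge die.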

Nvidia’s joint research project hypothesizes that designing a multi-chip module (MCM) might be the way forward. Such MCMs would work by connecting multiple smaller GPU modules (GPMs) together, using advanced input/output tech so that the modules can communicate effectively with each other. The GPMs themselves would be less complex and therefore easier to produce, and having several of them working together would result in speedier graphics cards.

In Nvidia’s GPU simulator, the team constructed two virtual GPUs with 256 streaming multiprocessors (SMs) in each: one based on the current monolithic design and a second based on the MCM design. The multi-chip GPU (which is theoretically possible) performed within 10% of the speed of the massive 256 SM monolithic GPU (which definitely isn’t possible). They claim to have based the simulated GPUs on the current Pascal architecture which, at the 128 CUDA cores per SM found in Nvidia’s consumer Pascal chips, would mean the virtual designs contained 32,768 CUDA cores. Yum.

Nvidia Multi-chip Module Performance Simulation

The simulation shows that by using a high-speed interconnect they can maintain performance very close to what you would get with a massive single-chip setup. The joint project also simulated how the MCM-GPU would fare against a similarly specced SLI array and found the new design would be over 25% quicker.

It’s worth mentioning that we’re talking about huge hypothetical numbers here – to put them in perspective, Nvidia’s current poster child, the GTX 1080 Ti, is ‘only’ rocking 28 SMs. A single-chip design sporting the 256 SMs quoted above is actually such an unwieldy spec that Nvidia have described it as unbuildable.
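
As a quick sanity check on those numbers, the sketch below works out the CUDA core counts from the SM counts. It assumes 128 CUDA cores per SM, which is the figure for Nvidia's consumer Pascal chips; the function and variable names are made up for this example.

```python
# Assumption: 128 CUDA cores per SM, as on consumer Pascal GPUs.
CUDA_CORES_PER_SM = 128


def cuda_cores(sm_count: int, cores_per_sm: int = CUDA_CORES_PER_SM) -> int:
    """Total CUDA cores for a GPU with the given number of SMs."""
    return sm_count * cores_per_sm


simulated_cores = cuda_cores(256)   # the simulated 256 SM designs
gtx_1080_ti_cores = cuda_cores(28)  # the GTX 1080 Ti's 28 SMs
sm_ratio = 256 / 28                 # how much bigger the simulated GPU is

print(f"Simulated GPU: {simulated_cores:,} CUDA cores")
print(f"GTX 1080 Ti:   {gtx_1080_ti_cores:,} CUDA cores")
print(f"Simulated GPU has about {sm_ratio:.1f}x the SMs")
```

In other words, the simulated designs are roughly nine times the size of Nvidia's current top GeForce card.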

The biggest GPU currently available is Nvidia’s own Tesla V100 – a card built on the Nvidia Volta architecture that uses a minuscule 12nm FinFET production process. Nvidia CEO, Jen-Hsun Huang, claims it was created “at the limits of photolithography” and that you “can’t make a chip any bigger than this because the transistors would fall on the ground.”

Intel Heterogeneous CPU design

Meanwhile, Intel and AMD have also been looking into multi-chip tech. Back in March, Intel announced plans to build CPUs out of multiple chips from different generations stuck together, using a technology they call embedded multi-die interconnect bridge (EMIB).

AMD’s much more sexily named Infinity Fabric is the interconnect that allows the different quad-core Zen modules in their octa-core Ryzen processors to communicate at high speed with minimal latency. But they’ve also suggested that using the speedy Infinity Fabric would allow them to plumb multiple AMD graphics chips together in one package.

“Infinity Fabric allows us to join different engines together on a die much easier than before,” AMD’s Raja Koduri explains. “As well it enables some really low latency and high-bandwidth interconnects. This is important to tie together our different IPs (and partner IPs) together efficiently and quickly. It forms the basis of all of our future ASIC designs. We haven’t mentioned any multi GPU designs on a single ASIC like Epyc, but the capability is possible with Infinity Fabric.”

Implementing this multi-chip module setup in an effective way won’t exactly be an easy feat but, so long as they can maintain effective scaling with successive GPUs, it will be worth it. Historically, multi-GPU SLI and CrossFire setups have squeezed less and less extra performance out of each added GPU. For example, if you had a quad-GPU setup, you’d barely be getting any extra grunt from the final processor.
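
One way to picture those diminishing returns is Amdahl's law, which caps the speed-up from extra processors by the share of the work that cannot be split between them. This is an illustrative stand-in rather than a model of how SLI or CrossFire actually behave, and the 90% parallel fraction below is a hypothetical figure, not a measurement.

```python
def amdahl_speedup(gpu_count: int, parallel_fraction: float) -> float:
    """Amdahl's law: overall speed-up from spreading the parallel part over N GPUs."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / gpu_count)


# Hypothetical: assume 90% of each frame's work divides cleanly across GPUs.
PARALLEL_FRACTION = 0.9

speedups = [amdahl_speedup(n, PARALLEL_FRACTION) for n in range(1, 5)]

# Extra speed-up contributed by the second, third and fourth GPU in turn.
marginal_gains = [later - earlier for earlier, later in zip(speedups, speedups[1:])]

for gpu_count, speedup in enumerate(speedups, start=1):
    print(f"{gpu_count} GPU(s): {speedup:.2f}x")
for gpu_count, gain in enumerate(marginal_gains, start=2):
    print(f"GPU number {gpu_count} adds {gain:.2f}x")
```

Each added GPU contributes less than the one before it, and the gap widens quickly if the parallel fraction is lower than assumed here.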

Nvidia's prototype VR machine

The driver support will also have to be top notch for this MCM setup to work effectively. The research paper is focused on supercomputing mathematics, but if they’re looking to bring this tech down into their consumer GPUs the design will need to be almost invisible. The graphics APIs will need to be able to regard an MCM-GPU in the same way as a traditional one so developers don’t have to change their methods to take advantage of the extra pulling power.

Of course, when results are based purely on simulations, it’s best not to hold your breath. It’s certainly very early days for this technology but it sounds like this is the direction the main manufacturers are moving in for the tech of tomorrow.

Thanks, Tech Report.