The new AMD Navi graphics cards will be here today and will also be the first to house the new RDNA GPU architecture. The AMD RX 5700 XT and Radeon RX 5700 cards will represent the vanguard of this new generation, and will be the first since 2012 to replace the Graphics Core Next (GCN) design, when they launch at the start of July.
But what is in this new Radeon DNA architecture that makes it so distinct from the previous generations of GCN silicon that’s been making up the red team’s GPUs for the past seven years? Dr. Lisa Su has told us, on multiple occasions now, that the RDNA architecture is a completely new, ground up design, on a par with the way its CPU designers started afresh with Zen post-Bulldozer et al.
Thanks to the full editor’s day info drop just before the E3 Next Horizon Gaming event in LA we’ve got the full low-down on just what makes this new Navi RDNA architecture tick, and what we can expect to see in the Navi 10 GPU. And the first thing to say is that, arguably, it’s not quite a ground-up redesign, this time around at least…
There are still some great parts to the ol’ GCN design – it’s a fantastic compute engine, for one – so throwing the silicon baby out with the bathwater seems a bit counter-intuitive. Therefore AMD has mostly created this first iteration of the RDNA architecture using GCN building blocks. But the main thing to note is that this is a pseudo-hybrid of the Graphics Core Next design that has been reworked specifically with gaming as its main focus.
That means GCN isn’t dead, but AMD looks to be forking its graphics efforts into one architecture for its compute-based cards, such as the Radeon Instinct range, and a separate RDNA-based design focusing on gaming as its true raison d’etre.
AMD Navi RDNA release date
The first Navi-based RDNA graphics processors will be the Navi 10 GPUs being baked into both the Radeon RX 5700 XT and RX 5700 getting released on July 7 2019. Interesting side-note; that’s the same day AMD is launching its similarly 7nm Ryzen 3000 processors too.
AMD Navi RDNA specs
That 7nm Navi 10 GPU is the first RDNA-based graphics chip, and the full core – used by the RX 5700 XT – comes with 40 compute units, and therefore 2,560 cores across two shader engines, with 64 render output units, and 160 texture units. It’s 251mm2 and has a full 10.3bn transistors inside it. Navi has a GDDR6 memory controller, though is compatible with HBM if needed, and has PCIe 4.0 support too.
AMD Navi RDNA architecture
There are a lot of similarities between GCN and RDNA, but the big focus for the RDNA architecture is on reduced latency and improved efficiency specifically for gaming. That means Navi with RDNA is able to do more instructions on each clock cycle, while using less actual silicon. It’s also got a streamlined graphics pipeline and a much-improved cache structure too.
AMD Navi RDNA performance
AMD is touting a 25% increase on a performance per clock basis compared with the previous purestrain GCN architecture. AMD has also shown a 50% overall performance gain using the same power, and same configuration, on The Division 2 at 1440p between the RX Vega 64 and the full Navi 10 GPU, presumably in an RX 5700 XT. It’s also claiming a 1.5x increase in terms of performance per Watt compared with the Vega 64 card too. Poor Vega…
The new RX 5700 XT and RX 5700 graphics cards will launch with the Navi 10 GPU, in two different configurations, on July 7 this year. That’s not far away now, people. This will be the first implementation of the RDNA graphics architecture, but definitely won’t be the last.
David Wang, senior vp of the Radeon Technologies Group, has explicitly stated that the second generation of RDNA will follow this one, probably next year. The gaming architecture roadmap slide indicates that the 7nm+ RDNA 2 design will arrive before 2021… so 2020, then.
There had already been speculation about a ‘big Navi’ GPU arriving next year, which would be designed to take on the best of Nvidia’s GeForce graphics cards, so the Navi 20 chip could well feature the RDNA 2 architecture.
The Navi 10 GPU is the first RDNA graphics processor to implement the new AMD architecture, and its complete specs are being brought to bear on the new Radeon RX 5700 XT card. The RX 5700 will feature a more cut-down version of the new Navi 10 GPU.
The full-fat chip features the TSMC 7nm process technology across 40 RDNA compute units (CUs) delivering 2,560 stream processors or RDNA cores, if you will. The Navi 10 GPU also comes with 64 render output units (or ROPS) and 160 texture units. It also houses four distinct asynchronous compute engines, 4MB L2, and 512kb of L1 cache.
In terms of the actual scale of the GPU you’re looking at a GPU which is just 251mm2. That’s a hell of a shrink when you consider that the 14nm Vega 10 GPU, by comparison, is some 495mm2. Or 486mm2. Oddly it depends who you ask, or which part of AMD’s own slides you look at… some parts of the latest documentation have one figure, and some have the other. Either way, Navi 10 is a lot smaller.
And yet it still packs in 10.3bn transistors, which is still pretty good when compared with the much larger Vega 10 chip’s 12.5bn transistors.
The biggest difference between the last GCN and new RDNA GPU designs, on the surface at least, are the changes AMD has been made to the compute unit. With Vega, AMD introduced the Next-Generation Compute Unit (NCU), but the RDNA Compute Unit steps in to rewrite the rules for a gaming-focused graphics processor.
You can pretty quickly see some of the difference between the two architectures from looking at the overlay above. The Vega design has the classic GCN-class individual compute units, housing the stream processors in four batches of eight CUs, across two shader engines. The Navi GPU layout, however, has squeezed the CUs together to create the Dual Compute Unit (DCU).
There are four distinct graphics arrays in the Navi 10 GPU, each with five DCUs inside them sharing resources, split across two shader engines, which all means there are a total of 40 individual CUs across the whole die. Having the compute units configured in pairs, in what AMD is calling a Workgroup processor, allows the two CUs to work together using shared resources to cut down latency while still offering greater parallelism.
If there’s a bit of Bulldozer PTSD creeping in, I understand. The idea of taking discrete parts – in the case of the Bulldozer CPUs, its cores – and lumping them together with shared resources can potentially boost parallelism but risks lessening their individual power.
Thankfully that’s not the case here; each CU is almost entirely distinct but they can share the shader instruction and scalar data caches, as well as the local data share section of the DCU, if they need to work together. In short, this dual compute unit layout shouldn’t impact a discrete CU’s performance.
In fact, the RDNA compute units in Navi actually have more dedicated units inside them compared with a traditional GCN compute unit. Each CU now has one extra scheduler and one extra scalar unit, doubling up their counts on both scores. And that’s specifically to offer double the potential instruction rate in order to create a more efficient compute unit dedicated to standard gaming graphics workloads.
This tailoring to gaming is most obvious in arguably the biggest architectural change of the RDNA architecture: boosted instructions per clock. We’re going to descend even further down into acronym canyon now, but bear with me.
The GCN architecture uses SIMD16 units, a unit that can process 16 instructions at a time. The SIMD nomenclature stands for ‘single instruction, multiple data’ meaning that there are multiple elements performing the same operation on different data points at any one time.
And the four SIMD16 units in an overall GCN execution unit are great at doing complex calculations for scientific applications, but what they not great at is operating particularly efficiently when it comes to gaming workloads. That’s because the four SIMD16 units work on a 4 cycle issue, meaning that they can’t process an instruction in a single clock cycle.
This is where RDNA’s new execution unit design comes into its own for gaming. Instead of four SIMD16 units RDNA uses a pair of SIMD32 with a single clock cycle issue, which allows it to not only deliver higher throughput but it also means more of the Navi GPU is being utilised at any one time, making it far more efficient for graphical demands too.
With Vega specifically there are often times where large parts of the GPU aren’t doing anything when processing game graphics because they’re waiting for an instruction to go through its four cycle process. That’s going to be much less of an issue for Navi.
It’s almost entirely analogous to the way that AMD has been approaching its new Zen 2 architecture. It knows that multi-threaded processing is vital – and even more so for the parallelism required by graphics processors – but really digging down to make your single-threaded performance as good, and as efficient as possible only goes to lift everything else.
AMD went wider with its Zen design, adding more and more cores, and now it’s trying to go faster by pushing the performance of those individual cores. And that’s exactly what the GPU side of the company is doing with the move from Vega’s GCN design to Navi’s RDNA architecture. And it means you don’t necessarily need as many cores to go as fast.
AMD has also improved the cache hierarchy for RDNA and the Navi GPUs too. There’s a new L1 cache design, with the same 16kb per-CU cache, but an extra 128kb of separate L1 cache for each of the four graphics arrays on the chip. Then you have a full 4MB of global L2 cache shared by everything.
This localised extra L1 cache means that the demands on the L2 cache are lessened, offering just that little bit of extra bandwidth within the RDNA system. In fact you’re looking at slightly more bandwidth for the Navi 10 GPU than the full Vega 10 silicon, despite the fact there are only 40 CUs in Navi and 64 in Vega. On a per-CU level, then, Navi’s RDNA architecture offers it significantly increased bandwidth.
And that is quite exciting for what the big Navi core might end up delivering via the Navi 20 and RDNA 2 7nm+ design coming next year.
There are other underlying architectural improvements AMD has made with Navi’s RDNA design, which all factor into the improved gaming performance. It has also streamlined its graphics pipeline to deliver more performance per clock, for example, and has improved the colour compression algorithms, allowing more parts of the chip to actually read and write to that compressed colour data.
But the biggest thing for Navi is that laser focus on ensuring the highest IPC for specific graphics workloads, and having a dedicated gaming GPU architecture – as opposed to one that’s trying to be all things to all people – could really pay off for AMD in the long run.
With all the extra engineering efforts AMD has put into its new RDNA architecture it’s that single-threaded performance that is going to see the greatest improvement in performance. That’s not something we traditionally talk about with regards to graphics silicon, more when we’re chatting CPU tech, but how fast the individual components of a GPU chew through their workloads is still vital for improving the overall multithreaded parallelism of a graphics card.
To that end, AMD is claiming a 25% increase in instructions per clock for the RDNA architecture versus GCN. It’s also promising a 50% overall performance gain when utilising the same power and the same general configuration too.
Both of those numbers have been pulled from a machine running The Division 2 at 1440p, so they’re definitely gaming performance figures and not just some synthetic test. Yes, Intel, real-world performance metrics… The GCN numbers come from a Vega 64 and the RDNA figures from the full Navi 10 GPU, presumably deep inside that crumpled cigarette packet looking Radeon RX 5700 XT.
The power side of the equation has also been improved, with AMD’s numbers showing a 1.5x increase in performance per Watt compared with the Vega 64, and that’s with this version of Navi using the more thirsty GDDR6 than HBM2.
In terms of straight graphics card vs graphics card performance, the full Navi 10 GPU has found a home in the RX 5700 XT and AMD has published its own performance figures for it up against the original Nvidia RTX 2070.
And in those terms it’s looking like the Radeon card has the edge across a pretty wide array of games, and notably not just those ones which have a traditional AMD performance bias either. It’s seemingly only by single digit percentage points in most cases, but there are a few outliers where it’s much faster and a few were it’s slightly slower.
We’ll see how it all shakes out via our own independent testing soon, as the new RDNA architecture will be here within the new Navi graphics cards on July 7… and hopefully in our test bench well before then too.