AMD’s next-generation graphics architecture is coalescing before our very eyes. A freshly unearthed patent application, published in mid-December 2018, shows a new design for AMD’s post-GCN, high-bandwidth, low-power stream processors. There’s a heavy emphasis on boosting the parallel compute power of its next-gen GPUs while increasing their efficiency at the same time. And it has a faint whiff of Nvidia’s SM design about it too.
This isn’t the AMD Navi architecture, however. Navi is reportedly the last spin of the Graphics Core Next (GCN) design introduced in 2012; this is a new take on the stream processor for the GPU architecture that follows it, potentially in 2020. There has been speculation that AMD Arcturus would be the 7nm+ design on its current GPU roadmap, which would point to a 2020 launch for this all-new graphics processor design.
Now, you’re going to have to bear with me through this, as I’m a relative dunce when it comes to the architectural specifics of engineering a new graphics core. So there are going to be some rather speculative assessments based on what we can glean from the dense technical language the patent application is couched in. This is where I wish I’d concentrated more at school…
Anyway, the patent application came to light via serial tweeter and leaker Komachi Ensaka. Titled ‘Stream processor with high bandwidth and low power vector register file’, it seemingly follows on from, and builds on, a design put forward in a previous application. That earlier application, titled ‘Super single instruction multiple data (Super-SIMD) for graphics processing unit (GPU) computing’, was published in May last year.
[AMD] STREAM PROCESSOR WITH HIGH BANDWIDTH AND LOW POWER VECTOR REGISTER FILE https://t.co/K6sHm992Yn
— 比屋定さんの戯れ言@Komachi (@KOMACHI_ENSAKA) January 21, 2019
The standard SIMD in the current Graphics Core Next architecture simply contains 16 arithmetic logic units (ALUs), and each compute unit (CU) has four of these SIMDs inside it. That is what gives each CU the 64 ‘cores’ we talk about: when we say the Radeon VII has 3,840 GCN cores, for example, that is 60 CUs multiplied by 64. In the GCN architecture, the compute unit is the smallest fully independent unit in the GPU.
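The arithmetic behind those ‘core’ counts is simple multiplication, and this short Python sketch works it through. The 16-ALU SIMD and four-SIMD CU come straight from the GCN layout described above; the 60-CU figure is the number of compute units enabled on the Radeon VII, and the function name `stream_processors` is purely illustrative.

```python
# GCN building blocks as described above: each SIMD has 16 ALUs (lanes),
# and each compute unit (CU) contains four SIMDs.
ALUS_PER_SIMD = 16
SIMDS_PER_CU = 4


def stream_processors(compute_units: int,
                      simds_per_cu: int = SIMDS_PER_CU,
                      alus_per_simd: int = ALUS_PER_SIMD) -> int:
    """Return the 'core' (stream processor) count for a GCN-style GPU."""
    return compute_units * simds_per_cu * alus_per_simd


per_cu = stream_processors(1)        # a single CU
radeon_vii = stream_processors(60)   # Radeon VII has 60 CUs enabled

print(per_cu, radeon_vii)  # 64 3840
```

The ALU count is left as a parameter because, as the patent notes, other embodiments of the design may not stick with 16 ALUs.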
The compute unit has lots of shared resources inside it, such as schedulers and caching systems, which all of the individual SIMDs can use. These resources can’t serve every SIMD at once, though, so the CU has to decide when the instructions within each SIMD get processed. That can lead to bottlenecks in the GPU, and this is what the latest patent application is looking to get around.
The new high-bandwidth stream processors seem to have far more logic packed inside them than the simple ALUs of GCN. The patent shows each stream processor looking more like a GCN compute unit of old, each housing its own instruction queues, cache and buffers. That could make each ‘core’ the smallest independently functioning part of the GPU, since it would be able to carry out tasks without having to wait for the shared resources built into the standard compute unit.
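To get a feel for why waiting on shared resources matters, here is a deliberately crude Python toy model. It is not AMD’s design and nothing in it comes from the patent: it simply assumes every instruction needs one cycle on some resource, and that only `shared_slots` units can be served per cycle. Setting `shared_slots=1` stands in for one resource shared across the CU, while giving every unit its own slot stands in for stream processors with their own queues and buffers. The function name and workload are hypothetical.

```python
from typing import List


def cycles_to_drain(work: List[int], shared_slots: int) -> int:
    """Toy model: each entry in `work` is the number of single-cycle
    instructions queued in one unit (SIMD or stream processor).
    Every instruction needs one slot on a resource, and at most
    `shared_slots` units can be served per cycle. Returns the number
    of cycles until every queue is empty."""
    remaining = list(work)
    n = len(remaining)
    start = 0   # round-robin pointer so no queue is starved
    cycles = 0
    while any(remaining):
        served = 0
        for offset in range(n):
            if served == shared_slots:
                break
            i = (start + offset) % n
            if remaining[i] > 0:
                remaining[i] -= 1   # this unit issues one instruction
                served += 1
        start = (start + 1) % n
        cycles += 1
    return cycles


queues = [8, 8, 8, 8]  # four units, eight instructions each
print(cycles_to_drain(queues, shared_slots=1))  # 32: units mostly wait their turn
print(cycles_to_drain(queues, shared_slots=4))  # 8: every unit issues each cycle
```

Real GPUs overlap work in far more sophisticated ways, so the 4x gap here only illustrates the direction of the effect, not its size.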
The previous application has a diagram of what the updated compute unit design would look like when housing four of the more complex stream processors, which can then farm completed tasks out to the scheduler and shared cache of the next-gen CU.
This may not necessarily allow AMD to add more stream processors, or ‘cores’, to its GPU designs, but it should mean that each one is far more capable than those of the last generation. With the new cores less likely to sit idle waiting for shared resources to become available, the next-gen GPU should be able to carry out more parallel processing tasks (more compute tasks) per clock cycle.
That said, the patent does state that while one embodiment of the stream processor design contains 16 ALUs in the overall layout, as with the current GCN model, other embodiments contain different numbers of ALUs. That would allow for either higher-power designs with more ALUs inside each stream processor, or more efficient, highly parallelised low-power designs with fewer ALUs in each.
With AMD’s graphics architecture already heavily compute-focused, the next-gen (maybe Arcturus) design could end up being a monster on that front. And with that much complex silicon inside each stream processor in the compute unit, not a million miles away from the streaming multiprocessor (SM) design Nvidia uses to build its own GPUs, there’s potential not only for the WinML promise of a DLSS-like feature, but also for genuine DXR support to find its way into the 2020 AMD architecture.
The more complex stream processors should also make for a lower-power system. The design bypasses certain buffers and avoids duplicated use of resources, and it has a cache recycling system, which means the stream processor doesn’t need to re-fetch data it is going to work on again.
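The patent’s exact recycling mechanism isn’t spelled out here, so the following Python sketch swaps in a common, generic technique as a stand-in: a small least-recently-used (LRU) operand cache sitting in front of the vector register file. It assumes each instruction is just a list of source register names, and counts how many reads actually have to go to the register file. The function `register_file_reads` and the register names `v0` to `v4` are hypothetical.

```python
from collections import OrderedDict
from typing import Sequence, Tuple


def register_file_reads(program: Sequence[Sequence[str]],
                        cache_size: int) -> Tuple[int, int]:
    """Toy model (an assumption, not the patented design): count vector
    register file (VRF) reads for a stream of instructions, each given as
    the list of source registers it reads. A small LRU operand cache of
    `cache_size` entries sits in front of the VRF; a hit reuses the cached
    value instead of re-fetching it. Returns (vrf_reads, cache_hits)."""
    cache = OrderedDict()  # register name -> placeholder, oldest first
    vrf_reads = hits = 0
    for sources in program:
        for reg in sources:
            if reg in cache:
                hits += 1
                cache.move_to_end(reg)  # mark as most recently used
            else:
                vrf_reads += 1  # must fetch from the register file
                if cache_size > 0:
                    cache[reg] = None
                    if len(cache) > cache_size:
                        cache.popitem(last=False)  # evict least recently used
    return vrf_reads, hits


# A multiply-accumulate style sequence that reuses v0 in every instruction.
program = [["v0", "v1"], ["v0", "v2"], ["v0", "v3"], ["v0", "v4"]]
print(register_file_reads(program, cache_size=0))  # (8, 0)
print(register_file_reads(program, cache_size=2))  # (5, 3)
```

Fewer trips to a large register file generally means less energy spent moving data, which is the kind of saving a recycling scheme like this would be after.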
The parallels between Nvidia’s existing streaming multiprocessor design and this potential new AMD stream processor aren’t hard to parse. By putting more logic into the smallest parts of its GPUs, AMD could allow for more fine-grained control, hence the power saving, and more parallel processing, potentially boosting per-clock performance.
And, with Nvidia’s design being the dominant GPU technology of today, most software is optimised for it. By creating a graphics architecture that can leverage all those existing optimisations while adding its own AMD spin, the new Radeon chips could be a real challenge to Nvidia.
This should all help make AMD’s next-gen GPU architecture more parallel, and therefore potentially more powerful, while also making it more efficient, which is something Radeon fans have been crying out for.
This is still all very speculative right now, and there’s no specific hint that this will in fact come in the next-gen 2020 GPU, whether or not it’s codenamed Arcturus. But the timing makes sense – the GCN architecture is getting rather long in the tooth now, and was designed at a time when 28nm GPUs were all the rage.
Now that we’re talking about far smaller lithographies, there is the potential to put more logic into the building blocks of our graphics chips while still being able to jam enough of them into the package to make the chips powerful, without them becoming unfeasibly large, difficult to manufacture, and unbelievably expensive.