A closer look at DeepMind, the Google AI that’s mastering StarCraft 2

May 31, 2020 This feature was originally published in January 2019.

You’d be forgiven for assuming that DeepMind’s artificial intelligence technology has already proven its chops.

Back in 2016 the celebrated computer lab watched one of its AI programs do the unthinkable and win a game of Go against then world champion – and human being – Lee Sedol. Mastering the ancient Chinese board game was just one example of the machine learning DeepMind is hoping it can ultimately use to revolutionise sectors like science, healthcare, and energy.

For the next step on that journey, DeepMind has turned its attention to StarCraft II. The seven-year-old RTS may still be an esports sensation, but it’s not an obvious step up from Go. After all – and with apologies to Blizzard – the 2,500-year-old abstract strategy game is considered to represent a pinnacle of game design, strategic depth, and elegant complexity. But the thing about Go – and that other great AI sparring partner, chess – is that it is precisely ordered and tightly structured. Despite the daunting combinations of possible moves these games offer, their depth is not necessarily complemented by breadth.

A multiplayer RTS, on the other hand, is a little more chaotic. The very best StarCraft II pro players can land upwards of 800 meaningful mouse and keyboard actions per minute. It’s a dynamic, erratic strategy game played at the speed of a bullet hell shmup, where myriad systems of interaction jostle in a bewildering tangle. StarCraft II demands that its players handle uncertainty and make sense of nuanced spatial environments. All of which presents quite the challenge for an AI.

As such, DeepMind has been building an AI program named AlphaStar, with one purpose in life: mastering StarCraft II’s competitive multiplayer. In fact, it’s already gone head-to-head with some of the world’s best players.

Which is why I find myself shuffling into an ad hoc TV studio set up in Google’s UK headquarters. In recent weeks DeepMind – which is owned by Google parent Alphabet – has sent AlphaStar to fight with Team Liquid’s esteemed pros Dario ‘TLO’ Wünsch and Grzegorz ‘MaNa’ Komincz. Now the company is ready to share the pre-recorded games, and commentators who have not seen the games before have been drafted in to bring some energy. A very slick stage is set.

What they are about to reveal feels important. This isn’t just about AI going up against a pair of esports teammates. AlphaStar is challenging the notion of what skill means in gaming. The ramifications could shift how human pro gamers play, how future titles are developed, and, of course, how AI augments human capability in the wider world.

DeepMind began by building an artificial StarCraft II player with no sense of the game at all. Indeed, the first StarCraft II AI program – or ‘agent’ – they crafted couldn’t even comprehend a mouse and keyboard, let alone understand rules or strategies. But it kept plugging away, watching half a million human StarCraft games, learning all the time. AlphaStar imitated, experimented, failed, and learned. That’s the combined ‘deep learning’ and ‘reinforcement learning’ process at the heart of DeepMind’s offering.

By BlizzCon 2018 in early November, AlphaStar had grasped the RTS’s rules and mastered some basic macro-based strategies. By December 10, having played numerous games against different versions of itself, the AI had capably beaten the most accomplished human StarCraft player on the DeepMind team. It was time to up its game.

Nine days later Team Liquid manager TLO flew to the UK. As a pro StarCraft II player he has fielded all the game’s different races, but is known as an exceedingly capable Zerg player. AlphaStar, however, had focused on Protoss vs Protoss to keep its learning consistent. A Protoss-only match against TLO would therefore be a perfect, gentle first test – pitting DeepMind’s agent against an expert out of his comfort zone.

As the stream kicks off, AlphaStar beats TLO in the first match using a quietly unconventional play style – declining to wall off a choke point ramp, a well-established approach given the selected map. Out of the gate, AI has taken the lead over the humans, and doesn’t appear overly concerned with dancing to the tune of convention. For all the theorycraft that StarCraft players obsess over, AlphaStar is already doing things differently.

The DeepMind agent is constrained to be as human as possible, however. There are limits on just how fast it can interact, and concessions to make sure its approach isn’t so abstract that games are rendered unplayable. Indeed, TLO made more ‘actions per minute’ than AlphaStar in their first clash, which leaves little room to argue that the AI enjoyed an unfair speed advantage.

“I started that match very confidently,” TLO says of his defeat, sporting a confounded grin. The Team Liquid player asserts that he’s learned from the experience, however, and feels ready to trounce AlphaStar in the next games.

Things go a little differently, though. With each round, AlphaStar switches its strategy. The AI is always relentless, always efficient – but never predictable. TLO is thrashed 5-0.

DeepMind clearly hadn’t held back. AlphaStar actually combines five different agents – something like a spread of different versions of itself. What’s more, DeepMind says that AlphaStar has played around 200 years of the game, a fact that TLO clearly took some solace in. But actually, that’s more comparable to human learning than you might think.

Those 200 years include everything every version of AlphaStar learned playing itself. Similarly, a StarCraft II player with 500 hours of gameplay behind them also inherits the collective learning of the players that went before them, and those they have fought with. We are all more than our own experience; hundreds of years more.

Regardless, it was time to up the challenge. MaNa is both a talent and a confident Protoss-focused player. Surely now AlphaStar would meet its match?

After five straight defeats at the hands of AlphaStar, MaNa appears to be in equal parts exasperated, delighted, and fascinated. Like TLO, he was clearly taken by surprise. While AlphaStar follows the rules of the game to the letter, it simply won’t respect the established game strategies that StarCraft II players have collectively developed.

MaNa does, at least, earn a measure of redemption for himself, for the public standing of esports skill, and perhaps for humanity in general. DeepMind hosts a live game in the studio, streamed out warts and all. And it emerges that this is the first game against a pro where DeepMind’s AI has been locked to using the player camera. AlphaStar doesn’t see the game, as such, but has previously been able to comprehend an entire match area, rather than experience it through a camera view.

This time, MaNa wins, and is clearly tremendously relieved.

With the cameras switched off there’s a collective sense of enlightenment in DeepMind’s temporary studio. AlphaStar may have lost its last battle, but a final score of 10-1 in the AI’s favour has left minds reeling. And the pro players are feeling optimistically reflective.

TLO tells us that MaNa has already taken a tactic he saw AlphaStar deploy and used it in real-world games. If AlphaStar can keep shifting its play style unpredictably, perhaps humans can learn new approaches from it, just as AI learns by watching humans. We may see disruptive new game theories rise up in competitive esports that weren’t originally conceived by biological organisms.

Meanwhile, the DeepMind team is excitedly chatting about the implications of highly capable AI players on games development. They may ultimately hope to see their technology improve global supply chains, disaster relief, and the work of healthcare professionals, but for now the notion of human-level AIs playing games has everyone inspired. What might AlphaStar mean for game testing? Could its abilities not just assess a given game’s design validity, but actually feed into the creative process? Might AI manage to create perfectly balanced games, free from human interference?

Maybe, TLO reflects. But he points out that a perfectly balanced game might not inherently be a good game. It is the minute imperfections in balancing, after all, that have allowed StarCraft II players to build such a dense library of game theory around their beloved RTS. That is where the capacity for individual flair and dramatic turns of fate comes from.

AI might be able to better us already, but that doesn’t mean perfection is perfect for gaming.