
This new tech from Intel Labs could revolutionise VR gaming

New research from Intel Labs details a technique called Free View Synthesis that could completely change the gaming landscape

[Image: Valve Index headset]

If I say ‘good graphics’ you probably envision the latest triple-A game with high-res textures and ray tracing. If I say ‘photorealistic’ you probably think of something similar, but with extra attention paid to the similarity between the graphics and the real world. New research from Intel Labs, though, shows us just what photorealistic should mean – and has some thrilling applications for game development, especially in VR.

This new technique is called Free View Synthesis. It allows you to take some source images of an environment (from a video recorded while walking through a forest, for example), and then reconstruct and render the environment depicted in those images in full ‘photorealistic’ 3D. You can then have a ‘target view’ (ie a virtual camera, or a perspective like that of the player in a videogame) travel through this environment freely, yielding new photorealistic views. The research will be presented online at ECCV 2020 (August 23 to 28), so to hear more about it, make sure you tune in!

The (long-term) implications of this technique for game development should be obvious. In theory, with pictures of a real-world place to draw from, it should be able to automatically produce a traversable videogame environment that is identical not only in layout and content, but also in looks. Quick and effortless level design, comparable to reality in terms of visual fidelity.

We spoke to Vladlen Koltun, Chief Scientist for Intelligent Systems at Intel Labs, about this new technique. It has plenty of applications, but regarding gaming specifically, Koltun says “it can create games that are indistinguishable from reality”.

For the nitty-gritty on how it does this, you can read the full Free View Synthesis research paper [PDF], but essentially, the environment captured in the source images is reconstructed using Structure From Motion (SfM) and Multi-View Stereo (MVS) to create a proxy depth map; features from the source images are then reprojected into the target view and processed “via a convolutional network” to synthesise the new view. In short: take the source images, build a ‘depth map’ of the environment from them, and then use a convolutional network to synthesise a new view within it.
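If you fancy a feel for what that geometric step involves, here’s a minimal Python sketch of the kind of reprojection at its heart: given a proxy depth map and the source and target camera parameters, it maps each source pixel into the target view. The function, shapes, and conventions here are our own illustration, not code from the paper.

```python
import numpy as np

def reproject(depth, K_src, pose_src, K_tgt, pose_tgt):
    """Map every source pixel into the target camera.

    depth:        (H, W) proxy depth map for the source view, from MVS
    K_src, K_tgt: (3, 3) camera intrinsics
    pose_*:       (4, 4) world-to-camera extrinsics
    Returns (H, W, 2) target-image coordinates for each source pixel.
    """
    H, W = depth.shape
    # Source pixel grid in homogeneous coordinates, shape (3, H*W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])

    # Back-project to 3D in the source camera frame, then into world space.
    cam_src = (np.linalg.inv(K_src) @ pix) * depth.ravel()
    world = np.linalg.inv(pose_src) @ np.vstack([cam_src, np.ones(H * W)])

    # Transform into the target camera and project with its intrinsics.
    # (Checks for points behind the camera are omitted for brevity.)
    proj = K_tgt @ (pose_tgt @ world)[:3]
    return (proj[:2] / proj[2]).T.reshape(H, W, 2)
```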

[Embedded video: Free View Synthesis demonstration]

“There is some geometric computation of the kind that you would encounter in classic graphics pipelines,” Koltun explains, “and then a pass through a convolutional network, as you would see in standard deep network inference.” So we have deep learning being put to the task of creating photorealistic views of true-to-life scenes with only a limited number of source images (taken freely in the environment) for reference.
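And for the second half – the deep network – here’s a deliberately toy stand-in that blends a stack of reprojected source images into one target view. The real Free View Synthesis network is far more sophisticated; this only shows where convolutional inference sits in the pipeline, and every layer size here is an assumption.

```python
import torch
import torch.nn as nn

class ToyBlendNet(nn.Module):
    """Blend several reprojected source images into one target view."""
    def __init__(self, num_sources=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * num_sources, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),  # RGB target view
        )

    def forward(self, warped):
        # warped: (batch, num_sources * 3, H, W) reprojected source images
        return self.net(warped)

# Blend four warped 256x256 source views into a single target view.
model = ToyBlendNet(num_sources=4)
target = model(torch.randn(1, 12, 256, 256))  # -> shape (1, 3, 256, 256)
```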

The video above shows other techniques that attempt to do a similar thing, including NPBG, a technique “published contemporaneously at the same conference” that “arose independently” at a different research lab, and that has a lot in common with Free View Synthesis. As you can see in the video, though, Free View Synthesis achieves slightly better-looking results.

The use of SfM and MVS is very important for Free View Synthesis, Koltun tells us, and “many other techniques do not use SfM/MVS to the extent we do”. The researchers think that “it would be wasteful to just sweep aside all the amazing progress that has been made (and continues to be made) in SfM/MVS. Rather, we should build on top of it. The high level of fidelity attained by our approach is due, in significant part, to thirty years of progress on SfM/MVS.”
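COLMAP is a widely used open-source SfM/MVS toolchain that embodies exactly that progress, so here’s a rough sketch of what the preprocessing stage could look like driven from Python – assuming a CUDA-enabled COLMAP install, with placeholder paths, and without claiming this is the researchers’ exact setup.

```python
import os
import subprocess

IMAGES, DB, SPARSE, DENSE = "frames", "colmap.db", "sparse", "dense"
os.makedirs(SPARSE, exist_ok=True)

def colmap(*args):
    # Each step shells out to the COLMAP command-line tool.
    subprocess.run(["colmap", *args], check=True)

# Structure From Motion: detect and match features, then estimate
# camera poses and a sparse point cloud.
colmap("feature_extractor", "--database_path", DB, "--image_path", IMAGES)
colmap("exhaustive_matcher", "--database_path", DB)
colmap("mapper", "--database_path", DB, "--image_path", IMAGES,
       "--output_path", SPARSE)

# Multi-View Stereo: undistort images, compute per-view depth maps,
# and fuse them into the dense geometry behind the proxy depth maps.
colmap("image_undistorter", "--image_path", IMAGES,
       "--input_path", os.path.join(SPARSE, "0"),
       "--output_path", DENSE)
colmap("patch_match_stereo", "--workspace_path", DENSE)
colmap("stereo_fusion", "--workspace_path", DENSE,
       "--output_path", os.path.join(DENSE, "fused.ply"))
```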

[Image from the video above: an environment view generated by Free View Synthesis]

In theory, by using Free View Synthesis, a game developer could record a video strolling through a local deserted park, throw images from this video into the Free View Synthesis pipeline, and be able to move a virtual camera through a true-to-life, photorealistic, 3D reconstruction of the park, taking any route they so desire. Extrapolate from this the implications for gaming, and you have a technique that could allow for quick and easy creation of photorealistic VR environments that the player can walk around in. And while, as Koltun notes, the “current implementation is not optimised for real-time performance… there is no fundamental roadblock to doing this in real-time”.
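The first step of that hypothetical workflow is already trivially scriptable. Here’s a rough OpenCV sketch for pulling evenly spaced source frames out of a walkthrough video – the filename and sampling rate are made up for illustration.

```python
import os
import cv2

os.makedirs("frames", exist_ok=True)
video = cv2.VideoCapture("park_walkthrough.mp4")  # hypothetical recording

frame_idx = saved = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    if frame_idx % 30 == 0:  # keep roughly one frame per second at 30fps
        cv2.imwrite(os.path.join("frames", f"{saved:05d}.jpg"), frame)
        saved += 1
    frame_idx += 1
video.release()
print(f"Saved {saved} source images")
```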

Hardware capable of fast deep learning inference is needed for Free View Synthesis, but this shouldn’t be a problem considering Nvidia’s latest GPUs have such capability (used for DLSS, for example), and AMD should soon follow suit. But the real challenge to tackle – at least for gaming – is the creation of new synthetic elements in scenes. It’s all gravy being able to walk around a photorealistic park, but to make a game you need to be able to interact with the environment, or at least introduce new elements into it (guns for an FPS, for example).

[Image from the video above: a photorealistic truck scene generated by Free View Synthesis]

How the technique could deal with such a thing, Koltun says, “is an open research question”. In other words, it hasn’t yet been answered – which is not to say that it won’t be. “In its present form, the approach only handles static scenes. It allows you to freely move your viewpoint through a static scene while retaining a photographic appearance. Eventually, such techniques will be extended to dynamic scenes, with objects moving around, but we’re not there yet. This is an active research topic.”

This is probably why the technique “won’t be available right away” for gaming – “not even in the next couple of years, but eventually”. But it’s an enchanting look at what’s likely to come.

We already have some of the best VR headsets working to completely immerse you in a game world, and some of the best graphics cards using AI and deep learning to efficiently create high-fidelity game environments. Rendering techniques like Free View Synthesis, when combined with these hardware advancements, promise great technological leaps for gaming, and that’s something I think we can all get behind.