'AI and Games' is a crowdfunded YouTube series that explores research and applications of artificial intelligence in video games. You can support this work by visiting my Patreon page.
With each new generation of videogame hardware, the graphical quality of new titles continues to mature. Many older games are being revisited through re-releases and remasters or they're made available once again on modern hardware. While they can be rendered at higher framerates and resolutions they're fundamentally limited by the textures developed by artists for the game. So what if we can use AI to update the textures from older games so that they're larger, crisper and more detailed for running at higher resolutions?
Today we're taking a look at a process referred to as super-resolution: where you feed an image into a trained deep learning algorithm and it generates a version of that image at a higher resolution while maintaining the artist's original intent. We'll take a look at how it works and the games that have benefitted from it, ranging from AAA titles such as the recent Mass Effect Legendary Edition to the modding communities that have pioneered this idea over the last couple of years.
Why Do We Need Texture Upscaling?
So let's start by explaining a little bit of how graphics works in games and why super-resolution for textures can prove valuable. In two-dimensional games, objects are put together courtesy of pixel art that is used to represent the player character, the background environment and everything else in between. Moving characters will have a sprite atlas, that allows you to swap out the active sprite at runtime to convey a change in behaviour. However, for three-dimensional games, objects are now comprised of multiple elements. The two key elements are the model and its textures.
Any object in a 3D game, be it a character, prop or environment, is effectively blank upon creation. The model is sculpted by a 3D artist and then it is textured to give it detail. But there's more than one texture applied to any given object. While there is the main detail, typically referred to as the diffuse texture, there are additional textures used to help render that object in different light conditions. This includes your normal map for adding fine surface detail, a specular map for controlling shininess and highlights, and an emissive map for glows. This process has evolved in the past 10 years or so given the adoption of Physically Based Rendering or PBR in the likes of Unreal Engine. But while the workflow has slightly changed, the practice of applying multiple textures for surface and then details to a model is the norm.
Now one of the big differences between 2D and 3D games is that 2D games largely hold up over time given the aesthetic of the sprite art. Plus those games are designed to be rendered at a fixed resolution: you can't really change the resolution without skewing the art. So when they're brought over to modern platforms, there is typically some form of scaling going on that helps retain the original aspect ratio, even if it begins to look a little chunky on modern screens, and there are options for smoothing it out. However, this pixelated chunkiness (yes that's the scientific term for it) is exacerbated when you try to run older 3D games at higher resolutions. Rendering a 3D game at a higher resolution will result in the models looking sharper and ultimately presenting a more crisp image, but the textures used on those objects begin to look really bad. They look stretched out and heavily pixelated. Why? Because those textures were designed to support the game at the target resolutions for when the game launched. Nobody working on DOOM back in 1993 figured that people would be playing that game 30 years later on a 32-inch 4K widescreen gaming monitor. It was built to support the graphics and memory capabilities of the hardware at that time and the monitors it would have rendered on. Even games from the last 20 years dating back to the Nintendo GameCube, PlayStation 2 and original Xbox suffer the same fate given the limitations of the hardware.
More modern games have been able to work around this, given artists will typically build the original textures at a higher resolution than the target hardware. Hence a game receives a high-resolution texture pack as a DLC update or patch on PlayStation 5 and Xbox Series X, because while it shipped with 1080p textures on the PlayStation 4 and Xbox One, those were already downsampled from the original 4K textures made during development. Downsampling textures is a common practice, especially for multi-platform games targeting different resolutions and memory budgets. You are making a smaller texture that is a less sharp image but retains all of the core detail of the original while minimising pixelation or artefacts. While this is a relatively straightforward process in modern game engines and related tools, you can't go the other way: you can't make an original texture bigger without it looking chunky or blocky. And that's where the AI comes in.
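To make that one-way street concrete, here is a minimal sketch in plain Python. It is not how any particular engine or tool does it: the function names `downsample_2x` and `upscale_nearest_2x` are made up for illustration, the "texture" is a tiny greyscale grid with even dimensions, and the filters are the simplest ones available (a 2x2 box average and nearest-neighbour repetition).

```python
def downsample_2x(img):
    """Average each 2x2 block into one pixel (a simple box filter).

    Assumes the image has even width and height.
    """
    h, w = len(img), len(img[0])
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def upscale_nearest_2x(img):
    """Repeat every pixel into a 2x2 block (nearest-neighbour upscaling)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# A 4x4 greyscale checkerboard "texture" with values 0 and 255.
original = [[255 * ((x + y) % 2) for x in range(4)] for y in range(4)]

small = downsample_2x(original)        # 2x2, every pixel is the block average
restored = upscale_nearest_2x(small)   # 4x4 again, but the pattern is gone

print(small)
print(restored == original)
```

Going down, the checkerboard collapses into a flat grey. Going back up with a naive filter just makes that grey bigger: the original pattern is not stored anywhere in the small texture, so there is nothing for a simple filter to bring back.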
AI upscaling attempts to reproduce the original image at a higher resolution, while minimising the pixelation and artefacts. This is achieved using a machine learning model that understands the underlying details of that image and can refine it as it is made bigger. Now it can't *add* information that isn't already there, so for example blurred illegible text written on signs can't be made legible, given it doesn't know what it said to begin with. But it can remove a lot of noise and grain from an image such that it is sharper and less pixelated. This can be applied either to sprites and art for classic 2D games or all of the different texture layers used in 3D games. And we're going to be focused on 3D games for the remainder of this video. In conjunction with all of this, there is ongoing work in texture synthesis: where AI is being used to figure out how an existing texture would have been drawn if it had a bigger canvas to work with, but that's a topic for another time.
Texture upscaling is slowly becoming an industry in and of itself, with many companies such as Topaz Labs now selling their own software for super-resolution, denoising and sharpening images and even videos - given they are after all a series of static images creating the illusion of movement. Meanwhile, Nvidia has their own NGX development tools for upscaling and more that are designed to run on an RTX-capable GPU and Adobe has recently integrated a super-resolution feature into their Camera Raw program. Plus as we'll see in a moment, many of the tools to start building your own super-resolution AI are freely available online and as such, many modders have started applying it to their favourite games.
Texture Upscaling vs DLSS
Now all of this sounds pretty exciting, but it also sounds very similar to what is known as DLSS - Deep Learning Super Sampling. DLSS is an upscaling technology developed by Nvidia that runs on their RTX graphics cards. While similar, there are some differences in how it executes and it's worth clarifying what those are.
Deep Learning Super Sampling essentially achieves both upscaling of the image as well as some anti-aliasing, but it's upscaling from the output of the system's graphics processing unit or GPU. DLSS is designed with the intent of allowing the graphics card to render the game while you're playing it at a lower resolution than normal, meaning it uses fewer resources, and then the AI part upscales the image before it makes it to your screen. So you might have your GPU render the game at 1080p and then DLSS upscales that image to 4K to appear on your monitor. This is because DLSS relies on a trained model that knows how the game should look at a higher resolution.
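A quick back-of-the-envelope sketch shows why rendering at 1080p and upscaling to 4K is attractive. The only assumptions are the usual pixel dimensions for those two labels (1920x1080 and 3840x2160); the `RESOLUTIONS` table and `pixel_count` helper are just illustrative names, and nothing here models what DLSS itself does.

```python
# Common pixel dimensions for the two resolutions mentioned above.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}


def pixel_count(name):
    """Number of pixels the GPU has to shade for one frame."""
    width, height = RESOLUTIONS[name]
    return width * height


rendered = pixel_count("1080p")   # what the GPU actually draws
displayed = pixel_count("4K")     # what ends up on the monitor
ratio = displayed / rendered

print(rendered, displayed, ratio)
```

The 4K frame has four times as many pixels as the 1080p one, so the upscaler is filling in three out of every four pixels you see.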
However, texture upscaling is all done in advance: you uprez the textures in the game during development, and then use them to replace the existing texture assets stored in the game engine. Naturally, this now means an increase in required GPU power and memory, given you're now rendering higher resolution images, but none of the upscaling processes are happening at runtime. It was all completed long before the player ever got their hands on the game.
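As a rough illustration of that memory cost, the sketch below estimates the size of an uncompressed texture before and after upscaling. It assumes 4 bytes per pixel (8-bit RGBA) and ignores texture compression and mipmaps, both of which real games rely on, so treat the figures as an upper-bound intuition rather than real-world numbers. The helper names are hypothetical.

```python
def texture_bytes(width, height, bytes_per_pixel=4):
    """Size of one uncompressed texture (assumes 8-bit RGBA by default)."""
    return width * height * bytes_per_pixel


def upscaled_bytes(width, height, scale, bytes_per_pixel=4):
    """Size of the same texture after scaling both dimensions by `scale`."""
    return texture_bytes(width * scale, height * scale, bytes_per_pixel)


base = texture_bytes(512, 512)          # 1 MiB
double = upscaled_bytes(512, 512, 2)    # 4 MiB
quad = upscaled_bytes(512, 512, 4)      # 16 MiB

print(base, double, quad)
```

Because both width and height grow, memory scales with the square of the upscale factor: 2x textures cost four times the memory and 4x textures cost sixteen times.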
If you are interested in finding out more about DLSS, check out our future AI and Games article that complements this one as we go into detail on DLSS and how that works as well.
How Does Texture Upscaling Work?
Okay, so let's get into the weeds: how does texture upscaling actually work? It's reliant on deep learning: a process that uses deep convolutional neural networks that are trained to upscale the image. But more specifically, it's reliant on a technique called Generative Adversarial Networks, where there's not just one, but two networks at play. One network is attempting to upscale the image to a higher quality, while the second acts as a critic, assessing how good the images are and determining whether they are fake or not. If they are deemed fake, that verdict is fed back to the first network during training as a signal that it needs to do better. This process of generator and discriminator is critical: both networks need to learn about the images they're dealing with. The discriminator needs to be able to identify images that evoke specific properties, while the generator needs to create new images that retain those artistic values sufficiently such that they can fool the discriminator.
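The tug-of-war between the two networks can be sketched with the textbook GAN loss functions. This is a simplified illustration, not the exact objective used by any of the tools mentioned in this piece: there are no networks here, just the two scores each side is trying to minimise, and `d_real` and `d_fake` stand in for the probability the discriminator assigns to "this image is real".

```python
import math


def discriminator_loss(d_real, d_fake):
    """Low when real images score near 1 and generated images score near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Low when the discriminator is fooled into scoring fakes near 1."""
    return -math.log(d_fake)


# Early in training: the critic easily spots the generator's output.
print(discriminator_loss(d_real=0.9, d_fake=0.1), generator_loss(d_fake=0.1))

# Later: the generator's images are good enough to fool the critic.
print(discriminator_loss(d_real=0.9, d_fake=0.8), generator_loss(d_fake=0.8))
```

As the generator improves, its loss falls and the discriminator's rises, which in turn pushes the discriminator to become a sharper critic. Training alternates between the two until the generated images are hard to tell apart from the real ones.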
Now many of the examples we see throughout this video are driven by a particular type of Generative Adversarial Network: an ESRGAN - or Enhanced Super-Resolution Generative Adversarial Network. The ESRGAN generator is powered by a convolutional neural network, using convolution layers to capture information about the original low-resolution image. As that image is passed in, it's capturing what is known as the 'feature space'. This is a collection of properties used to describe specific patterns in the image, whether it's fur, or brick or any other common property or trait of that image. The convolutional layers are being used to process all of that information and store it such that when the super-resolution image is made, that new image will still retain that same feature space. So with this information captured, the generator begins to upscale the image to create the super-resolution image.
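To give a feel for what a single convolution does, here is a minimal plain-Python sketch. It slides one small kernel over a tiny greyscale image and records how strongly each patch matches. The kernel is hand-picked to respond to vertical edges purely for illustration; in a trained network such as an ESRGAN generator the kernel values are learned from data, and there are many kernels stacked across many layers. The names `conv2d` and `VERTICAL_EDGE` are made up for this example.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the products per patch."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[y + i][x + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# Responds when the left of a patch is dark and the right is bright.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# A 4x5 image: three dark columns (0) followed by two bright columns (1).
image = [[0, 0, 0, 1, 1] for _ in range(4)]

feature_map = conv2d(image, VERTICAL_EDGE)
print(feature_map)
```

The output is zero over the flat dark region and large where the patch straddles the dark-to-bright boundary. That grid of responses is one "feature map"; a collection of them from many learned kernels is the kind of description the generator carries forward when it builds the larger image.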
Now typically, a GAN's discriminator is interested in detecting fake input: that an image from the generator is not real and an attempted forgery of the training set. Instead, the ESRGAN uses what is referred to as a relativistic discriminator: meaning it assesses whether an image could be considered more realistic than the other one, rather than whether the image is fake. This is a small but critical distinction, given it's more interested in assessing the difference between the real input and fake output. This actually helps the ESRGAN learn more efficiently, given it can better differentiate the key distinctions between the original image and the fake one and focus on reproducing those in the final image.
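The difference between the two kinds of discriminator can be sketched in a few lines. This follows the relativistic average formulation, where an image's raw score is compared against the average score of the opposite group before being squashed into a probability. The raw scores below are invented numbers standing in for the output of a critic network, and the function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def standard_discriminator(score):
    """Probability that a single image is real, judged in isolation."""
    return sigmoid(score)


def relativistic_discriminator(score, other_scores):
    """Probability that an image is more realistic than the others on average."""
    mean_other = sum(other_scores) / len(other_scores)
    return sigmoid(score - mean_other)


# Hypothetical raw critic scores for a batch of real and generated images.
real_scores = [2.0, 3.0]
fake_scores = [1.0, 2.0]

# Judged alone, this fake still looks fairly convincing...
print(standard_discriminator(fake_scores[0]))

# ...but compared against the real images it clearly falls short.
print(relativistic_discriminator(fake_scores[0], real_scores))
print(relativistic_discriminator(real_scores[0], fake_scores))
```

Because the score is always relative, the generator is rewarded for closing the gap to the real images rather than for merely clearing a fixed "looks real" bar.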
The modified discriminator, combined with changes to the generator structure, leads to one of the big benefits of ESRGAN compared to other Super Resolution GANs: it does a better job of retaining sharpness and detail in textures. A lot of existing methods would suffer from the significant blurring of things like fur, whiskers and hair or would lose detail in elements such as brickwork or tiling. These changes made a huge improvement to the overall performance.
Now while this technique works really well, it's still not perfect. A lot of super-resolution really benefits from high-quality images to begin with. So upscaling from 1080p to 4K is often a lot easier than grabbing much lower resolution images to start with. These will often result in artefacts that impact the final high-resolution image, an issue that impacts many of the projects we discuss later in the video.
All of this is of course just a high-level summary, and a list of resources on the intricacies of ESRGANs is available below for our more scholarly viewers. Now, let's start looking at the impact this is all beginning to have on the games industry.
Texture Upscaling in Modding Communities
The boom in super-resolution has only really kicked off in the last 3 years, given the ESRGAN that we just explored was only first published back in 2018. But it's already having a huge impact in games, and much of this stems from modding communities. Many modding communities have long-established practices of providing revised texture packs for beloved games, and in many instances, there are groups of people working on new textures to replace existing ones, such as the New Vision 1.5 mod for Deus Ex. But now there is a new wave of super-resolution mods being released where creators have ripped the original textures out of the game and passed them through the deep learning process to great effect. There are many notable examples out there such as Deus Ex New Vision 2.0 and we're going to take a look at some in a little more detail:
DOOM Neural Upscale 2x
This mod is free to download and run with any GZDoom installation. It made a splash in early 2018 and is reliant on Nvidia's GameWorks tools, which are a predecessor to their current Nvidia NGX, plus the Topaz Labs upscaler I mentioned earlier. What makes this set of DOOM textures really interesting is that, as you can see from watching the footage, the resolution has not been scaled up as high as you might expect. In fact, as the name implies, the textures here are only 2x their original size and a big reason for that was to retain their artistic integrity and clarity.
During development, they were originally upscaled all the way up to 8 times their original size, but as a result of the super-resolution process, some artefacts started to appear in the final textures and sprites. A big reason for that is that the original images are really small. As mentioned earlier, super-resolution is reliant on whatever information is already in the image