AI detecting player from vision using camera and shaders

In this post I explain how I gave my AI real vision, using a camera and a compute shader to detect the player more accurately than raycasts.

Kévin Drure, Blogger

November 5, 2018

12 Min Read

Introduction

In many video games, raycasting is an efficient way for an AI to detect the player. But the player is often partially occluded, and then a single raycast is not enough. One solution is to cast several rays at the player's bounds, but this is quite limited and can be tricky depending on the gameplay. Also, we often only need to know whether the player is visible or not, but sometimes we need to know how much of the vision space the visible part of the player takes up. An idea that was suggested to me is to use a shader to detect the player: the AI performs the detection in much the same way a human player does, by getting a rendered image and trying to find the target by reading the pixels.

What I describe today is a personal experiment. There is probably a better way to achieve this, but I will try to explain one possible solution. I will talk about the Unity features I used, but the idea is not Unity-dependent: it is quite general and can be adapted to other game engines.

Giving vision to our AI

First, our AI needs a camera to get a rendered image. The idea is to perform two renderings: one for the environment and one for the player. Of course, we need to save our renderings into textures. At this point, comparing the two textures to find the player pixels visible to our AI seems pretty difficult, because we have missed a crucial point. We don't need to render human-readable textures, with the same graphics the human player sees on screen. We need to render textures holding information that our code can interpret: the pixel depth. Instead of writing the final color for each pixel of the texture, we write its depth. It is then quite simple to compare the depths in the two textures to determine whether each player pixel is occluded or visible.

In Unity, you can ask a camera to render with a replacement shader, that is, to render objects with a custom shader instead of their original shaders. There are two methods for this: Camera.RenderWithShader and Camera.SetReplacementShader. For more information about replacement shaders in Unity, you can read the Unity manual or watch this pretty good video.

An example of pixel depth rendering

Comparing the render textures

Comparing our two textures takes some work. We could read the textures pixel by pixel in the game code, using the Unity Texture2D.GetPixels32 method, but reading the textures on the CPU would take too much time. As we are in a real-time program, milliseconds are the sinews of war, so the CPU approach is not a solution. We have to do it on the GPU, because while the CPU executes our game logic the GPU is fairly quiet, and also because the GPU loves parallelization. Using the GPU while the CPU executes the game logic for the current frame is one thing, but also using multithreading on the GPU (GPUs are made to operate this way) is just heaven.

To achieve this we will use compute shaders, which are shaders used to perform calculations on the GPU instead of the CPU. We need a compute shader, of course, and also a compute buffer. The compute buffer allows us to share data between CPU memory and GPU memory, so we can access the compute shader's data through it. To learn more about compute shaders in Unity, a starting point could be the Unity manual or this post if you are not familiar with compute shaders, or this one (very well explained, but in French).

On the CPU side, we should have something like this:


    /// <summary>
    /// Runs the whole visual detection update in a coroutine.
    /// Renders to textures, dispatches the compute shader and uses an async request to read the compute buffer back once the GPU has finished
    /// </summary>
    /// <returns></returns>
    private IEnumerator CR_UpdateVisualDetection()
    {
        ComputeBuffer buffer = null;
        int[] computeBufferResult = null;
        AsyncGPUReadbackRequest request = new AsyncGPUReadbackRequest();

        do
        {
            // Start a new detection only once the previous request is done or has failed
            if (request.done || request.hasError)
            {
                // Request done with success
                if (request.done && !request.hasError)
                {
                    // Get compute shader function result from compute buffer
                    computeBufferResult = request.GetData<int>().ToArray();

                    // Convert compute buffer result to a readable format (ratio of nb pixels filled / total nb pixels)
                    m_DetectionRatio = (float)computeBufferResult[0] / (TEXTURE_WIDTH * TEXTURE_HEIGHT);
                }

                // Release the previous buffer's GPU memory whether the request succeeded or failed,
                // so that a failed request does not leak it
                if (buffer != null)
                {
                    buffer.Release();
                    buffer = null;
                }

                // Check target is in frustum, if not the case no need to perform a visual detection, the target can't be seen at all
                if (IsTargetInFrustum())
                {
                    // Camera rendering to textures
                    if (m_VisualDetectionCamera != null)
                    {
                        for (int i = 0; i < 2; i++)
                        {
                            m_VisualDetectionCamera.Render(i);
                        }
                    }

                    // Create a compute buffer to share data between this script and the compute shader
                    // (1 element with a stride of sizeof(int) = 4 bytes, initialized to 0)
                    buffer = new ComputeBuffer(1, sizeof(int));
                    computeBufferResult = new int[1];
                    buffer.SetData(computeBufferResult);
                    m_ComputeShader.SetBuffer(m_KernelIndex, "intBuffer", buffer);

                    // Dispatch one thread per render texture pixel: the kernel declares [numthreads(32,32,1)],
                    // so a 128*128 texture needs 4 * 4 * 1 thread groups of 32 * 32 * 1 threads each
                    m_ComputeShader.Dispatch(m_KernelIndex, TEXTURE_WIDTH / 32, TEXTURE_HEIGHT / 32, 1);

                    // Perform an asynchronous request to get compute buffer on shader execution finished
                    request = AsyncGPUReadback.Request(buffer);
                }
            }

            yield return null;
        }
        while (enabled);
    }
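
The `Dispatch` call above relies on integer division, which is exact here because the texture size is a multiple of the group size (128 / 32 = 4). As a minimal sketch, assuming a texture whose side is not a multiple of 32, the group count could be rounded up instead. `DispatchMath` and `GroupCount` are hypothetical names used for illustration, not part of the Unity API:

```csharp
using System;

// Hypothetical helper, not part of the Unity API.
public static class DispatchMath
{
    // Number of thread groups needed to cover `size` pixels with groups of
    // `groupSize` threads, rounded up so that no pixel is left without a thread.
    public static int GroupCount(int size, int groupSize)
    {
        if (size < 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (groupSize <= 0) throw new ArgumentOutOfRangeException(nameof(groupSize));
        return (size + groupSize - 1) / groupSize;
    }
}
```

For a 128-pixel side this gives the same 4 groups as the plain division. For a 100-pixel side, `100 / 32` would give 3 groups and leave the last 4 pixels uncovered, while `GroupCount(100, 32)` gives 4. With rounding up, some threads fall outside the texture, so it would be prudent to add a bounds check in the kernel.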

Let's see what my compute shader does:


// Each #kernel tells which function to compile; you can have many kernels
#pragma kernel CountLowerDepth

// Textures to compare
Texture2D<float3> GlobalGeometryTexture;
Texture2D<float3> TargetGeometryTexture;

// Int buffer used to count target geometry texture lower depth pixels
RWStructuredBuffer<int> intBuffer;

// Count lower depth pixels on textures
[numthreads(32,32,1)]
void CountLowerDepth(uint3 id : SV_DispatchThreadID)
{
    // Translate the conditional expressions into arithmetic expressions to avoid branching
    int globalBlack = step(GlobalGeometryTexture[id.xy].r, 0.00001f); // GlobalGeometryTexture[id.xy].r <= 0.00001, i.e. nothing rendered
    int targetNotBlack = step(0.00001f, TargetGeometryTexture[id.xy].r); // TargetGeometryTexture[id.xy].r >= 0.00001, i.e. target rendered
    int lowerDepth = step(TargetGeometryTexture[id.xy].r, GlobalGeometryTexture[id.xy].r); // TargetGeometryTexture[id.xy].r <= GlobalGeometryTexture[id.xy].r

    // caseA: nothing behind the target; caseB: target in front of the environment (only counted when caseA is 0)
    int caseA = step(2, globalBlack + targetNotBlack);
    int caseB = step(2 + caseA, targetNotBlack + lowerDepth);

    // Increment the buffer int by 1 if either case is true (max acts as a logical OR)
    InterlockedAdd(intBuffer[0], max(caseA, caseB));
}

The first thing to know is that this compute shader is dispatched with one thread per pixel. Second, if we want to avoid branching we have to rely on arithmetic, here using the step function: step(y, x) returns 1 if x >= y, otherwise 0. The idea is simply to check whether the pixel of the player texture has a lower depth than the pixel of the environment texture. You will notice that I also handle the case of a black pixel, meaning there is nothing to render at this pixel. In practice this doesn't happen, but for testing purposes you can have a fairly empty scene, and not handling this case would return a wrong result.
You can also notice the InterlockedAdd function: I increment an int in my buffer when a player pixel is visible, using a guaranteed atomic add to avoid race conditions.
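
To make the step arithmetic easier to check, here is a minimal CPU-side sketch of the same rule in plain C#. It is not the code that runs in the project: `DepthCompare` and its methods are hypothetical names, and it assumes the same encoding as the kernel (0 means nothing was rendered, and a lower value means closer to the camera). `Interlocked.Add` stands in for `InterlockedAdd`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// CPU-side emulation of the kernel logic, for illustration and testing only.
public static class DepthCompare
{
    const float Epsilon = 0.00001f;

    // Same contract as HLSL step(y, x): 1 if x >= y, otherwise 0.
    public static int Step(float y, float x) => x >= y ? 1 : 0;

    // Branchless form, mirroring the kernel line by line.
    public static int VisibleBranchless(float globalDepth, float targetDepth)
    {
        int globalBlack = Step(globalDepth, Epsilon);
        int targetNotBlack = Step(Epsilon, targetDepth);
        int lowerDepth = Step(targetDepth, globalDepth);

        int caseA = Step(2, globalBlack + targetNotBlack);
        int caseB = Step(2 + caseA, targetNotBlack + lowerDepth);
        return Math.Max(caseA, caseB);
    }

    // The same rule written with ordinary conditions.
    public static int VisibleBranching(float globalDepth, float targetDepth)
    {
        bool targetDrawn = targetDepth >= Epsilon;
        bool nothingBehind = globalDepth <= Epsilon;
        bool inFront = targetDepth <= globalDepth;
        return targetDrawn && (nothingBehind || inFront) ? 1 : 0;
    }

    // One work item per pixel, with an atomic add so parallel increments are not lost.
    public static int CountVisible(float[] globalDepths, float[] targetDepths)
    {
        if (globalDepths.Length != targetDepths.Length)
            throw new ArgumentException("Both textures must have the same pixel count.");

        int count = 0;
        Parallel.For(0, globalDepths.Length, i =>
        {
            Interlocked.Add(ref count, VisibleBranchless(globalDepths[i], targetDepths[i]));
        });
        return count;
    }
}
```

Read this way, the two cases reduce to "the target is drawn at this pixel, and either nothing is behind it or it is in front of the environment". The branching version states that directly, and the branchless version reaches the same result using only `step`, addition and `max`.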

Getting the compute shader result

Now that we have dispatched our compute shader with one thread per texture pixel, we need to get the result on the CPU side. To do this, the ComputeBuffer class has a GetData(Array data) method, and that is all you have if you are using Unity 2017 or earlier. The problem with this method is that you can't fetch the data asynchronously. As it works synchronously, it makes your CPU wait until the shader has finished and the memory is accessible, causing a huge CPU stall (in my case it takes from 1 ms up to 7 ms in some cases). In the end, the benefit of using the GPU is quite limited: since the CPU has to wait for the GPU, the parallelization is not very efficient.
The key is to use an asynchronous request, but if you are using Unity 2017, the Unity API doesn't provide any method to perform this request asynchronously. However, Unity 2018 provides a new feature called AsyncGPUReadback (note that only 2018.2, the latest version when I wrote this post, has this feature in the released API; in 2018.1 you have to look for it in the experimental namespace of the Unity rendering API). The idea is to issue a request and wait for it to be done before getting the data.

If we look at the previous code, only a few lines are needed to manage this:


AsyncGPUReadbackRequest request = new AsyncGPUReadbackRequest();

// ...

// Start a new detection only once the previous request is done or has failed
if (request.done || request.hasError)
{
    // Request done with success
    if (request.done && !request.hasError)
    {
        // ...
    }

    // ...

    // Perform an asynchronous request to get compute buffer on shader execution finished
    request = AsyncGPUReadback.Request(buffer);
}

Determining how much space player takes in AI vision

There are different possible ways to use the result. We could simply use the number of seen pixels directly and check which step we are in to define how the AI should react. My solution is to express how much space the player takes in the AI's vision as a ratio of that vision: I just divide the number of seen pixels by the number of pixels composing the vision, here the render texture.

    // Convert compute buffer result to a readable format (ratio of nb pixels filled / total nb pixels)
    m_DetectionRatio = (float)computeBufferResult[0] / (TEXTURE_WIDTH * TEXTURE_HEIGHT);
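
The "steps" mentioned above could be sketched as follows. This is only an illustration of the idea, not code from the project: `AlertLevel`, `DetectionSteps` and both thresholds are hypothetical and would need tuning for a real game.

```csharp
using System;

// Hypothetical reaction levels; names and thresholds are illustrative only.
public enum AlertLevel { Unaware, Suspicious, Alerted }

public static class DetectionSteps
{
    // Assumed thresholds, expressed as a fraction of the AI's vision space.
    const float SuspiciousThreshold = 0.01f;
    const float AlertedThreshold = 0.05f;

    // Ratio of visible player pixels to total pixels, guarding against an empty texture.
    public static float DetectionRatio(int visiblePixels, int width, int height)
    {
        int total = width * height;
        return total > 0 ? (float)visiblePixels / total : 0f;
    }

    // Map the ratio to a reaction level by comparing it against each threshold.
    public static AlertLevel Classify(float detectionRatio)
    {
        if (detectionRatio >= AlertedThreshold) return AlertLevel.Alerted;
        if (detectionRatio >= SuspiciousThreshold) return AlertLevel.Suspicious;
        return AlertLevel.Unaware;
    }
}
```

Working with a ratio rather than a raw pixel count keeps thresholds like these valid if the render texture resolution changes.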

Conclusion

As I said in the introduction, this is not necessarily the best way to achieve this, but it is a working approach that I experimented with, only guessing at how to do it. If you have a better solution, or some corrections, don't hesitate to share ;)

This is the Unity project I made to experiment with this. Use the ZQSD keys to move your character, and left control / space to crouch, which is useful for testing occlusion. The showcase scene contains some AIs and a minimal level design. Enjoy!

Download Unity 2018.2 project
