[Gamasutra is proud to present an excerpt from the book Real-Time Cameras by Mark Haigh-Hutchinson, a veteran Retro Studios staffer who was diagnosed with pancreatic cancer in 2008 and passed away in 2009. The book, published posthumously with help from his colleagues and friends, is available now. For more information on the book's creation, please read this blog post.]
The Camera as an A.I. Game Object
In many ways, the problems presented by camera navigation have parallels within the domain of artificial intelligence (AI). As suggested earlier [in the book], the camera may be thought of as an AI character, at the very least in terms of the determination of its position and orientation within the game world.
The camera may be able to move more freely than many types of AI characters, however, since it is typically floating rather than being constrained to moving on surfaces or under the influence of a physical simulation. This freedom of movement is one of the benefits of a virtual camera system when compared to real-world cameras.
Yet, even though the camera may determine its path finding in a similar way to that used by AI characters, there are usually additional constraints concerning (for the most part) aesthetic issues such as target object framing, target object distance, and render geometry avoidance.
There are many interesting areas of AI navigation research, such as robotics, that may be applied to cameras. In particular, research toward autonomous methods of movement determination is highly relevant if we desire to have camera motion that is responsive to dynamic game events. The references at the end of this excerpt contain several pertinent examples such as Borenstein.
Since there is a large variety of environmental types, we will assume here that the camera is constrained to remain within a closed environment consisting of a collision surface representation such as a polygonal mesh, with additional independent game objects. Here, "closed" means that the representation of the environment forms a contiguous surface that constrains all usual game objects (even so, there may still be cases where it is necessary for the camera to be positioned outside of the environment).
The collision surface may be considered as a separate entity from the rendered world (although they may in fact be the same), with additional information regarding surface materials stored on a per-face or sub-surface basis. It is also assumed that mechanisms exist within the game engine to cast rays through the environment, returning the collision points and the materials found at those points.
Additionally there may be a need to filter the types of materials that are eligible for detection in any particular ray cast (e.g., some surfaces may allow the camera to pass but not regular game objects). Optimization and organization of collision data is not discussed in this book, but the reader is referred to VandenBergen and Ericson for more information about collision system design and implementation. Additionally, Chapter 9 describes some of the main methods used in camera collision detection and avoidance.
Predictive cameras (as described in Chapter 2) typically have better navigation results than reactive cameras, as they are more likely to anticipate changes to the environment that would occlude the player character. Naturally, this comes at additional performance cost.
Fundamental to all navigation methods are the ways in which the camera may request environmental data regarding potential obstructions or paths of motion. The nature of the game environment, its data representation, and facilities offered by the game engine will greatly influence the choice of navigation techniques. We may consider there to be two general classifications: dynamic and pre-defined.
Dynamic navigation techniques
Dynamic navigation refers to techniques that do not rely on predefined information to position or move the camera. Instead, they question the game world to determine how the camera should be positioned on a per-frame basis. Since this can be computationally expensive, a number of techniques may be required to amortize the processor cost.
An important consideration when using dynamic solutions is that it is difficult to account for the myriad of positions and actions that may be performed by the player. This means that it is difficult to test all the permutations as well as to be certain that no special case exists that might be problematic for the chosen navigation solution. Let us examine some of the more common dynamic navigation methods.
One of the simplest and most common navigation methods is that of the ray cast, a mathematical projection of a straight line through the virtual environment. By using trigonometry it may be determined if the line would intersect with either the collision geometry corresponding to the environment or that of an object within it. As might be expected, this can be computationally intensive especially given the complexity of many environments.
An important goal when implementing camera systems is to limit the amount of such testing whenever possible. Amortization of ray casting is entirely possible and can be quite effective in reducing the processor requirements at a cost of limiting the amount of information available to use for navigation.
Fortunately, navigation does not require a per-frame updating of such information as its decisions are normally longer lasting and may take more time to evaluate than collision or other immediate concerns. Similarly, reducing the set of data to be compared against (e.g., using spatial partitioning or material filtering) is also recommended.
Ray casts can often provide extremely pertinent information for camera systems, for example, collision prediction and player character occlusion. Furthermore, ray casts may be combined together to produce a more detailed view of possible obstructions. On the other hand, since ray casting normally involves (by definition) projection of a 3D line through the game world, it is entirely possible for the ray to pass through small openings and thus not register a collision.
Normally this problem is reduced by simply increasing the number and positioning of the ray casts to cover a larger area, although care must be taken given the potential performance costs. It may also be avoided by having a separate collision mesh that only pertains to camera movement and would thus not include such small openings.
An alternative solution is to use ray-casting hysteresis, that is, statistical information of the ray cast results gathered over multiple updates. From this history, it is possible to generate a probability graph of likely ray collisions and to reduce the actual amount of ray casts per update. The probability graph is then used to determine the influence applied to camera motion. For example, according to the position of the ray cast relative to the camera, we can deduce that lateral or vertical motion of the camera is probably necessary to avoid a future collision if the ray is prevented from reaching its target.
The results of the ray casts can then be used to dictate potential camera motion in a variety of ways. First, a weight or influence factor for each ray cast can be applied depending on the success of the ray cast. Thus, rays not reaching the target cause the camera to move in a direction away from their position.
An alternate method often used is to cast rays from the target object back toward the camera. The results of these ray casts are utilized in a similar manner. If a ray is cut short then once again it influences the movement of the camera. The degree of influence is often based on the proximity of the collision point to the origin of the ray cast, or sometimes simply a pre-defined weighting factor depending on the particular ray cast direction.
One of the problems with ray casting is that the determination of whether the ray intersects with the environment (or indeed, other game objects) is often computationally expensive. Additionally, it is often necessary to apply property information to the collision surfaces.
This information may be used to determine if a collision or ray intersection is actually valid. It is common that the game camera may ignore a significant portion of the environment if these filters are set appropriately. This is typically performed by a simple masking operation of desirable properties against the set of properties applied to the surface.
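The masking operation described above might look something like the following sketch. The flag names and the RayCastHitIsValid function are hypothetical; an actual engine would define its own set of surface properties.

```cpp
#include <cstdint>

// Hypothetical surface property flags, one bit per property.
enum SurfaceFlags : std::uint32_t {
    kSolid          = 1u << 0,  // blocks regular game objects
    kCameraBlocking = 1u << 1,  // blocks the camera
    kSeeThrough     = 1u << 2,  // e.g. glass: does not occlude the view
    kWater          = 1u << 3
};

// Returns true when the surface carries at least one property the ray cast
// is interested in, and none of the properties it was told to ignore.
inline bool RayCastHitIsValid(std::uint32_t surfaceFlags,
                              std::uint32_t desiredMask,
                              std::uint32_t ignoredMask = 0u) {
    if ((surfaceFlags & ignoredMask) != 0u) return false;
    return (surfaceFlags & desiredMask) != 0u;
}
```

A camera ray cast would typically pass only the camera-relevant bits as the desired mask, so that surfaces meant to stop game objects but not the camera are skipped without further work.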
A typical implementation of simple ray casts would use a fixed arrangement of source positions around the proposed position of the camera, with the rays extending toward the target object. Care must be taken to ensure the arrangement does not intersect with world geometry.
Often the ray positions are arranged in a grid or other symmetric shape. Each ray is given an influence factor that may be applied horizontally, vertically, or both. If the ray cast fails to reach the future position of the target object, the influence is applied to the desired camera position. The influence may even be calculated according to its position relative to the target.
For all ray casts
    If ray cast successful
        No influence applied
    If ray cast fails
        Scale influence factor by distance of ray cast collision from target
        Add influence to desired camera position
Once the total influence has been calculated, the camera will move toward the desired position plus this influence. Note that the influence may be calculated differently in each axis, if so desired.
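The influence accumulation outlined above can be sketched in C++ as follows. The Vec3 and RayResult types are placeholders for whatever the engine's ray cast query returns, and the linear scaling clamped by maxDistance is an assumption; the pseudocode only says that the influence is scaled by the distance of the collision from the target.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float Length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Result of one ray cast as an engine query might report it (hypothetical layout).
struct RayResult {
    bool reachedTarget;  // true if nothing blocked the ray
    Vec3 hitPoint;       // meaningful only when reachedTarget is false
    Vec3 influence;      // per-ray push applied when the ray is blocked
};

// Sums the influence of every blocked ray. The scale grows linearly with the
// distance of the hit from the target, normalized by maxDistance, capped at 1.
inline Vec3 AccumulateInfluence(const std::vector<RayResult>& rays,
                                Vec3 target, float maxDistance) {
    Vec3 total{0.0f, 0.0f, 0.0f};
    if (maxDistance <= 0.0f) return total;
    for (const RayResult& ray : rays) {
        if (ray.reachedTarget) continue;  // successful ray: no influence
        float scale = Length(ray.hitPoint - target) / maxDistance;
        if (scale > 1.0f) scale = 1.0f;
        total = total + ray.influence * scale;
    }
    return total;
}
```

The returned vector would be added to the desired camera position. Because each ray carries its own influence vector, horizontal and vertical pushes can be weighted differently simply by how those vectors are authored.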
Rather than casting simple rays, volume projection sweeps a volume through the world from the camera position toward its target position. This volume is typically an extruded sphere (sometimes referred to as a capsule); for simplicity and performance, it may instead be rectangular or cylindrical. Performance improvements can be gained by simulating the volume using several ray casts around the perimeter of the volume.
As with other collision determination techniques it is important to filter out any unimportant objects or geometry from the potential collision set. Bounds checking of axis-aligned bounding boxes or similar techniques may be applied to initially cull objects.
Volume projection is often used to determine if the camera would fit within confined spaces, or when determining a new potential position relative to an object (e.g., after a jump cut). It may also be used to ensure that camera motion is valid by projecting a motion volume from the current position to the desired position.
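As one possible form of the initial culling step, the sketch below builds an axis-aligned bounding box around a sphere swept between two positions and rejects any object whose box does not overlap it. The type and function names are illustrative only, and objects that pass this test would still need an exact test against the projected volume.

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 min, max; };

// Bounding box of a sphere of the given radius swept from 'from' to 'to'.
inline Aabb SweptSphereBounds(Vec3 from, Vec3 to, float radius) {
    return {
        {std::min(from.x, to.x) - radius,
         std::min(from.y, to.y) - radius,
         std::min(from.z, to.z) - radius},
        {std::max(from.x, to.x) + radius,
         std::max(from.y, to.y) + radius,
         std::max(from.z, to.z) + radius}
    };
}

// Standard interval overlap test on each axis.
inline bool Overlaps(const Aabb& a, const Aabb& b) {
    return a.min.x <= b.max.x && a.max.x >= b.min.x &&
           a.min.y <= b.max.y && a.max.y >= b.min.y &&
           a.min.z <= b.max.z && a.max.z >= b.min.z;
}
```

The box is conservative: it can report an overlap for an object the capsule itself would miss, but it never rejects an object the capsule would actually touch.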
The camera is often considered a sphere for collision purposes. This is partly because sphere collisions are relatively straightforward, but perhaps also because spheres are likely to slide along collision surfaces. The radius of the collision sphere should be slightly larger than the distance of the view frustum's near plane from the camera, thus preventing interpenetration of the camera with render geometry.
This is only true, however, if the collision geometry of objects or the environment extends beyond that of the render geometry. Note that this distance is also dependent upon the horizontal field of view. Ideally, the collision volume encompasses all four vertices of the frustum at the near plane.
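To make the dependence on field of view concrete, the following sketch computes the distance from the camera to a corner of the near plane, which is the smallest sphere radius that encloses all four near-plane vertices. It assumes a symmetric perspective frustum; the function names and the default margin are choices made for this example.

```cpp
#include <cmath>

// Distance from the camera position to a corner of the near plane.
// nearDistance: distance to the near plane.
// horizontalFovRadians: full horizontal field of view.
// aspectRatio: viewport width divided by height.
inline float NearPlaneCornerDistance(float nearDistance,
                                     float horizontalFovRadians,
                                     float aspectRatio) {
    const float halfWidth  = nearDistance * std::tan(horizontalFovRadians * 0.5f);
    const float halfHeight = halfWidth / aspectRatio;
    return std::sqrt(nearDistance * nearDistance +
                     halfWidth * halfWidth +
                     halfHeight * halfHeight);
}

// Collision radius with a small safety margin (the margin is an assumed value).
inline float CameraCollisionRadius(float nearDistance,
                                   float horizontalFovRadians,
                                   float aspectRatio,
                                   float margin = 0.01f) {
    return NearPlaneCornerDistance(nearDistance, horizontalFovRadians, aspectRatio) + margin;
}
```

With a 90 degree horizontal field of view and a square viewport, the corner lies at roughly 1.73 times the near-plane distance, which shows how much larger than the near distance the radius may need to be.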
Alternatively, the camera collision volume may be considered as a simple vertical axis-aligned cylinder. In environments where the camera will not move close to horizontal surfaces such as floors or ceilings, this approximation should provide equal functionality to a sphere at a reduced performance cost. The same caveats apply regarding the near-plane distance, and vertical collision must take into account the elevation of the camera, which might change how geometry would interpenetrate the near plane.
In the case of first person cameras, no collision volume or testing may actually be necessary since the camera should be completely contained within the collision bounds of the player character (it would also be somewhat disconcerting if the camera was prevented from moving and the player was not).
However, it is still necessary to ensure the collision bounds of the player are larger than the near-plane distance to avoid the same manner of rendering artifacts as those prevalent in third person cameras. Interpenetration of the near plane is still possible if there are game elements (such as game AI objects) that do not test for collision against the player, or that have render geometry protruding from their collision volume.
Dynamic path finding
Real-time or dynamic path finding has obvious similarities to solutions used for AI object navigation of the game world. However, path finding for cameras is more stringent than AI path finding -- aesthetic constraints play a large part. The underlying idea is that precomputed information is available that defines the volumes of space available for motion through the world. The data structure used to represent these volumes can vary but usually includes methods to traverse the volumes through connectivity information (which may also be directional in nature).
Traditional search algorithms such as A* or Dijkstra's Algorithm (as described in many AI or computer science textbooks) may be used to determine the most efficient path through the volumes, given weighting values that dictate the desirability of a particular path. The desirability determination for camera motion is likely to be quite different from that of an AI object, however.
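A minimal sketch of such a search, using Dijkstra's Algorithm over a graph of connected volumes, is shown below. The Link and VolumeGraph types are assumptions for illustration; in particular, how the cost of each link encodes camera desirability (framing, proximity to geometry, and so forth) is left entirely to the caller.

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// One directed connection between two navigation volumes.
struct Link {
    int   to;    // index of the neighbouring volume
    float cost;  // lower is more desirable for the camera; must be >= 0
};

using VolumeGraph = std::vector<std::vector<Link>>;

// Returns the cheapest sequence of volume indices from start to goal,
// or an empty vector when the goal cannot be reached.
inline std::vector<int> FindVolumePath(const VolumeGraph& graph, int start, int goal) {
    const int count = static_cast<int>(graph.size());
    if (start < 0 || goal < 0 || start >= count || goal >= count) return {};

    const float kInfinity = std::numeric_limits<float>::infinity();
    std::vector<float> best(graph.size(), kInfinity);
    std::vector<int> previous(graph.size(), -1);
    using Entry = std::pair<float, int>;  // (cost so far, volume index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;

    best[start] = 0.0f;
    open.push({0.0f, start});
    while (!open.empty()) {
        const Entry current = open.top();
        open.pop();
        if (current.first > best[current.second]) continue;  // stale entry
        if (current.second == goal) break;
        for (const Link& link : graph[current.second]) {
            const float cost = current.first + link.cost;
            if (cost < best[link.to]) {
                best[link.to] = cost;
                previous[link.to] = current.second;
                open.push({cost, link.to});
            }
        }
    }

    std::vector<int> path;
    if (best[goal] == kInfinity) return path;
    for (int v = goal; v != -1; v = previous[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    return path;
}
```

Because the links are directed, one-way connectivity between volumes can be represented by adding a link in only one direction.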
Once a path has been generated, the camera will follow the path until it reaches the desired position or the movement is interrupted by a new request. Usually this occurs when the path is recomputed on a regular basis, to account for player motion or environmental changes and so forth. Often the path searching occurs at two levels. The high-level solution determines the overall motion of the camera between volumes, combined with a low-level solution to avoid game objects, proximity to collision geometry, and so forth.
The difference between this solution and ones used for AI is that the latter tend to deal with longer paths of motion along surfaces (though sometimes flying or jumping), whereas cameras are normally relatively close to their target positions. In the case of cameras, the path information is used more for collision avoidance than as a complete motion path. Longer motion paths such as those used during non-interactive cinematic sequences are most often pre-defined and do not require any type of navigation logic to be applied.
Dynamic path generation
Path generation differs from the path finding previously mentioned because the shape and length of the new path is not determined according to pre-determined information built into the game world. Rather, information about the game world is dynamically determined according to the current situation and used to generate the most appropriate path when required. Moreover, it is likely that this path will change over time, although this may cause unwanted camera movements. Dynamic paths are typically used in situations where standard navigation is unable to determine an appropriate movement.
Another technique for dynamic path generation is to use information garnered from the motion of the player character. In many situations, a history of player movement data may be used to derive a potential movement path for the camera. If it is known that the player has passed through a confined space (e.g., a doorway), the camera may clearly follow the same path.
However, such a path would likely present a very different view of the player than normally seen. Therefore, it may be necessary to generate a volume around the player movement points to find the potential limits of valid camera motion (e.g., as an extruded cylindrical volume). This volume would be used to present a view in keeping with the expected displacement of the camera from the player character.
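One simple way to keep a history of player movement is a fixed-length trail of sampled positions, sketched below. The PlayerTrail class, its spacing rule, and its capacity limit are assumptions for this example; generating the surrounding volume of valid camera positions is not shown.

```cpp
#include <cmath>
#include <cstddef>
#include <deque>

struct Vec3 { float x, y, z; };

inline float Distance(Vec3 a, Vec3 b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Fixed-capacity trail of positions the player has recently occupied.
class PlayerTrail {
public:
    PlayerTrail(float spacing, std::size_t capacity)
        : spacing_(spacing), capacity_(capacity) {}

    // Stores the position only once the player has moved at least 'spacing'
    // away from the previous sample; the oldest sample is dropped when full.
    void Record(Vec3 playerPosition) {
        if (!points_.empty() &&
            Distance(points_.back(), playerPosition) < spacing_) return;
        points_.push_back(playerPosition);
        if (points_.size() > capacity_) points_.pop_front();
    }

    std::size_t Size() const { return points_.size(); }

    // Fetches the sample 'steps' entries behind the newest one.
    // Returns false if the trail is not that long.
    bool SampleBehind(std::size_t steps, Vec3& out) const {
        if (steps >= points_.size()) return false;
        out = points_[points_.size() - 1 - steps];
        return true;
    }

private:
    float spacing_;
    std::size_t capacity_;
    std::deque<Vec3> points_;
};
```

Every stored point is known to be reachable, since the player has already passed through it, which makes the trail a reasonable fallback when the camera must follow through a doorway or similar confined space.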
Visibility and rendering solutions
With the advent of hardware-assisted rendering, a new variety of solutions became available for determination of the desired camera position. We start with the assumption that the final position for the camera will be somewhere relative to the target object, and that the desired position should have an unobstructed view of the target object.
Next, we add constraints for the distance and elevation of the desired position relative to the target object. This produces a cylindrical band of potential positions around the target object. Further constraints can be layered on top of these requirements, such as only allowing a certain angular displacement relative to the target object.
Given all these constraints, we now have a small subset of potential positions to be tested for viability. If we were to render the world from each of these potential positions such that the camera was oriented toward the target object, it would result in scenes that may or may not show the target object according to occluding geometry or game objects.
If, instead, we render these views so that the target object is drawn in an easily distinguishable (and unique) color, the rendered screen buffer may be analyzed to determine if the target object was occluded. If the object is sufficiently non-occluded, then we can consider the rendered camera position as a viable position. Other hardware-accelerated techniques may be used to determine the occlusion of the target object from each of the potential positions, including depth buffer analysis.
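Once the screen buffer has been read back, the analysis step can be as simple as counting pixels of the target's unique color. The sketch below assumes a buffer of packed 32-bit colors and a known pixel count for the unoccluded target (for example, from rendering the target alone); both assumptions, along with the 0.5 threshold, are illustrative rather than taken from the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fraction of the target's expected pixels that remain visible.
// buffer: packed colours read back from the (low-resolution) render.
// targetColor: the unique colour the target object was drawn with.
// unoccludedPixelCount: pixels the target covers when nothing occludes it.
inline float VisibleFraction(const std::vector<std::uint32_t>& buffer,
                             std::uint32_t targetColor,
                             std::size_t unoccludedPixelCount) {
    if (unoccludedPixelCount == 0) return 0.0f;
    std::size_t visible = 0;
    for (std::uint32_t pixel : buffer) {
        if (pixel == targetColor) ++visible;
    }
    return static_cast<float>(visible) /
           static_cast<float>(unoccludedPixelCount);
}

// A candidate camera position is viable when enough of the target shows.
inline bool IsViablePosition(const std::vector<std::uint32_t>& buffer,
                             std::uint32_t targetColor,
                             std::size_t unoccludedPixelCount,
                             float threshold = 0.5f) {
    return VisibleFraction(buffer, targetColor, unoccludedPixelCount) >= threshold;
}
```

Reading the buffer back to the CPU has its own cost, so in practice this count might instead be obtained from the graphics hardware where such a facility is available.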
By exploiting the parallel nature of the graphics hardware, the cost of this determination may be significantly amortized. There are, however, some caveats to this approach. First, the setup for rendering the scene from each potential position may be prohibitive.
Second, although rendering via hardware is fast, it is certainly not free. The GPU and CPU cost may be reduced by techniques such as rendering the scene flat-shaded (i.e., no lighting), changing the view frustum to restrict the scene to only the area around the target object (including far clip plane distance), and reducing the resolution of the rendered scene.
Once we have determined which of our original potential camera positions is the most preferred, we can then calculate a path or other movement method to take the camera from its current position to the desired position. Although this technique works well in determining the position, it does not take into account geometry intervening between the current position and the desired position. Additional constraints may be applied to avoid these problems. A practical implementation of these techniques may be found in Halper.
Colliders are a variant upon the standard ray casting techniques as previously mentioned. The principle here is that the positions from which the ray casts are made are variable. Each of these positions is enclosed within a small collision volume (typically a sphere or approximated to this by another ray cast).
These position objects are referred to as colliders, since they are constrained to stay within the environment. Usually the colliders will be arranged in a particular pattern around the camera, such as a circle or semi-circle. The colliders are constantly attempting to regain their desired positions relative to the camera's ideal position as the camera moves through the environment.
The arrangement of the colliders influences how the camera will react to the environment. Each collider casts a ray toward the look-at position of the camera, or alternatively some other pre-defined point relative to the player character. The result of the ray cast is used to offset the desired position of the camera motion. The amount of influence exerted by a collider may be based on a number of factors:
- The physical distance of the collider from the camera, possibly calculated in local camera space and applied in the same manner.
- A pre-defined influence factor per collider determined according to the position of a particular collider within the entire group.
- The distance from each collider to the point at which its ray cast struck geometry.
- The acceleration or velocity of the player character, possibly only in one axis depending on desired behavior (e.g., change elevation according to player speed).
Some combination of the above factors may be applied to some or all colliders. Often there are several groups of colliders, each influenced by different factors. The ways in which the influence is applied also vary, possibly using a weighted average or a centroid calculation. Some experimentation will be required according to the types of environment encountered in a particular game.
Differing arrangements of these colliders may be used according to the type of behavior required for the environment. Circular or semicircular arrangements are common, as are horizontal ones. The arrangement may vary according to changes in the player character, most commonly velocity.
Colliders and other ray casting techniques are susceptible to "noise" produced by ray casts that collide with incidental environmental features that should not adversely affect camera motion. Some knowledge of the environment may be used to filter out results that should not influence camera motion. Tagging of surface properties may also be used to ignore or apply a reduced influence on the camera position determination.
Alternatively, surfaces may be tagged to act as motion constraints or to otherwise prevent the camera from approaching. The arrangement of the colliders and the amount of influence each applies may require changing according to the environment properties (e.g., confined spaces, vertical columns, etc.) or game play mode (e.g., to match changing movement characteristics of the player character).
Since the arrangement of colliders may vary considerably, here is only an outline of the algorithm used to determine their influence upon the desired camera position:
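// Outline only; Vec3, offset, and GetWeighting() are illustrative names.
Vec3 influence(0.0f, 0.0f, 0.0f); // might be useful as a class
for (int i = 0; i < colliders.size(); ++i)
{
    // the weighting depends on line of sight and/or
    // other factors such as the relative collider position
    influence += colliders[i].offset * colliders[i].GetWeighting();
}
// will need to get an average or similar
influence /= colliders.size();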
The arrangement of colliders will change the calculation method used to determine the average amount of influence to apply. A centroid calculation works well if the colliders are constrained to a plane. However the final influence is calculated, it must be applied in the same space as the colliders' offsets.
Pre-defined navigation techniques
Many of the problems associated with camera collision can be avoided by pre-computation of potential camera movement or by using pre-defined assistance based on knowledge of the environment (whether defined explicitly by the camera designer or otherwise automatically pre-calculated).
Camera navigation pre-computation may take many forms, usually at a cost of both memory overhead and time to compute the allowable camera positions, although the latter is not generally as important. Pre-computation typically has an advantage in lower processor overhead during run-time.
Assistance for navigation is often in the form of game script objects placed by designers and therefore tailored to the specific requirements of both game play and environmental concerns. Chapter 6 discusses scripting and its application to camera motion and positioning; however, it is possible to derive navigation assistance information from other game objects than those specifically designed to control camera behavior.
There are a variety of ways in which both types of information may be used to assist camera navigation and collision avoidance. Many of these schemes share common ground with AI or robotic research techniques, and the bibliography outlines some of the relevant papers.
Some of the more common solutions are as follows.
The most common solution for pre-computation is to literally store entire motion paths. Several games have used this effectively since it also allows complete control over camera position (and possibly orientation), either irrespective of player action or directly according to it. Nevertheless, path motion can be somewhat restrictive and could potentially result in the player experiencing a sensation of lack of control.
Conversely, the ability to directly specify the position of the camera is a boon when dealing with motion through a complex environment, especially if the determination of the position can be varied according to player position or action. A full description of path-based camera motion may be found starting with the section Path in Chapter 7.
Path motion behaviors
Although the camera may be constrained by using a path to completely define its motion, the interpretation of how the spline actually constrains the motion offers several alternatives.
- Constrained path camera. A constrained path camera is one that moves precisely along a defined path within the game world. The camera is typically not allowed to leave this path under any circumstance, unless the path is no longer active. This is one of the most typical uses of path cameras, so the camera position may be absolutely controlled to avoid collision or interpenetration with environmental features, and so forth. Non-interactive movies are often driven by this type of camera (as well as stationary or slaved position cameras).
- Follow path camera. In many cases, the motion of a camera on a constrained path may not appear as "organic" or "natural" as required by the game designers, even though the motion along the path may be smooth. However, we may still wish to restrict camera motion in some fashion. A follow path camera provides a mechanism to restrict camera motion yet allow some variance. In this case, the path only dictates a desired position; how the camera reaches that desired position can be defined in a number of different ways (i.e., the camera is not locked directly to the path).
This may be likened to a "carrot" being dangled in front of the camera. The camera will seek to the location defined by the current evaluation of the path, but is allowed to move in a direct line toward that position (note that this same technique may be applied regardless of the position determination method).
Alternatively, we can use rules to control the distance away from the desired path that the camera is allowed to move; for example, the path now becomes a tube (typically cylindrical) although the cross section may not be circular (e.g., ellipsoidal or rectangular). Indeed, information about the shape of the path volume may be contained within the knots themselves to allow for variance of camera motion along the path (including the radius of the tube).
Another approach is to use a parametric curve with the input parameter as the position of the camera along the path. These methods allow local control over the camera without having multiple paths. When restricting the motion of the camera away from the follow path, it is normally best not to use "hard" constraints as this will likely lead to unwanted motion artifacts. A damped spring or attractor/repulsor may be used to provide a more analog response to the distance restriction.
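A soft distance restriction of this kind might be sketched as a spring that only engages once the camera strays beyond the tube radius. The function below is one possible formulation, with hypothetical names and a circular cross section assumed; the closest point on the path is expected to be supplied by the path evaluation.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float Length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Acceleration pulling the camera back toward the path once it is farther
// than tubeRadius from the closest path point. Inside the tube the result
// is zero, so the camera moves freely there.
inline Vec3 TubeRestoringAcceleration(Vec3 cameraPosition,
                                      Vec3 cameraVelocity,
                                      Vec3 closestPathPoint,
                                      float tubeRadius,
                                      float stiffness,
                                      float damping) {
    const Vec3 toPath = closestPathPoint - cameraPosition;
    const float distance = Length(toPath);
    if (distance <= tubeRadius || distance <= 0.0f) return {0.0f, 0.0f, 0.0f};

    const float excess = distance - tubeRadius;        // how far outside the tube
    const Vec3 direction = toPath * (1.0f / distance);  // unit vector toward the path
    // Spring term grows with the excess; damping term resists current velocity.
    return direction * (stiffness * excess) - cameraVelocity * damping;
}
```

Since the force grows gradually from zero at the tube surface, the camera eases back toward the path rather than stopping abruptly at a hard limit. The stiffness and damping values would need tuning per game.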
Camera motion may also be restricted such that the camera is confined by the extents of a pre-defined volume. The actual definition of the volume is somewhat variable. It may consist of a simple convex shape such as a sphere, cylinder, ellipsoid, or cube. Alternatively it may be derived from a complex collision mesh, much as would other game entities.
Complex shapes, of course, make navigation a more difficult proposition. The desired position within the volume may be defined in a number of different ways. Clearly, we can use the same techniques as mentioned earlier in this chapter regarding player character-relative positions with the additional constraint of remaining within the volume.
This approach is applicable to most of the dynamic determination methods, so that the pre-defined volume constraint is applied after the initial determination has been made. It is typically easier to use fixed dimension volumes, but these could be varied according to game play requirements; for example, the position of the player relative to a reference frame.
In the same manner as path behaviors, directly constraining the camera position may lead to non-continuous motion and often benefits from the use of forces or damped springs to keep the camera within the volume.
Effective navigation assistance may be obtained by using attractors or repulsors, where the camera is literally attracted to (or repulsed from) pre-defined positions, regions, or other game objects. Often these positions are defined explicitly using script objects that are placed by the camera designer. Alternatively, offline analysis can be made of the environment to determine automatic placement; in practice this can be a difficult prospect without explicit rules governing their usage.
Both attractors and repulsors function by applying forces to the camera depending on the proximity (and possibly orientation) of the player or camera.
Note that these forces may not be implemented as part of the physical simulation (even if such a facility is available); rather they may simply influence the determination of the desired position or act as hard constraints during the actual camera motion.
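As a sketch of how proximity might drive such an influence outside of any physical simulation, the function below returns an offset for the desired camera position that falls off linearly with distance. The Influence structure, the linear falloff, and the sign convention (positive strength attracts, negative repulses) are assumptions for illustration.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float Length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// A point attractor (strength > 0) or repulsor (strength < 0).
struct Influence {
    Vec3  position;
    float radius;    // no effect at or beyond this distance
    float strength;  // magnitude of the offset at the centre
};

// Offset to add to the desired camera position. The magnitude is
// strength * (1 - distance / radius), directed toward the influence
// position for attractors and away from it for repulsors.
inline Vec3 InfluenceOffset(const Influence& influence, Vec3 cameraPosition) {
    const Vec3 toCentre = influence.position - cameraPosition;
    const float distance = Length(toCentre);
    if (distance >= influence.radius || distance <= 0.0f) return {0.0f, 0.0f, 0.0f};

    const float falloff = 1.0f - distance / influence.radius;
    return toCentre * (influence.strength * falloff / distance);
}
```

Offsets from several attractors and repulsors can simply be summed before being applied to the desired position.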
The use of proximity as part of the c