A Real-Time Procedural Universe, Part Two: Rendering Planetary Bodies
There have been a number of articles published in the past few years on adaptive mesh algorithms, which are also called dynamic or continuous LOD (Level-Of-Detail) algorithms. However, all the articles written so far have drawn flat landscapes based mainly on a pre-built 2D height map. Using a little creative thinking, Sean O'Neil has managed to come up with a spherical version that dynamically generates height values as needed.
In Part One of this series I explained how to use a function based on Perlin noise and fractal Brownian motion (fBm) to dynamically generate any point on a planetary body from a set of coordinates. This article will focus on how to render those planetary bodies at real-time speeds using my own spherical version of the ROAMing (Real-time Optimally Adapting Mesh) terrain algorithm. For those of you who read Part One and downloaded the demo I provided, make sure you download the latest source code and Win32 executable files. I've made some significant performance improvements since then, and they are explained in this article.
There have been a number of articles published in the past few years on adaptive mesh algorithms, which are also called dynamic or continuous LOD (Level-Of-Detail) algorithms. Some good examples are "Real-Time Dynamic Level of Detail Terrain Rendering with ROAM" and "Continuous LOD Terrain Meshing Using Adaptive Quadtrees". However, all the articles I've read so far have drawn flat landscapes based mainly on a pre-built 2D height map. Using a little creative thinking, I managed to come up with a spherical version that dynamically generates height values as needed.
As a side note, when I wrote the initial version of this spherical ROAM algorithm (more than a year ago), I couldn't find anything on the Internet regarding spherical DLOD algorithms. Since then, a number of projects have popped up using various techniques. I still haven't seen anyone take the same approach I did, but now you can find more information on this subject if you look for it.
The Traditional ROAM Algorithm
If you haven't read anything about ROAM, I recommend that you read the online paper: "ROAMing Terrain: Real-time Optimally Adapting Meshes". For those who want to skip the gory details, I'll go over the concept more briefly here. The ROAM algorithm attempts to generate an optimal triangle mesh for quickly rendering large-scale terrain maps. It starts with a single square, which consists of two right triangles, covering the entire map. If more detail is needed, a vertex is added to the center of the square, splitting the two right triangles into four. Based on a view-dependent priority calculation, triangles are split recursively in the same fashion until the desired level of detail is reached. ROAM implementations typically implement two main functions that are called every frame, one called Update() to update the mesh and one called Draw() or Render() to draw it.
To determine when more detail is needed in the mesh, each triangle is assigned a split priority. A priority threshold is set, and any triangle that exceeds the threshold is split. A triangle's priority is calculated by determining the amount of visible error it contains. Visible error is essentially the actual amount of error in the triangle divided by the distance from the camera to that triangle. For any triangle that can be culled from the rendering process (i.e. outside the view frustum or facing away from the camera), the visible error should be 0 because the triangle isn't visible.
When splitting triangles in a 3D mesh, care must be taken to ensure that no cracks or seams appear in the mesh. The ROAM algorithm handles this by following one simple rule: except at the edges of the map, only a square can be split. In this case a square is defined as two right triangles of the same size sharing their longest edge. When you need to split a triangle whose neighbor along the longest edge is not the same size, you must split its neighbor first. If the neighbor isn't part of a square, then its neighbor must be split before that. This check continues recursively until a square is reached or the edge of the map is reached, and all triangles along the path are split as the recursion unwinds. The figure below illustrates how this works:
Attempting to split triangle T requires four new vertices to be added to the mesh. Those vertices and the new edges created are highlighted in red and numbered in order. The first vertex makes the bottom-right corner of the map into a square. The second vertex makes a diagonal square in the bottom center. The third vertex makes triangle T part of a square. Last but not least, the fourth vertex splits triangle T.
Because the mesh starts with two triangles and each is split recursively into two smaller triangles, a binary tree is often used to store the ROAM mesh in memory. In this case, each node in the tree contains a triangle at a certain level of detail, and each frame all the leaf nodes in the tree are rendered. Along with containing pointers to its vertices, the triangle object usually contains pointers to its neighbors and members dealing with its split priority.
To keep the mesh optimal as the camera moves around, we need to be able to remove triangles that are no longer needed. The rule for merging triangles back into their parent triangles is that, except at the edges of the map, only a diamond can be merged. A diamond is defined as four right triangles with their 90-degree angles sharing the same vertex, which is equates to a square that has just been split. To merge the diamond, just remove the center vertex and turn it back into a square. Unlike splitting a triangle through recursion, you must wait until a triangle becomes part of a diamond before you can merge it.
In addition, it's not very efficient to figure out if a particular triangle is part of a diamond and then figure out if that diamond needs to be merged. The best way to handle this seems to be to maintain a list of current diamonds at all times. Every split creates 1 diamond and destroys 0-2 diamonds, and every merge does the opposite. The diamond structure should contain pointers to its triangles and members dealing with its merge priority. Every frame merge priorities get checked, and all diamonds that fall below the split threshold get merged.
One of the best things about ROAM is the amount of control you have over the performance vs. visual quality trade-off. You can control it by fine-tuning the priority calculations, changing the priority threshold, setting limits on the frame rate, setting limits on the triangle count, or setting limits on the number of split/merge operations to perform per frame.
Probably the largest problem with ROAM is that mesh changes from one frame to the next might cause a visible "pop" in the terrain. This artifact can be reduced by a process called "geomorphing", where you gradually move vertices from the old position to the new position over a number of frames, smoothing the pop and making the transition less noticeable. You can also slowly change the vertex normal, color, and texture coordinates if necessary.
Mapping ROAM to a Sphere
Because the traditional ROAM algorithm only allows you to split a square, you have to start with a solid 3D shape that consists only of squares. To prevent cracks from forming in the mesh, the edges of these squares must fit together seamlessly in all directions. Each square in this shape will be treated like a top-level square, and each triangle like the root of a separate binary tree. Neighbor pointers have to be set up between these top-level triangles properly so that splitting at the edge of one top-level square can split the corresponding triangle on the adjacent edge.
The obvious 3D shape that consists of all squares with edges wrapping seamlessly in all directions is a cube. Because a subdivided cube doesn't give a well-proportioned sphere, I spent a lot of time trying to come up with a more appropriate shape, but I couldn't find anything that fit the requirements. Later on, I realized that I was wasting my time because the split priorities would quickly take care of the sphere's proportions. If there is too much error in one triangle because its longest edge is out of proportion, it will be split.
Once you set up the 12 root triangles of the cube with the proper neighbor pointers, everything falls into place and the traditional ROAM algorithm works beautifully. In fact, it's even easier to implement than the flat ROAM algorithm because you don't have to test for conditions at the edges of the map. The only thing you need to treat differently is how you implement your height values. Instead of simply adjusting a vertex's y coordinate, you need to treat your height value as an offset from the sphere's radius. Mathematically, treat each new vertex as a direction vector from the center of the sphere. Normalize the vector and multiply it by the sphere's radius plus the height offset.
One other simplification I made to the ROAM algorithm was to use a linked list of triangles instead of a binary tree. The tree structure takes up more memory, slows the routine down, and is more complicated to implement. A linked list is sufficient as long as each triangle keeps a pointer to its edge neighbors and the triangle it was split from (its parent). The list is initialized with the 12 top-level triangles, which have their edge pointers set up properly and their parent pointers set to NULL. When you split a square, add two new triangles to the list with their parent pointers set to the two existing triangles, then add a vertex and rearrange the vertices and edge pointers of the four triangles. Do the reverse when merging a diamond.
To simplify the split and merge operations, it is a good idea to maintain the same order of the vertices and neighbor pointers in the triangle object. I chose to order my vertices (0, 1, 2) counter-clockwise with vertex 1 always being opposite the longest edge. My edge neighbor pointers are ordered based on the edge shared, 0-1, 1-2, and 2-0 respectively. Since vertex 1 is opposite the longest edge, this means that the last edge pointer (i.e. array index 2) always shares the longest edge.
Note from Figure 3 that it's not that hard to determine whether a triangle is part of a square or diamond using edge pointers. If triangle1->edge[2]->edge[2] = triangle1, then triangle1 is part of a square. Otherwise, you need to split triangle1->edge[2] to make triangle1 part of a square. If triangle1->edge[0]->edge[0]->edge[0]->edge[0] = triangle1 (you can also replace those 0's with 1's), then the triangle is part of a diamond.
Optimizing the Update() function
Now that you have the basic idea behind my spherical ROAM algorithm, you need to know how to optimize it for a dynamic planet generation algorithm like the one I explained in Part One. The first key to making it fast enough for real-time rendering is to realize that the fractal function you're calling is very slow. Even if you change it to use some other type of algorithm, it is relatively safe to assume that it won't be a cheap function to call. Your only hope to get it to run fast enough is to minimize the number of calls to that function.
Keep in mind that you have to update the priority of every triangle in the mesh each frame to determine whether it needs to be split or not. That priority depends on the height of the new vertex, which is determined by calling the fractal function. So to keep from having to call the fractal function for every triangle every frame, you need cache the new height value in your triangle object. So as each triangle is created from a split, you should call the fractal function to get the height value for the next potential split.
Once I had that implemented, I realized I was still calling the fractal routine twice as often as necessary. Because each triangle shares its longest edge with another triangle in a square, the second triangle to be created for that square was making the same call to the fractal routine. It's easy enough to determine whether a triangle is completing a square at the time of its creation, and if so borrow the calculated height value from its neighbor. Also note that there is also no need to call the fractal routine when you merge a diamond because the vertex you are removing has the information you need. Just take the vertex's height value and use it to compute the offset to put into the merged triangle objects.
Next I focused on the priority calculations. The first thing to keep in mind is that if the merge priority is not calculated in the same way the split priority is calculated for a triangle's parent, you quickly get into a situation where each frame you will split several triangles only to have them merged together again right away. You can also get into a situation where unneeded triangles are kept around long after they're needed. Because I'm not using a binary tree, I'm not keeping track of the original state of the parent triangles. This was easy enough to solve by adding a few members to my diamond object to keep track of the necessary information.
There are a number of other changes I can think of off the top of my head that could theoretically speed it up even more. The simplest way is to come up with some logic to avoid doing a priority check on every triangle every frame. At 50,000 triangles, this can be quite a burden on the CPU. I recently read an interesting article that described an improved split priority concept. In this article, the priority calculations were pre-calculated to determine the distance at which a triangle would be split or a diamond merged, and updates were driven mainly off of distance checks. The author then grouped polygons together into a tree using bounding spheres to greatly reduce the number of distance checks that needed to be made each frame. I'm not clear on the details surrounding their implementation, but they reported a dramatic improvement in their Update() function.
Optimizing the Draw() function
Next I concentrated on the Draw() function, and switched from direct calls to glVertex, glNormal, and glTexCoord to a vertex array. I created a dynamically sizing array class, and made it global so that all ROAM objects use the same vertex array. This optimization gave a noticeable boost in frame rate, which illustrated to me that my optimizations to the Update() function had paid off and the bottleneck had moved over to the rendering pipeline. If the rest of the code had been poorly optimized, I would've seen a very small improvement optimizing the rendering pipeline.
Then I looked at the Nvidia documentation to find the fastest way to push non-static triangles to the video card. I found that because I had already implemented a vertex array, it took very little effort to implement the GL_NV_vertex_array_range and GL_NV_fence extensions. These extensions allow you to copy vertices directly to AGP or video memory, and when you render with the standard glDrawElements function, it renders them extremely quickly. I believe the effect is the same as using DirectX's vertex buffers stored in video memory.
Given all these optimizations, I get about 20 FPS rendering 50,000 triangles using a 750 MHz Duron with a GeForce2 MX video card. When I toggle the mesh updates (i.e. skipping the Update() call) on the same machine, it speeds up to 45 FPS with 50,000 triangles, so there's still room for improvement in the update routine. The only further optimizations I can think of making to the Draw() function would be to see if I could convert my triangle lists to a set of strips or fans, but that would require even more changes to the Update() function to maintain the list of strips and fans. Given the frame rates I'm currently getting, I can't imagine it would improve performance much unless I came up with a simple way to maintain them.
Looking at the Code
I want to go over the classes I wrote for this ROAM algorithm so you can get a good idea of how they work. To keep it concise, I will only list and explain key classes, members, and methods here. This is just to explain some of the finer points of what I did to make it easier for you to go through the demo source and modify it or come up with your own implementation based on it.
CROAMVertex:
class CVertex
{
public:
static CVertexArray Array;
CVector m_vPosition; // The vertex position
CVector m_vNormal; // The vertex normal
float m_fTexCoord[4]; // 2 sets of texture coordinates
};
The CVertex class doesn't have any ROAM-specific members or methods in it, it's just a regular vertex with space for 2 sets of texture coordinates for multi-texturing. Note the static CVertexArray object, which is currently used as a global OpenGL vertex array shared by all ROAM instances. This needs to be global because I'm only allocating room for 65536 vertices in video memory (NVidia-specific), and I need to make sure one ROAM object doesn't step on another. If necessary I could make a second global array, perhaps to share non-ROAM vertices, and it would allocate its own block of video memory and work in a similar fashion.
CVertexArray
#define VERTEX_ARRAY_SIZE 65535
class CVertexArray
{
protected:
// Members for the vertex array
unsigned short m_nStackPosition; // Index into the stack
unsigned short *m_pStack; // A stack of free elements
unsigned char *m_pUpdated; // An array of updated flags
CVertex *m_pBuffer; // Contains the actual vertices
// Members for GL_NV_vertex_array_range and GL_NV_fence extensions
bool m_bFence; // true if extensions are enabled
CVertex *m_pArray; // pointer to allocated video memory
GLuint m_nFence; // utilize fence synchronization
// Members for GL_ARB_multitexture extension
int m_nMaxTextureUnits; // Is multi-texturing allowed?
public:
// Init and Cleanup methods
void Init(); // Initializes extensions, allocates video memory
void Cleanup(); // Frees video memory
unsigned short GetElement(); // Call to get a new array index
void UpdateElement(unsigned short); // Call after updating a vertex
void ReleaseElement(unsigned short); // Call to release a vertex
// OpenGL vertex array methods
void InitRender(); // Call before rendering each frame
void FinishRender(); // Call after rendering each frame
int GetMaxTextureUnits();
void EnableVertexArray();
void DisableVertexArray();
void EnableNormalArray();
void DisableNormalArray();
void EnableTextureCoordArray(int, int);
void DisableTextureCoordArray(int);
};
The CVertexArray class implements the OpenGL vertex array I use for my ROAM algorithm. To properly encapsulate an OpenGL vertex array, it must handle the GL_NV_vertex_array_range and GL_NV_fence extensions, and it must contain wrappers to call functions like glVertexPointer(), glNormalPointer(), and glTexCoordPointer(). As you can see here, it keeps a stack of free array indexes into the vertex array. When the ROAM split function needs to add a new vertex to its mesh, it calls GetElement() to get a new index into the array. When the ROAM merge function needs to remove a vertex from its mesh, it calls ReleaseElement() to put that index back on the free stack.
CROAMTriangle
class CROAMTriangle : public CListNode
{
public:
// Triangle members
unsigned short m_nVertex[3]; // Indices into CVertex::Array
CROAMTriangle *m_pEdge[3]; // Pointers to edge neighbors
CROAMTriangle *m_pParent; // Pointer to the parent triangle
CROAMDiamond *m_pDiamond; // Set if this triangle is in a diamond
// Split priority members
CVector m_vMidpoint; // The midpoint of the longest edge
float m_fOffset; // The offset in height
float m_fLength; // The length of the longest edge
unsigned char m_nFlags;
float GetPriority(CVector &vPosition, CVector &vHeading, float fHorizon);
void Draw(GLenum nMode, unsigned short *nArray, unsigned short &nIndex);
};
This class requires a little more explanation. As I mentioned above, my ROAM implementation uses a linked list of triangles instead of a binary tree. The list is initialized with the 12 top-level triangles, each with its parent pointer set to NULL. When a square is split I create two new triangles and set their parent pointers to the two existing triangles in the square, then I rearrange the vertices and edges of all four triangles. This means that the true parent triangle doesn't really exist in the list anymore. But as long as I have a pointer linking a triangle with the triangle it was split from, rebuilding the parent is easy when it needs to be merged.
Every time I create a new triangle during a split operation, I cache the midpoint and length of its longest edge (the edge to be split). I also cache the offset from that midpoint to the vertex's correct location (i.e. the error). If the new triangle is completing a square, then the values are copied from its neighbor. Otherwise, the values are calculated. All three values are used in the priority calculation, which also takes into account the camera position and heading and the distance to the horizon.
Initially the GetPriority() method returned 0 if the distance to the vertex was greater than the horizon or it was out of the current angle of view, but this caused problems. Essentially, if you turned the camera away from a planet while close to it, all polygons would be merged and the sphere would be turned into a cube. When you turned the camera back to face the planet, the cube would not always be split because the vertices checked were so far away from the camera that they were either beyond the horizon or outside the angle of view. So now a priority is always calculated if the camera is closer to the vertex than the length of the edge to be split.
The Draw() method simply fills in an array of indices for a future call to glDrawElements(). I've toyed with adding a check to skip drawing triangles that aren't in the current view, but I've got that commented out right now so I can turn off updates from a specific viewpoint, then fly around the planet in wire-frame mode to see how well the mesh is optimized. You may get a small improvement in performance by un-commenting the lines dealing with the TriangleDrawMask flag.
CROAMDiamond
class CROAMDiamond : public CListNode
{
public:
// Triangles that make up the diamond
CROAMTriangle *m_pParent[2];
CROAMTriangle *m_pChild[2];
// Merge priority members
CVector m_vMidpoint; // The midpoint of the longest edge
float m_fOffset; // The offset in height
float m_fLength; // The length of the longest edge
unsigned char m_nFlags;
float GetPriority(CVector &vPosition, CVector &vHeading, float fHorizon);
};
This class is a good bit simpler than CROAMTriangle, but it is just as important. A diamond is made up of four triangles. Because two of them are always parents of the other two, the parent and child pointers are kept separately to make the merge routine simpler. Note how the priority members stored in CROAMTriangle are duplicated in this class. These values are what would've been cached in the original parent triangle that gets destroyed during a split. Caching them in CROAMDiamond allows us to calculate a merge priority that matches the parent's split priority exactly.
Note that the diamonds are in a linked list, each contains a pointer to its 4 triangles, and a triangle contains a pointer to the diamond its currently in. This is pretty redundant, but it is necessary to get good performance. When splitting a triangle, you need to be able to find and destroy the diamond it's in (if any) very quickly. I used to scan the list of diamonds, thinking that the list would stay rather small, but this killed my performance when a lot of splits needed to be done each frame.
CROAMSphere
class CROAMSphere : public C3DObject
{
protected:
CROAMTriangleList m_llTriangle; // List of triangles in the mesh
CROAMDiamondList m_llDiamond; // List of diamonds in the mesh
CROAMAlgorithm *m_pAlgorithm; // A planet algorithm
int m_nTriangles; // The current triangle count
void Init(CROAMAlgorithm *pAlgorithm);
bool CollisionCheck(C3DBase *pCamera);
void Split(CROAMTriangle *pTriangle);
void Merge(CROAMDiamond *pDiamond);
bool Update(C3DBase *pCamera, float fMaxError);
void Draw(C3DBase *pCamera, bool bTexture=true);
};
We finally get to the main ROAM class. The key member variables should be about what you expected at this point: a triangle list, a diamond list, and an algorithm object for retrieving the height and texture coordinates of new vertices. The Init() method takes an initialized algorithm object, and builds the 12 top-level triangles of the cube. The CollisionCheck() method calls the fractal function to determine whether the camera's position is inside the planet or not. This is a faster and more accurate way to do collision detection than to check against the current mesh.
The Split() method looks messy right now and could probably be cleaned up a bit, but it is fairly easy to explain with some pseudo-code:
void CROAMSphere::Split(CROAMTriangle *pTriangle)
{
if(pTriangle->neighbor does not make a square with pTriangle)
Split(pTriangle->neighbor)
pOpposite = pTriangle->neighborpVertex = CVertex::Array.GetNewElement()
Initialize pVertex's position, normal, and texture coordinatesCreate a new triangle and add it to the triangle list
Split pTriangle into the new triangle
Create a new triangle and add it to the triangle list
Split pOpposite into the new triangleUpdate the pVertex's normal based on the new triangles
if(pTriangle was part of a diamond)
delete pTriangle->m_pDiamond;
if(pOpposite was part of a diamond)
delete pOpposite->m_pDiamond;
Add the new diamond created by the new split
}
The Merge() method is almost as messy right now, but it is also fairly easy to explain with some pseudo-code:
void CROAMSphere::Merge(CROAMDiamond *pDiamond)
{
Merge parent1 and child1 into parent1
Merge parent2 and child2 into parent2
Delete pDiamond
if(parent1 forms a new diamond)
Add the new diamond to the diamond list
if(parent2 forms a new diamond)
Add the new diamond to the diamond list
Release the vertex we just merged back to CROAMVertex::Array
}
The Update() method takes a camera position and an error threshold from the game engine. It loops through all diamonds and triangles, checking their priority and merging or splitting when they cross the error threshold. Recently, I added a wait state for triangles and diamonds so that if they're not close to the threshold, they won't be checked every frame. This provided a decent performance improvement, but I'm sure there's a better way to do this.
Originally, the Draw() method did nothing but render the polygons in the mesh. Then I switched to an OpenGL vertex array, building an array of indices to call glDrawElements(). The first time I tried this, I ran into problems trying to draw more than 20,000 triangles. I had read that you couldn't have more than 64K vertices in your array, but I wasn't anywhere close. I figured out that most video cards also won't allow you to pass any more than 64K indices to glDrawElements(). So I changed my Draw() function to build an array of indices separately, and wrote a loop to send them to glDrawElements() in blocks of 60,000.
Final Notes
Now that you've seen how to generate and render a full-size planet at real-time speed with dynamic level of detail, I encourage you to play around with the source code and see what you can come up with. It should run pretty well on most systems with OpenGL-accelerated drivers, and it should fly on NVidia GeForce cards. Because it only generates and uses memory for vertices the camera is close to, it should be fairly easy to expand this demo to generate and render an entire solar system, or an entire galaxy of star systems, without a significant impact on memory or the CPU. The next article in the series will focus on doing that and will explain how to fix some of the problems you will run into working with such a large game world.
______________________________________________________
Read more about:
FeaturesAbout the Author
You May Also Like