In Part
One of this series I explained how to use a function based on Perlin
noise and fractal Brownian motion (fBm) to dynamically generate any point
on a planetary body from a set of coordinates. This article will focus
on how to render those planetary bodies at realtime speeds using my own
spherical version of the ROAMing (Realtime Optimally Adapting Mesh) terrain
algorithm. For those of you who read Part
One and downloaded the demo I provided, make sure you download the
latest source code and Win32
executable files. I've made some significant performance improvements
since then, and they are explained in this article.
There have been a number of articles published in the past few years on
adaptive mesh algorithms, which are also called dynamic or continuous
LOD (LevelOfDetail) algorithms. Some good examples are "RealTime
Dynamic Level of Detail Terrain Rendering with ROAM" and "Continuous
LOD Terrain Meshing Using Adaptive Quadtrees". However, all the articles
I've read so far have drawn flat landscapes based mainly on a prebuilt
2D height map. Using a little creative thinking, I managed to come up
with a spherical version that dynamically generates height values as needed.
As a side note, when I wrote the initial version of this spherical ROAM
algorithm (more than a year ago), I couldn't find anything on the Internet
regarding spherical DLOD algorithms. Since then, a number of projects
have popped up using various techniques. I still haven't seen anyone take
the same approach I did, but now you can find more information on this
subject if you look for it.
The Traditional
ROAM Algorithm
If you haven't read anything about ROAM, I recommend that you read the
online paper: "ROAMing
Terrain: Realtime Optimally Adapting Meshes". For those who
want to skip the gory details, I'll go over the concept more briefly here.
The ROAM algorithm attempts to generate an optimal triangle mesh for quickly
rendering largescale terrain maps. It starts with a single square, which
consists of two right triangles, covering the entire map. If more detail
is needed, a vertex is added to the center of the square, splitting the
two right triangles into four. Based on a viewdependent priority calculation,
triangles are split recursively in the same fashion until the desired
level of detail is reached. ROAM implementations typically implement two
main functions that are called every frame, one called Update()
to update the mesh and one called Draw()
or Render() to draw
it.
To determine when more detail is needed in the mesh, each triangle is
assigned a split priority. A priority threshold is set, and any triangle
that exceeds the threshold is split. A triangle's priority is calculated
by determining the amount of visible error it contains. Visible error
is essentially the actual amount of error in the triangle divided by the
distance from the camera to that triangle. For any triangle that can be
culled from the rendering process (i.e. outside the view frustum or facing
away from the camera), the visible error should be 0 because the triangle
isn't visible.
When splitting triangles in a 3D mesh, care must be taken to ensure that
no cracks or seams appear in the mesh. The ROAM algorithm handles this
by following one simple rule: except at the edges of the map, only a square
can be split. In this case a square is defined as two right triangles
of the same size sharing their longest edge. When you need to split a
triangle whose neighbor along the longest edge is not the same size, you
must split its neighbor first. If the neighbor isn't part of a square,
then its neighbor must be split before that. This check continues recursively
until a square is reached or the edge of the map is reached, and all triangles
along the path are split as the recursion unwinds. The figure below illustrates
how this works:


Figure 1: Splitting triangle T requires multiple splits to avoid creating cracks in the mesh. 
Attempting
to split triangle T requires four new vertices to be added to the mesh.
Those vertices and the new edges created are highlighted in red and numbered
in order. The first vertex makes the bottomright corner of the map into
a square. The second vertex makes a diagonal square in the bottom center.
The third vertex makes triangle T part of a square. Last but not least,
the fourth vertex splits triangle T.
Because the mesh starts with two triangles and each is split recursively
into two smaller triangles, a binary tree is often used to store the ROAM
mesh in memory. In this case, each node in the tree contains a triangle
at a certain level of detail, and each frame all the leaf nodes in the
tree are rendered. Along with containing pointers to its vertices, the
triangle object usually contains pointers to its neighbors and members
dealing with its split priority.
To keep the mesh optimal as the camera moves around, we need to be able
to remove triangles that are no longer needed. The rule for merging triangles
back into their parent triangles is that, except at the edges of the map,
only a diamond can be merged. A diamond is defined as four right triangles
with their 90degree angles sharing the same vertex, which is equates
to a square that has just been split. To merge the diamond, just remove
the center vertex and turn it back into a square. Unlike splitting a triangle
through recursion, you must wait until a triangle becomes part of a diamond
before you can merge it.
In addition, it's not very efficient to figure out if a particular triangle
is part of a diamond and then figure out if that diamond needs to be merged.
The best way to handle this seems to be to maintain a list of current
diamonds at all times. Every split creates 1 diamond and destroys 02
diamonds, and every merge does the opposite. The diamond structure should
contain pointers to its triangles and members dealing with its merge priority.
Every frame merge priorities get checked, and all diamonds that fall below
the split threshold get merged.
One of the best things about ROAM is the amount of control you have over
the performance vs. visual quality tradeoff. You can control it by finetuning
the priority calculations, changing the priority threshold, setting limits
on the frame rate, setting limits on the triangle count, or setting limits
on the number of split/merge operations to perform per frame.
Probably the largest problem with ROAM is that mesh changes from one frame
to the next might cause a visible "pop" in the terrain. This
artifact can be reduced by a process called "geomorphing", where
you gradually move vertices from the old position to the new position
over a number of frames, smoothing the pop and making the transition less
noticeable. You can also slowly change the vertex normal, color, and texture
coordinates if necessary.
Mapping ROAM to
a Sphere
Because the traditional ROAM algorithm only allows you to split a square,
you have to start with a solid 3D shape that consists only of squares.
To prevent cracks from forming in the mesh, the edges of these squares
must fit together seamlessly in all directions. Each square in this shape
will be treated like a toplevel square, and each triangle like the root
of a separate binary tree. Neighbor pointers have to be set up between
these toplevel triangles properly so that splitting at the edge of one
toplevel square can split the corresponding triangle on the adjacent
edge.
The obvious 3D shape that consists of all squares with edges wrapping
seamlessly in all directions is a cube. Because a subdivided cube doesn't
give a wellproportioned sphere, I spent a lot of time trying to come
up with a more appropriate shape, but I couldn't find anything that fit
the requirements. Later on, I realized that I was wasting my time because
the split priorities would quickly take care of the sphere's proportions.
If there is too much error in one triangle because its longest edge is
out of proportion, it will be split.
Once you set up the 12 root triangles of the cube with the proper neighbor pointers, everything falls into place and the traditional ROAM algorithm works beautifully. In fact, it's even easier to implement than the flat ROAM algorithm because you don't have to test for conditions at the edges of the map. The only thing you need to treat differently is how you implement your height values. Instead of simply adjusting a vertex's y coordinate, you need to treat your height value as an offset from the sphere's radius. Mathematically, treat each new vertex as a direction vector from the center of the sphere. Normalize the vector and multiply it by the sphere's radius plus the height offset.



One other simplification I made to the ROAM algorithm was to use a linked
list of triangles instead of a binary tree. The tree structure takes up
more memory, slows the routine down, and is more complicated to implement.
A linked list is sufficient as long as each triangle keeps a pointer to
its edge neighbors and the triangle it was split from (its parent). The
list is initialized with the 12 toplevel triangles, which have their
edge pointers set up properly and their parent pointers set to NULL. When
you split a square, add two new triangles to the list with their parent
pointers set to the two existing triangles, then add a vertex and rearrange
the vertices and edge pointers of the four triangles. Do the reverse when
merging a diamond.
To simplify the split and merge operations, it is a good idea to maintain
the same order of the vertices and neighbor pointers in the triangle object.
I chose to order my vertices (0, 1, 2) counterclockwise with vertex 1
always being opposite the longest edge. My edge neighbor pointers are
ordered based on the edge shared, 01, 12, and 20 respectively. Since
vertex 1 is opposite the longest edge, this means that the last edge pointer
(i.e. array index 2) always shares the longest edge.



Note from Figure 3 that it's not that hard to determine whether a triangle is part of a square or diamond using edge pointers. If triangle1>edge[2]>edge[2] = triangle1, then triangle1 is part of a square. Otherwise, you need to split triangle1>edge[2] to make triangle1 part of a square. If triangle1>edge[0]>edge[0]>edge[0]>edge[0] = triangle1 (you can also replace those 0's with 1's), then the triangle is part of a diamond.
Optimizing
the Update() function
Now that you have the basic idea behind my spherical ROAM algorithm, you
need to know how to optimize it for a dynamic planet generation algorithm
like the one I explained in Part One. The first key to making it fast
enough for realtime rendering is to realize that the fractal function
you're calling is very slow. Even if you change it to use some other type
of algorithm, it is relatively safe to assume that it won't be a cheap
function to call. Your only hope to get it to run fast enough is to minimize
the number of calls to that function.
Keep in mind that you have to update the priority of every triangle in
the mesh each frame to determine whether it needs to be split or not.
That priority depends on the height of the new vertex, which is determined
by calling the fractal function. So to keep from having to call the fractal
function for every triangle every frame, you need cache the new height
value in your triangle object. So as each triangle is created from a split,
you should call the fractal function to get the height value for the next
potential split.
Once I had that implemented, I realized I was still calling the fractal
routine twice as often as necessary. Because each triangle shares its
longest edge with another triangle in a square, the second triangle to
be created for that square was making the same call to the fractal routine.
It's easy enough to determine whether a triangle is completing a square
at the time of its creation, and if so borrow the calculated height value
from its neighbor. Also note that there is also no need to call the fractal
routine when you merge a diamond because the vertex you are removing has
the information you need. Just take the vertex's height value and use
it to compute the offset to put into the merged triangle objects.
Next I focused on the priority calculations. The first thing to keep in
mind is that if the merge priority is not calculated in the same way the
split priority is calculated for a triangle's parent, you quickly get
into a situation where each frame you will split several triangles only
to have them merged together again right away. You can also get into a
situation where unneeded triangles are kept around long after they're
needed. Because I'm not using a binary tree, I'm not keeping track of
the original state of the parent triangles. This was easy enough to solve
by adding a few members to my diamond object to keep track of the necessary
information.
There are a number of other changes I can think of off the top of my head
that could theoretically speed it up even more. The simplest way is to
come up with some logic to avoid doing a priority check on every triangle
every frame. At 50,000 triangles, this can be quite a burden on the CPU.
I recently read an interesting article that described an improved split
priority concept. In this article, the priority calculations were precalculated
to determine the distance at which a triangle would be split or a diamond
merged, and updates were driven mainly off of distance checks. The author
then grouped polygons together into a tree using bounding spheres to greatly
reduce the number of distance checks that needed to be made each frame.
I'm not clear on the details surrounding their implementation, but they
reported a dramatic improvement in their Update()
function.
Optimizing the Draw()
function
Next I concentrated on the Draw()
function, and switched from direct calls to glVertex,
glNormal, and glTexCoord
to a vertex array. I created a dynamically sizing array class, and made
it global so that all ROAM objects use the same vertex array. This optimization
gave a noticeable boost in frame rate, which illustrated to me that my
optimizations to the Update()
function had paid off and the bottleneck had moved over to the
rendering pipeline. If the rest of the code had been poorly optimized,
I would've seen a very small improvement optimizing the rendering pipeline.
Then I looked at the Nvidia documentation to find the fastest way to push
nonstatic triangles to the video card. I found that because I had already
implemented a vertex array, it took very little effort to implement the
GL_NV_vertex_array_range
and GL_NV_fence extensions.
These extensions allow you to copy vertices directly to AGP or video memory,
and when you render with the standard glDrawElements
function, it renders them extremely quickly. I believe the effect is the
same as using DirectX's vertex buffers stored in video memory.
Given all these optimizations, I get about 20 FPS rendering 50,000 triangles
using a 750 MHz Duron with a GeForce2 MX video card. When I toggle the
mesh updates (i.e. skipping the Update()
call) on the same machine, it speeds up to 45 FPS with 50,000 triangles,
so there's still room for improvement in the update routine. The only
further optimizations I can think of making to the Draw()
function would be to see if I could convert my triangle lists to a set
of strips or fans, but that would require even more changes to the Update()
function to maintain the list of strips and fans. Given the frame rates
I'm currently getting, I can't imagine it would improve performance much
unless I came up with a simple way to maintain them.
Looking at the
Code
I want to go over the classes I wrote for this ROAM algorithm so you can
get a good idea of how they work. To keep it concise, I will only list
and explain key classes, members, and methods here. This is just to explain
some of the finer points of what I did to make it easier for you to go
through the demo source and modify it or come up with your own implementation
based on it.
CROAMVertex:
class CVertex
{
public:
static CVertexArray Array;
CVector m_vPosition;
// The vertex position
CVector m_vNormal; //
The vertex normal
float m_fTexCoord[4]; //
2 sets of texture coordinates
};
The CVertex class doesn't have any ROAMspecific members or methods in
it, it's just a regular vertex with space for 2 sets of texture coordinates
for multitexturing. Note the static CVertexArray object, which is currently
used as a global OpenGL vertex array shared by all ROAM instances. This
needs to be global because I'm only allocating room for 65536 vertices
in video memory (NVidiaspecific), and I need to make sure one ROAM object
doesn't step on another. If necessary I could make a second global array,
perhaps to share nonROAM vertices, and it would allocate its own block
of video memory and work in a similar fashion.
CVertexArray
#define VERTEX_ARRAY_SIZE 65535
class CVertexArray
{
protected:
// Members for the vertex array
unsigned short m_nStackPosition;
// Index into the stack
unsigned short *m_pStack; //
A stack of free elements
unsigned char *m_pUpdated; //
An array of updated flags
CVertex *m_pBuffer;
// Contains the actual vertices
//
Members for GL_NV_vertex_array_range and GL_NV_fence extensions
bool m_bFence; //
true if extensions are enabled
CVertex *m_pArray; //
pointer to allocated video memory
GLuint m_nFence; //
utilize fence synchronization
//
Members for GL_ARB_multitexture extension
int m_nMaxTextureUnits; //
Is multitexturing allowed?
public:
// Init and Cleanup methods
void Init(); //
Initializes extensions, allocates video memory
void Cleanup(); //
Frees video memory
unsigned
short GetElement(); //
Call to get a new array index
void UpdateElement(unsigned short); //
Call after updating a vertex
void ReleaseElement(unsigned short); //
Call to release a vertex
//
OpenGL vertex array methods
void InitRender(); //
Call before rendering each frame
void FinishRender(); //
Call after rendering each frame
int GetMaxTextureUnits();
void EnableVertexArray();
void DisableVertexArray();
void EnableNormalArray();
void DisableNormalArray();
void EnableTextureCoordArray(int, int);
void DisableTextureCoordArray(int);
};
The CVertexArray class
implements the OpenGL vertex array I use for my ROAM algorithm. To properly
encapsulate an OpenGL vertex array, it must handle the GL_NV_vertex_array_range
and GL_NV_fence extensions,
and it must contain wrappers to call functions like glVertexPointer(),
glNormalPointer(), and
glTexCoordPointer().
As you can see here, it keeps a stack of free array indexes into the vertex
array. When the ROAM split function needs to add a new vertex to its mesh,
it calls GetElement()
to get a new index into the array. When the ROAM merge function needs
to remove a vertex from its mesh, it calls ReleaseElement()
to put that index back on the free stack.
CROAMTriangle
class CROAMTriangle : public CListNode
{
public:
// Triangle members
unsigned short m_nVertex[3];
// Indices into CVertex::Array
CROAMTriangle *m_pEdge[3]; //
Pointers to edge neighbors
CROAMTriangle *m_pParent; //
Pointer to the parent triangle
CROAMDiamond *m_pDiamond; //
Set if this triangle is in a diamond
//
Split priority members
CVector m_vMidpoint; //
The midpoint of the longest edge
float m_fOffset; //
The offset in height
float m_fLength; //
The length of the longest edge
unsigned char m_nFlags;
float
GetPriority(CVector &vPosition, CVector &vHeading, float fHorizon);
void Draw(GLenum nMode, unsigned short *nArray,
unsigned short &nIndex);
};
This class requires a little more explanation. As I mentioned above, my
ROAM implementation uses a linked list of triangles instead of a binary
tree. The list is initialized with the 12 toplevel triangles, each with
its parent pointer set to NULL. When a square is split I create two new
triangles and set their parent pointers to the two existing triangles
in the square, then I rearrange the vertices and edges of all four triangles.
This means that the true parent triangle doesn't really exist in the list
anymore. But as long as I have a pointer linking a triangle with the triangle
it was split from, rebuilding the parent is easy when it needs to be merged.
Every time I create a new triangle during a split operation, I cache the
midpoint and length of its longest edge (the edge to be split). I also
cache the offset from that midpoint to the vertex's correct location (i.e.
the error). If the new triangle is completing a square, then the values
are copied from its neighbor. Otherwise, the values are calculated. All
three values are used in the priority calculation, which also takes into
account the camera position and heading and the distance to the horizon.
Initially the GetPriority()
method returned 0 if the distance to the vertex was greater than the horizon
or it was out of the current angle of view, but this caused problems.
Essentially, if you turned the camera away from a planet while close to
it, all polygons would be merged and the sphere would be turned into a
cube. When you turned the camera back to face the planet, the cube would
not always be split because the vertices checked were so far away from
the camera that they were either beyond the horizon or outside the angle
of view. So now a priority is always calculated if the camera is closer
to the vertex than the length of the edge to be split.
The Draw() method simply
fills in an array of indices for a future call to
glDrawElements(). I've toyed with adding a check to skip drawing
triangles that aren't in the current view, but I've got that commented
out right now so I can turn off updates from a specific viewpoint, then
fly around the planet in wireframe mode to see how well the mesh is optimized.
You may get a small improvement in performance by uncommenting the lines
dealing with the TriangleDrawMask
flag.
CROAMDiamond
class CROAMDiamond : public CListNode
{
public:
// Triangles that make up the diamond
CROAMTriangle *m_pParent[2];
CROAMTriangle *m_pChild[2];
//
Merge priority members
CVector m_vMidpoint; //
The midpoint of the longest edge
float m_fOffset; //
The offset in height
float m_fLength; //
The length of the longest edge
unsigned char m_nFlags;
float
GetPriority(CVector &vPosition, CVector &vHeading, float fHorizon);
};
This class is a good bit simpler than CROAMTriangle,
but it is just as important. A diamond is made up of four triangles. Because
two of them are always parents of the other two, the parent and child
pointers are kept separately to make the merge routine simpler. Note how
the priority members stored in CROAMTriangle
are duplicated in this class. These values are what would've been cached
in the original parent triangle that gets destroyed during a split. Caching
them in CROAMDiamond
allows us to calculate a merge priority that matches the parent's split
priority exactly.
Note that the diamonds are in a linked list, each contains a pointer to
its 4 triangles, and a triangle contains a pointer to the diamond its
currently in. This is pretty redundant, but it is necessary to get good
performance. When splitting a triangle, you need to be able to find and
destroy the diamond it's in (if any) very quickly. I used to scan the
list of diamonds, thinking that the list would stay rather small, but
this killed my performance when a lot of splits needed to be done each
frame.
CROAMSphere
class CROAMSphere : public C3DObject
{
protected:
CROAMTriangleList m_llTriangle; //
List of triangles in the mesh
CROAMDiamondList m_llDiamond; //
List of diamonds in the mesh
CROAMAlgorithm *m_pAlgorithm; //
A planet algorithm
int m_nTriangles; //
The current triangle count
void
Init(CROAMAlgorithm *pAlgorithm);
bool CollisionCheck(C3DBase *pCamera);
void Split(CROAMTriangle *pTriangle);
void Merge(CROAMDiamond *pDiamond);
bool Update(C3DBase *pCamera, float fMaxError);
void Draw(C3DBase *pCamera, bool bTexture=true);
};
We finally get to the main ROAM class. The key member variables should be
about what you expected at this point: a triangle list, a diamond list,
and an algorithm object for retrieving the height and texture coordinates
of new vertices. The Init()
method takes an initialized algorithm object, and builds the 12 toplevel
triangles of the cube. The CollisionCheck()
method calls the fractal function to determine whether the camera's position
is inside the planet or not. This is a faster and more accurate way to do
collision detection than to check against the current mesh.
The Split() method
looks messy right now and could probably be cleaned up a bit, but it is
fairly easy to explain with some pseudocode:
void CROAMSphere::Split(CROAMTriangle *pTriangle)
{
if(pTriangle>neighbor does not make a square with pTriangle)
Split(pTriangle>neighbor)
pOpposite = pTriangle>neighborpVertex = CVertex::Array.GetNewElement()
Initialize pVertex's position, normal, and texture coordinatesCreate a new triangle and add it to the triangle list
Split pTriangle into the new triangle
Create a new triangle and add it to the triangle list
Split pOpposite into the new triangleUpdate the pVertex's normal based on the new triangles
if(pTriangle was part of a diamond)
delete pTriangle>m_pDiamond;
if(pOpposite was part of a diamond)
delete pOpposite>m_pDiamond;
Add the new diamond created by the new split
}
The Merge() method is almost as messy right now, but it is also fairly easy
to explain with some pseudocode:
void CROAMSphere::Merge(CROAMDiamond *pDiamond)
{
Merge parent1 and child1 into parent1
Merge parent2 and child2 into parent2
Delete pDiamond
if(parent1 forms a new diamond)
Add
the new diamond to the diamond list
if(parent2 forms a new diamond)
Add
the new diamond to the diamond list
Release the vertex we just merged back to
CROAMVertex::Array
}
The Update() method takes
a camera position and an error threshold from the game engine. It loops
through all diamonds and triangles, checking their priority and merging
or splitting when they cross the error threshold. Recently, I added a wait
state for triangles and diamonds so that if they're not close to the threshold,
they won't be checked every frame. This provided a decent performance improvement,
but I'm sure there's a better way to do this.
Originally, the Draw() method
did nothing but render the polygons in the mesh. Then I switched to an OpenGL
vertex array, building an array of indices to call glDrawElements().
The first time I tried this, I ran into problems trying to draw more than
20,000 triangles. I had read that you couldn't have more than 64K vertices
in your array, but I wasn't anywhere close. I figured out that most video
cards also won't allow you to pass any more than 64K indices to glDrawElements().
So I changed my Draw()
function to build an array of indices separately, and wrote a loop to send
them to glDrawElements() in
blocks of 60,000.
Final
Notes
Now that you've seen how to generate and render a fullsize planet at
realtime speed with dynamic level of detail, I encourage you to play
around with the source code and see what you can come up with. It should
run pretty well on most systems with OpenGLaccelerated drivers, and it
should fly on NVidia GeForce cards. Because it only generates and uses
memory for vertices the camera is close to, it should be fairly easy to
expand this demo to generate and render an entire solar system, or an
entire galaxy of star systems, without a significant impact on memory
or the CPU. The next article in the series will focus on doing that and
will explain how to fix some of the problems you will run into working
with such a large game world.
______________________________________________________