This article is the fourth in a series, and it assumes that you have already read the first three articles. If you haven't, they can be found on Gamasutra.com. I have also written two articles on atmospheric scattering: one on GameDev.net and the other in GPU Gems 2. Links to all of my online articles can be found on my website at http://sponeil.org/, along with the latest source code and executables for my published demos. The code has been completely re-factored since the last time I published it. All the common code has been pulled out into a shared library, and I borrowed a few concepts from GameDev's Enginuity series of articles.
The planet-rendering algorithm I published with the first three articles was a spherical implementation of ROAM. For a dynamic LOD implementation on a sphere, it was very clean and it created a fairly optimal mesh, but it had a number of drawbacks. Priority and frustum checks were made on every triangle, the vertex/index list was updated every frame, and it was difficult to keep vertex information in video memory. In short, the GPU was under-utilized and the CPU was over-utilized. It also required a lot of memory per triangle, stored no coherent height map anywhere, and had no space partitioning. This made placement of textures and objects on the surface difficult. Newer versions of ROAM may not have these problems, but my implementation did.
A few years ago, I decided to replace ROAM with a chunked quad-tree. Now each node gets its own vertex buffer, priority and frustum checks are done per node, space partitioning and object placement is relatively easy, and it is easy to apply dynamically-generated ground textures and objects. On the other hand, it is a more complicated implementation and it has to create more vertices/triangles to achieve the same level of detail. The extra vertices require extra calls to the fractal function, which hits the CPU very hard. However, there are ways to speed that up, and the extra calls go into building the ground texture, which improves visual quality a lot. Overall, the benefits more than make up for the drawbacks.The Quad-Tree
Two main classes implement the quad-tree, CPlanetaryMap and CPlanetaryMapNode. The node class is the heart of the quad-tree. My implementation breaks each node into four quadrants, each with 16-24 triangles. When more detail is needed in a specific quadrant, it is split and becomes a new node under the parent. Just like the spherical ROAM implementation, there are six top-level square faces forming a cube. Also like the ROAM implementation, a quadrant can only be split one level deeper than any of its neighbors to avoid cracks in the mesh. In this implementation, extra edge triangles are rendered for each quadrant when its neighbor is at a lower level in the tree. Here is a figure illustrating the triangle layout:
|Figure 1. Triangle layout|
The black dots and lines represent vertices and edges created for the top-level node (level 0). The green lines represent edges created at level 1 and the red lines represent edges created at level 2. In this image the level 0 NE quadrant split down to level 1, and then its SW quadrant split down to level 2. The second split forced the level 0 NW and SE quadrants to split to avoid cracks in the mesh. Note how the green lines creep into the last remaining level 0 quadrant, and how the red lines creep into all four of the neighboring level 1 quadrants. These illustrate the extra triangles that are needed at a quadrant's edges to avoid having cracks in the mesh.
It is fairly easy to determine which triangles need to be drawn for each quadrant. Take the level 0 SW quadrant above as an example. First draw the fan of eight triangles in the quadrant's center, which are highlighted in gray. Then draw the edge triangles surrounding the fan. Not counting the green lines, there are two triangles “facing” each edge of the quadrant (N/S/E/W). Where the quadrant's neighbor has been split into a full node (i.e. the north and east neighbors above), the two triangles facing that edge must be split into four triangles.
To account for the extra edge triangles needed when a quadrant's neighbor is split, extra vertices are generated when each node is built. To keep the code from looking even nastier, a fixed 9x9 grid of vertices is created for each node. I prefer to think of it as 8x8 plus 1 on each side because the resolution doubled = 17x17, tripled = 25x25, and quadrupled = 33x33 (i.e. it's always 8x8 * n + 1). The 4 vertices inside the gray fan above are never used, so you could save 16 vertices per node if you didn't care about making the code a bit messier. The height samples for those vertices need to be calculated anyway for the ground texture and normal map, so you wouldn't save much more than a bit of video memory and the bandwidth used to push that data into video memory. If you have bottlenecks there, then this might be a good optimization for you to consider.
The main difficulty in implementing a spherical quad-tree lies in the fact that neighbors need to be checked by direction (N/S/E/W). On a cube there is no way to set up all 6 faces so that the directions map smoothly with each other. It's impossible to set it up so that going N and then S always puts you back in the same node. See the right-most face in figure 2 below for an example. Next, think about the code trying to determine whether that face's NW quadrant's neighbor to the north is split into a full node. When dealing with neighbors in the same cube face, you can look at the north neighbor's SW quadrant. In this specific case, you need to check the north neighbor's SE quadrant (but it can be different for each case).
Figure 2. Those pesky neighbors
Maintaining the neighbor pointers themselves is almost as much of a pain as figuring out which quadrant of each neighbor to check. To help deal with issues like this, I created lookup tables in PlanetaryMapNode.cpp for each of the 6 cube faces, 4 quadrants, and 4 directions. They look like this:
m_nFaceNeighbor[TopFace][LeftEdge] = LeftFace;
m_nFaceNeighbor[LeftFace][TopEdge] = TopFace;
This lets me know that going west from the top cube face takes you to the left cube face, but that you must go north to get back to the top cube face. The code for this is ugly, and I really hate ugly code, but once it's implemented you don't have to worry about it very often. If all else fails, you can also create a position with 3D coordinates a bit to the side of the current node and traverse the tree from the root node(s) to find where it falls.
To separate the node class from the code building the planet, I created a class called CPlanetaryFactory. Each class derived from CPlanetaryFactory adds something different to the planet, and factories can be combined in different ways to build different types of planets. The CPlanetaryMap class keeps a list of factories for each planet, and when a node in the quad-tree needs to be built, each factory's BuildNode() method is called. Just in case the factory creates something that needs to be cleaned up, each factory's DestroyNode() method is called when a node is destroyed.
Each factory object is initialized with a CPropertySet object, which is a simple map of named properties. This property set is currently loaded from a configuration file called Planet_Quad.conf. A custom RTTI implementation is used to allow class names to be specified in the configuration file, and to keep CPlanetaryMap from having to know how to create and initialize each factory class. This should make it easy for you to create your own factory class, load its type class, and then change the config file to use it without having to change CPlanetaryMap or CPlanetaryMapNode. This is a very useful separation which can be extended to load factories from other DLLs using LoadLibrary. That would allow new factories to be added dynamically without changing or recompiling the main project.
Here is a list of the factories that currently exist in the Planet_Quad project. All of these are proof-of-concept factories. They are very simple and not very well optimized for now, so there are plenty of things you can do to improve them.
This factory loops through each sample point in the node, calculates the direction vector from the center of the planet to that sample point, and passes it to a 3D noise function to determine the height for that sample.
This factory is just like CSimpleHeightMapFactory, but it pre-generates the first few octaves of noise into buffers. This takes up extra memory and incurs a small performance hit when the demo is launched, but it speeds up the node building process noticeably. It would also be easy to change this so that the pre-generated buffers were created without using noise-based fractals. You could “grow” the planet using plate tectonics, volcanoes, impacts, and erosion. You could also use artist-generated height maps.
This factory takes the height of each sample and applies a color to it. This factory is very simple, using height as its only criteria. It would be pretty easy to add latitude or other variables into the calculations, or to build a pre-generated map (i.e. if you “grow” your planet).
This factory adds millions, or even billions, of craters to the planetary surface fairly quickly and without using any memory. It does this by creating a “virtual” quad-tree (i.e. it creates the nodes it needs on the fly). To traverse this non-existent tree, instead of going to a child node, the random seed is altered in such a way that each virtual “child node” has a unique repeatable seed (or as close to unique as we can get it). A simple for loop runs through the levels, adjusting the height of each sample when it finds a “hit”. A very fast RNG (Random Number Generator) is important here. This virtual tree can be used for just about any type of special surface feature.
This factory is a very simple test to show that distinct objects can be added to the planet easily. It is just a proof-of-concept. The objects are grass billboards, which are managed in the factory object itself. This helps keep them all together for quick batched rendering, which is important for a large number of small similar objects like grass billboards. For objects that can move around and bump into each other, or for objects where rendering order is important, it would probably be better to manage the objects in the quad-tree.
Dynamically-Generated Ground Textures
Because each node is square and a small height map is generated for each one, it is fairly easy to dynamically generate and apply ground textures. There are a few complications to worry about, and I will explain them here. To start with, here are some constants I have defined in PlanetaryMapNode.h:
#define NODE_WIDTH 8
#define HEIGHT_MAP_SCALE 2
// The surface map will have 9x9 vertices
#define SURFACE_MAP_WIDTH (NODE_WIDTH+1)
#define SURFACE_MAP_COUNT (SURFACE_MAP_WIDTH*SURFACE_MAP_WIDTH)
// The height map will be a multiple of the surface map (i.e. 17X17 or 33x33)
#define HEIGHT_MAP_WIDTH (NODE_WIDTH*HEIGHT_MAP_SCALE+1)
#define HEIGHT_MAP_COUNT (HEIGHT_MAP_WIDTH*HEIGHT_MAP_WIDTH)
// The height map resolution plus a border surrounding it for normal calculations
#define BORDER_MAP_WIDTH (HEIGHT_MAP_WIDTH+2)
#define BORDER_MAP_COUNT (BORDER_MAP_WIDTH*BORDER_MAP_WIDTH)
The surface map size is used for the vertex map, and the height map size is used for the texture and normal maps. The vertex map size is currently fixed at 9x9, and can't be changed without rewriting some of the code used in CPlanetaryNode. The texture/normal map size can be changed by changing HEIGHT_MAP_SCALE. Using a scale of 2, each node gets a 17x17 texture map and a 17x17 normal map. A scale of 4 (33x33) makes the ground textures look very sharp, but it brings the system to a crawl when a lot of nodes need to be split. These are odd sizes for a texture, but keep in mind that the nodes share edges, and the edges of neighboring texture maps must match perfectly to avoid visible seams. The edges of neighboring textures generated from the 17x17 height maps must always match, but re-sampling it down to 16x16 would throw them off.
Fortunately, it is easy to create a 1024x1024 texture (or larger) and treat it like an array of smaller textures. The CTextureArray class handles all of this and makes it relatively painless. It doesn't matter whether the smaller textures inside it are an odd size or not. Odd texture sizes simply cause a little space to be wasted at the edges of the larger parent texture. The CTextureArray class has a method for generating a texture matrix that maps an arbitrary rectangle perfectly to a sub-texture, which makes it easy to set up texture coordinates for each node.
The normal map generation is so similar to the texture map generation that I use the same type of sub-textures for them, and I put them in the same 1024x1024 texture array. The 8-bit RGB values don't allow much precision, but they seem to be sufficient in this case. The only issue we need to worry about is that we need to generate an extra set of height values around the border of the height map to get accurate normals at the edges. It increases the expense of building a node, but without it seams appear at node boundaries.
|Figure 3. Visible seam between levels|
No matter what you do, seams will show up at the boundary between levels in the quad-tree (i.e. between levels 5 and 6). I'm not certain whether anything can make it go away completely because one node will have more detail and its neighbor will have less. However, there are ways to make this less noticeable. First, keep the texture resolution high. The closer each texel is to one pixel in size, the less visible the problem will be. Second, try to avoid very high-frequency changes in the color or normal maps. If either changes a lot from one level to the next, the seam between levels will be much more noticeable than if the changes are smoother. Atmospheric Scattering
The ScatterCPU demo was originally published and explained in a GameDev article. The ScatterGPU demo was originally published and explained in a GPU Gems 2 article. Read the articles for an explanation of these algorithms. ScatterCPU is a more accurate approximation of scattering, but it is slower and eats up too much CPU time. Because the Planet_Quad demo uses the version from ScatterGPU, I'll mention some of the problems you may notice with it:
- GeForce cards produce very noticeable visual artifacts around the Mie scattering. It looks like lines of color discontinuity. At first glance it appears to be a tesselation artifact, but the discontinuity lines do not line up with triangle edges, and increasing the tesselation level does not improve image quality. The developers I worked with from nVidia on GPU Gems 2 couldn't figure out what was causing this.
- The artifacts mentioned above do not show up on Radeon cards. The color gradient looks very smooth on them, but otherwise the colors look a bit off when compared to the GeForce cards.
- Some tesselation artifacts are visible when viewing the sky dome from space. Scattering is exponential but interpolation between vertices is linear. Instead of drawing a full sphere for the sky dome all the time, which will never be completely visible, I should be drawing a wedge optimized for the current viewing angle.
- The ground is too dark near sunrise/sunset. From space, it actually gets dark before the sky above it does, and this looks bad from certain angles. This is because right after the sun sets, the air above the ground can still see the sun (it is not in the shadow of the planet yet), so it is still lit. Secondary scattering is not calculated or approximated (i.e. skylight is not projected onto the ground), so the ground underneath it is dark when the sky is still lit.
- Sunrises and sunsets viewed from space are not as yellow/orange/red as they are in ScatterCPU. I believe this is due to inaccuracies in the scale function for angles greater than 90 degrees. The lookup table in ScatterCPU is fairly accurate. The scale function that attempts to approximate part of it in ScatterGPU diverges sharply when the angle goes above 90 degrees.
The planet demo contains no clouds because I haven't been able to get them working on a planetary scale yet. I want detailed animated volumetric clouds with a climate/weather modeling engine driving them. This seems feasible at relatively small scales, but it has proven to be very difficult on a planetary scale.
GLCloud1 uses a brute-force rendering method. It uses a rectangular 3D grid (96x96x16) of overlapping spherical cloud “cells.” It sorts all nodes and shades them each time the light source or clouds move (press “r” or “m” to make them move). It then sorts all nodes and renders them all each frame in the worst possible way (589,824 calls to glVertex each frame). It looks pretty good and runs fast enough when it is impostored. However, I ran into a ton of problems trying to integrate this with the quad-tree to achieve a dynamic level of detail.
First, the cells had to be made ellipsoids because an individual quad-tree node can be very large. Let's assume we're dealing with a node that is 1,000 km wide and we try to put a 96x96xN grid in each node (which is a ludicrous number of cloud cells when the tree has over a thousand nodes). If N is 1, the smallest overlapping spherical cell would be 20-30 km tall, which would poke into the ground and out into space. You are forced to choose between a ludicrous number of spherical cells or “squashing” them down into very wide and short ellipsoids. Even squashed, the corners of the largest ellipsoids can poke into space because the longest axis of the ellipsoids are straight and the planet's surface curves. To make things worse, the cells from neighboring nodes overlap, making it impossible to render/impostor the cloud cells from each node together. When animating the clouds, either the cells move or the cloud density values shift from one cell to a neighboring cell. Either way, moving density values between nodes at different levels in the quad-tree presented problems.
GLCloud2 was an attempt to cut way back on the number of ellipsoids. It renders two very large ellipsoids and uses a 3D noise texture to add detail. However, 3D texture lookups are expensive, and using larger ellipses required several “slices” to be rendered into each ellipse to provide a solid look and feel. Plus, it is even more expensive to calculate lighting that is less accurate with this implementation. In the end, I didn't feel I had gained much. On the other hand, using a noise texture makes it easier to achieve a per-pixel detail level, and animation effects can be achieved easily by tweaking the texture matrix.
GLCloud3 was an attempt to get around the problems involved with overlapping cloud cells. I decided to try cube-shaped cells that didn't overlap. The algorithm is based on “projected tetrahedra” (Google it). A cube is made up of five tetrahedra, which uses too many triangles for a single cloud cell, so I extended the algorithm to create “projected cubes” without using tetrahedra. The demo draws 1000 cubes spread out a bit so you can make out each individual cube. When put together, they make a large volumetric cube. If I combined this with the 3D noise texture I used in GLCloud2, and impostor the cloud block in each quad-tree node, it might work well enough.
One problem with projected cubes is that my current implementation requires them to be perfectly square. This would work for a game where the “world” is flat, but quad-tree nodes are not square once they've been projected onto a sphere. Near the corners of each top-level cube face, the squares look very squashed. The projected cubes would need to support arbitrary 3D-trapezoidal shapes to match up seamlessly at quad-tree node edges. Another problem is that vertex positions in the cube change every frame when the camera is moving, which means adjusting its position and texture coordinates, and possibly recalculating the lighting for it, every frame. You may be able to avoid calculating the lighting for it by calculating the lighting of the cube's corners, and interpolating for points inside the cube.
In short, clouds are a real pain. Just being volumetric and translucent makes them very difficult. Trying to animate them and fit them into my LOD scheme makes them seem like a nightmare. It might be better to step back and remove clouds from the quad-tree. They should definitely be rendered in their own pass, most likely after everything else is rendered due to their translucency. So far nothing I have been able to dream up will give me the detail I get with GLCloud1, which is what I'd like to achieve.
At this point I'm not sure what else to say. I feel that this is a good start, but I haven't had much time to spend on fleshing it out. There are so many different factories you could create for different types of planets and moons, and I've just barely scratched the surface. If I could get volumetric clouds working, I'd also like to try implementing a gas giant where you could fly down into the swirling clouds. Then I'd work up to a randomly-generated star system, and then a galaxy of random star systems.
The source code is released under the BSD license, so you're free to use any of it in any derivative projects you want to create. If you do use it, please send me an email to let me know what you're using it for. I'll also do what I can to answer any questions you have.