Occlusion Culling for Walkthroughs of Large Virtual Environments

William V. Baxter III
Department of Computer Science
University of North Carolina at Chapel Hill
May 8, 2000



Large complex CAD models exceed the capabilities of current graphics hardware to render them directly at interactive frame rates. Techniques that reduce the number of polygons rendered by the hardware are necessary to achieve interactive performance. All polygon reduction techniques can be summed up by one phrase "don't draw what you can't see". Not drawing what isn't seen can be split into two main categories: (a) not drawing detail which can't be seen and (b) not drawing things which can't be seen.

The first category of polygon reduction strategies is epitomized by level-of-detail algorithms. These techniques compute a series of geometric approximations to objects in the scene with progressively less detail, and hence progressively fewer polygons. The levels-of-detail generally must be generated as a preprocess since the algorithms for computing them are rather time consuming. Levels of detail can be either static -- a fixed set of approximations; or continuous -- each simplification operation is stored and the best approximation is computed each frame. The end goal is the same, each object in the scene is rendered using an approximation with as few polygons as necessary to capture the detail visible from a given distance. When frame speed is paramount, level of detail algorithms can easily accomodate by providing coarse approximations that have less detail than would be visible from the current viewpoint. Thus LOD's can offer both essentially 'lossless' polygon reduction and varying degrees of 'lossy' reduction.

LOD techniques do a good job of rendering objects using the fewest number of polygons possible, but not all objects always need to be rendered. Culling techniques are used to determine which objects do not need to be rendered at all, and they represent the second large category of polygon reduction technique. There are many types of culling, but the most widely used thanks to its simplicity and effectiveness is view frustum culling. The idea of view frustum culling is simply to not render what is outside of the viewing volume defined by the eyepoint and window. The name comes from the shape of the viewing volume, a truncated pyramid. View frustum culling is most effective when combined with a scene organized into a spatial hierarchy so that entire branches of the scene graph tree can be culled with one in-out test.

Another form of culling is occlusion culling. Occlusion culling algorithms attempt to determine what objects aren't visible because they are behind other objects. Sevaral popular occlusion culling algorithms exist that take advantage of the restricted geometry in a scene. For instance the cells and portals technique is good for indoor architectural models. The key to its success is that the walls of a room occlude most geometry outside the room, or cell. Other rooms are only visible through small doorways or windows, also known as portals. Often it is only possible to see to the next room through a doorway, and the room after that is not visible at all. This is especially the case if the cells are part of a game level specifically designed to render quickly. Thus it generally suffices to render only the polygons in the current room and the ones immediately adjacent; the rest are culled out since they are known to be occluded.

Occlusion culling for general scenes is a difficult problem. To perform occlusion culling in general we would like to know which objects block many other objects. For an object to be culled it must be occluded completely, but often complete occlusion is achieved only through the combined contributions of two or more occluding objects. Clearly some form of aggregate occlusion representation which combines these contributions is desirable. We could try to find all objects which occlude other objects, but this is basically the same as asking which objects are visible, and if we knew that then we would just render those objects. There are a number of ways to combat this problem and the remainder of this paper will disscuss them and the implementation of the rest of an occlusion system.

First an aside on image based algorithms. Image based algorithms for polygon reduction are another popular topic of research. They are difficult to classify in the above taxonomy. They attempt to represent portions of the scene using images or sprites that contain as much detail as necessary to appear correct, but the images are also used to replace or cull out the geometry they represent. Thus they attempt to eliminate sub-pixel detail, while at the same time performing occlusion culling. An example of such an algorithm is image warping used to cover portals in a cells-and-portals type of system [Popescu98, Rafferty98]. The geometry beyond the portal is represented by (and culled by) the warped image.

Visibility culling. Traditional hidden surface removal and "tiling" algorithms: warnock et al, Greene's triage masks


Occlusion culling is an attractive area for rendering acceleration because even after simplifying geometry with LODs and performing view frustum culling there is often still too much geometry for modern hardware to render. The problem is that scenes often have very high depth complexity, that is, many layers of geometry underneath each pixel. Modern hardware just renders all the layers in arbitrary order and uses the Z-buffer to determine which layer is visible on a per-pixel basis. Clearly every layer that is rendered which is not visible in the final image represents wasted effort. Occlusion culling aims to reduce the depth complexity that leads to this wasted effort. Effeciently achieving a depth complexity of 1 or less per pixel is the holy grail of rendering acceleration.

More about HOM/HZB methods, their specific occlustion tests. Trade offs. Hybrid ideas.


Mipmap building (alpha and depth), occlusion testing (alpha and depth)

Building Alpha Mipmaps

When using the alpha channel as the occlusion representation, the image hierarchy is built by repeated averaging operations. Level 0 is the finest level, and each higher level is generated by averaging groups of four pixels together, resulting in an image that is half the width and half the height of the previous level. Standard graphics hardware for bilinear filtered texture mapping can be made to do exactly this, given appropriate texture and polygon coordinates. The trick is to render a quad textured with the original image but at half the size of the original images. Use a glOrtho projection whose resolution matches that of the rendering window exactly. When the quad is rendered with bilinear texture mapping turned on the resulting image will be the next level of the mip map. (Q: How does gluBuild2DMipmaps do it?) [Zhang98] gives timing data that indicates this is only a win for building the larger levels of the image pyramid. Each step requires rendering a textured quad, copying the color buffer to main memory for later use, and copying the color buffer into texture memory to texture the next pass. All this copying can cost more than just doing the averaging on the host CPU. The solution is to only use the hardware for the first few levels of averaging.

First you set texturing to be bilinear:


Building Depth Mipmaps

For hierarchical Z-buffer occlusion culling, the image pyramid is built from the depth buffer by repeated max operations. Each pixel of the an image at a particular level is the maximum value of the four pixels below it in the pyramid. General OpenGL hardware is not able to help much in generating this type of mipmap, but high end SGI workstations implement a min-max blending extension that can be used.

Method 1

Level n+1 is generated from level n as follows (say the depth values are stored in a w by h array of floats):

First, make sure that all texture filtering is off:

and take care that w and h are powers of 2. Pad the image if this is not the case.
  for (int pass=0; pass<4; pass++)
     glPixelStorei(GL_UNPACK_ROW_LENGTH, w);
     int offset = (pass & 0x1) ? 1 : 0;
     offset += (pass & 0x2) ? w : 0;
		  0,              // mipmap level
		  2,              // number of color components (2==GL_LUMINANCE_ALPHA)
		  w/2, h/2,       // size of target image
		  GL_LUMINANCE,   // format
		  GL_FLOAT,       // data type
		  depth_image + offset);

     Render quad at image resoulution
  glReadPixels to get new mipmap level into host memory
  glBlendEquationEXT(GL_FUNC_ADD_EXT);  // restore standard blend mode  
The key trick is getting the stride through the data right. This is accomplished by pretending we have two color components per pixel (so stride through a row will be 2) and that the width of a row in host memory is w of these 2 component pixels. In fact it is w/2 of the two component pixels, but by telling GL_UNPACK_ROW_LENGTH it is w while we tell glTexImage2D it is w/2 we manage to skip every other row.

Compared with the method for building alpha mipmaps, this is a bit more expensive since each pass requires 4 calls to glTeximage2D and 4 textured quads rather than just one. And since depth is typically a 4-byte float while alpha is just a byte, the host to texture memory bandwith increases with the increased component size, so it goes up by a factor of 4 compared to alpha mipmapping. Hence the break-even point where a software implementation begins to beat hardware will probably occur a few levels earlier in the process.

Since this method does not seem really to use any special features of texture mapping hardware, one might ask why not just use glDrawPixels. The answer is that glDrawPixels doesn't support drawing a different number of components from what is in the source image. Thus while we can still skip rows by using GL_UNPACK_ROW_LENGTH we cannot skip pixels within a row.

It might be the case that we would rather not do our hierarchical Z-buffer comparisons against the full resolution depth map. In that case we do not need the intermediate levels and so can do N^2 passes instead of 4 passes, where N is the reduction factor in width and height desired.

Method 2

We can get away with just one call to glTexImage2D by realizing that the desired stride through the data can be achieved simply by doing a poor sampling of the mip map under carefully controlled rendering conditions. Then the algorithm for generating level n+1 given level n becomes:
	       0,              // mipmap level
	       GL_LUMINANCE,   // format in host memory
	       w, h,           // size of original image
	       GL_LUMINANCE,   // format for texture memory
	       GL_FLOAT,       // data type
  glPixelStorei(GL_UNPACK_ROW_LENGTH, w);
  for (int pass=0; pass<4; pass++)
     GLfloat offsetx = (pass & 0x1) ? -0.5 : 0;
     GLfloat offsety = (pass & 0x2) ? -0.5 : 0;
     Render quad at w/2 x h/2 resoulution, at coordinate (offsetx,offsety)
  glReadPixels to get new mipmap level into host memory
  glBlendEquationEXT(GL_FUNC_ADD_EXT);  // restore standard blend mode  
Essentially we do a subpixel jittering of the quad so that for each rendering a unique texel of the texture map is applied to each pixel. In this case we actually benefit from the lack of antialiasing and the zero order reconstruction which are typically associated with poor image quality.

Alpha Occlusion Testing

The standard way to perform alpha overlap tests is to compare simple screen-aligned bounding rectangles with the hierarchical alpha buffer. The bounding rectangle is the smallest axis aligned rectangle that will fit a scene graph node's projected bounding volume. As such the test is often quite conservative, possibly testing a rectangle well larger than is really necessary. The standard comparison technique traverses the image pyramid from the coarsest level down to the finest level until it determines that a given bounding volume is either definitely visible, definitely occluded or somewhere in between. A hardware based overlap detection mechanism will no longer require an image hierarchy, and it will have the added benefit of testing exactly the projected bounding volume rather than the bounding rectangle of the projected bounding voulme. The proposed methods just use the hardware to test all pixels in the finest image directly.

The basic idea is that we would like to draw the occlusion map, then somehow render the bounding box so that we can instantly tell if any pixel was "updated".

Method 1

The SGI Histogram extensions basically allow us to quickly find out something about the distribution of colors involved in a set of pixel operations. This can be exploited to give us an indication of whether part of the bounding volume is visible or not.

First render the occluders in pure black with an alpha channel with all lighting attributes turned off, and no depth buffering. Set the blending mode to (1-alpha) * src + a * dst. Now render the bounding volume with all lighting completely off using only one channel, say blue. Using the histogram extension now, see what the largest blue component is in the resulting image. If there are enough pixels with blue value above a particular threshold then the object represented by the bounding volume must be rendered. Otherwise it may (possibly) be culled. Since high end hardware is typically well optimized for RGBA rendering, it should be possible to quickly test one bounding volume in each color channel for a total of 3 tests before having to "reset" the occlusion alpha channel image by recopying it.

The drawback to this method is that it does not tell us which pixels were visible and which were hidden. Alpha buffer occlusion is only a necessary condition for real occlusion, and not a sufficient one. Some sort of depth comparison is also required to give sufficient grounds for culling a node. Thus we would like to have some idea which regions are hidden according to the alpha buffer so that we can test them against a z buffer.

Q: What happens with pixels overdrawn many times using the given compositing operator? A: For a given channel c=(1-alpha)^n * c1 * c2 ... * cn. But since we draw the occluders first and only scan convert the bounding volumes after that, the solution is simply to only allow convex bounding volumes, and render with backface culling on so the depth complexity of bounding volumes will always be 1.

Method 2

This is essentially the same as Method 1, but everthing is done with Z-buffer turned on, which ideally will solve the problem of determining exact visiblity given the partial visibility information provided by opacity maps

Depth Occlusion Testing

[Greene93] mentions a special kind of graphics hardware that can answer depth visibility queries quicly without explicitly scan converting geometry. Basically it says whether or not a rendering operation would result in any zbuffer value being updated. For Z-buffer testing this is exactly what you would like to know. SGIs do not have such a capability (nor does any other modern hardware that I know of) but again the histogram extension can be used very much as it is for alpha occlusion maps above, when combined with the min-max blending function extension.

Method 1

The simplest idea is to render the occluders in black with only depth, then to turn off depth buffer updates (but not depth testing) and render three bounding volumes in each of pure red, green, and blue with simple additive blending. Then use the histogram extension to read back the results for the three simultaneous tests. Any appearance of any amount of red, green, or blue indicates that the corresponding bounding volume is at least partially visible. The number of pixels gives us an idea of how many pixels are visible, and if the number is low enough we can choose to cull the node out even though it is visible.

The depth values are not changed, so only the color buffer needs to be cleared for the next three node tests. This method does not use the min-max blending extension.

Actually it should be possible to do a node test per bit of color accuracy if logical OR'ing is used for blending. First render with color (0x1,0x0,0x0), then color (0x2,0x0,0x0) etc. with 24 bit color that means 24 bounding volumes can be rendered before needing to read back the histogram and reset the color buffer. The drawback is that interpreting the histogram becomes more difficult. For 8 bit RGB channels we need to compute three 256-entry histograms. Separating the information for the 8 channels combined into each original component is not too difficult: basically OR together each of the 256 buckets values that isn't empty. The resulting 24 bit number shoule be 0 in all bit positions that are occluded objects and 1 in positions that are not.

Method 2

This time we render the depth values of the occlusion map as intensity, and the alpha channel is not important. Setting the blending func to the min extension by glBlendEquationEXT(GL_MIN_EXT), we scan convert the bounding volume of the node in red or some other single channel. This


Uses of multiple procesors, uses of multiple graphics accelerators. Building mipmaps. Occlusion testing. Culling nodes. Multi-phase methods.


[Zhang98] " Effective Occlusion Culling for the Interactive Display of Arbitrary Models", Hansong Zhang, Ph. D. Dissertation , Department of Computer Science, UNC-Chapel Hill, 1998

[Zhang97] "Visibility Culling Using Hierarchical Occlusion Map", Hansong Zhang, Dinesh Manocha, Tom Hudson and Kenny Hoff, Proceedings of SIGGRAPH 1997.

[Greene93] "Hierarchical Z-Buffer Visibility", Ned Greene, Michael Kass, and Gavin Miller, Proceedings of ACM SIGGRAPH, pp. 231-238, 1993

[Popescu98] " Efficient Warping for Architectural Walkthroughs using Layered Depth Images", Voicu Popescu, Anselmo Lastra, Daniel Aliaga, Manuel Oliveira Neto, IEEE Visualization '98, October 18-23, 1998.

[Rafferty98] " 3D Image Warping in Architectural Walkthroughs", Matthew M. Rafferty, Daniel G. Aliaga, Anselmo A. Lastra, VRAIS '98, pp. 228-233, March 14-18, 1998.

Bill Baxter (Send mail)