Low Latency Rendering Acceleration With Multiple Graphics Pipes
Many approaches to accelerating the rendering of massive models have been explored in the literature. One widely exploited technique is the use of image-based impostors, which stand in for large amounts of geometry. Billboards [ref], sprites [ref], portal textures [Alia97], and layered depth images (LDIs) [Shade99] are all examples of such techniques. Another large body of research addresses the generation and use of simplified geometric levels of detail (LODs) [ref]. These techniques, however, typically require a great deal of preprocessing, which on a large model can take hours to perform and demand gigabytes of additional secondary storage [e.g. Alia99, Alia99A], most of which goes unused if the interactive user never visits that particular region of the model. Even after hours of preprocessing, the frame rate or latency involved in rendering the simplified model may not be adequate for a viewing environment such as a head-tracked head-mounted display (HMD), which requires both high frame rates (~20 FPS) and low latency (~50 msec).
SGI's Onyx2 Reality Monster with the DPLEX hardware option [DPLEX] enables the multiplexing of frames rendered on different graphics pipes onto a single video output device. Pipelining frames in this way increases frame rate, but does not improve latency.
The works to which the present work bears the strongest resemblance are the post-rendering warping presented in [Mark99] and the visibility server of [Rask97]. Both discuss ways to use a powerful graphics server to accelerate rendering on a weak client across a TCP/IP network. This work adapts concepts from both for use in a highly connected symmetric processing environment, where a client is just as powerful as a server and a very high-bandwidth, low-latency shared memory connects the two.
Each gray box represents a separate process. The Selection Strategy resides in an address space shared by all of them.
Both acceleration techniques developed in this research fit nicely into the system architecture described above. They differ only in the type of reference view passed from servers to client. The following sections describe the techniques.
This approach suffers from the same problems as other image-based techniques, namely artifacts from disocclusion and surface undersampling. Approaches taken to alleviate the disocclusion artifacts are discussed in Strategy Choices. The surface undersampling problem is currently handled with fixed-size splats for simplicity.
Using parallel pipes to precompute visibility has some enormous advantages. For one, surface reconstruction is easier than with point samples, since most of the time two different polygons sampled by adjacent pixels will actually be adjacent polygons. That means the surface does not disintegrate as the user zooms in closer to the surface than the reference camera. Another advantage is compactness: a polygon will often cover multiple pixels in the reference image, so one polygon identifier can represent many pixels. Conversely, when one pixel covers multiple polygons, only one of them will be put into the list, so complexity is still capped at the size of the reference grid, just as with image-based techniques. A hybrid reference view that uses point samples to maintain coverage in those cases would be interesting. Finally, since the interactive graphics pipe renders actual polygons from the model, it can render the scene with realistic view-dependent lighting. This is possible with point samples only if a normal buffer is added to the color and depth buffers already being transferred; a full-precision normal buffer, however, would require as much storage as three depth buffers.
In the current implementation, each reference pipe renders the scene with triangle IDs encoded into the RGB color channels. After rendering this false-color image, the challenge is to quickly generate a compact list of unique IDs from the highly redundant information in the color buffer. The current solution is to scan through the color buffer, hashing each ID into a table to remove redundancies, and then scan through the resulting hash table to compact the list. Only then is the list handed off to the interactive renderer. It is essential that the reference renderer deliver the most easily digestible list possible, since this directly impacts the interactive frame rate. Note that each reference pipe runs on a separate processor, so CPU time spent compacting the list does not take away from time available to the interactive renderer.
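The hash-and-compact pass can be sketched as follows. This is a minimal Python illustration, assuming 24-bit triangle IDs packed into the 8-bit RGB channels; the exact encoding, table size, and function names are assumptions for illustration, not details from the implementation.

```python
def decode_id(r, g, b):
    """Recover a triangle ID from an RGB false-color pixel
    (assumes a simple 8-bits-per-channel packing)."""
    return (r << 16) | (g << 8) | b

def hash_and_compact(color_buffer, table_size=1 << 16):
    """Scan the color buffer, hash each ID to drop duplicates,
    then compact the table into a flat list for the interactive pipe."""
    # Chained hash buckets; hashing removes the heavy redundancy
    # of the false-color image (many pixels per polygon).
    table = [set() for _ in range(table_size)]
    for (r, g, b) in color_buffer:
        tri_id = decode_id(r, g, b)
        table[tri_id % table_size].add(tri_id)
    # Compact: one pass over the buckets yields the unique-ID list.
    return [tri for bucket in table for tri in bucket]
```

The compacted list, rather than the raw image, is what crosses over to the interactive renderer, which is why the CPU cost of this pass stays on the reference pipe's processor.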
The hash-and-compact technique works well; however, it does not by itself allow smooth integration of multiple reference renderers. It is critical to be able to combine the ID lists from multiple pipes, and ideally the cost of such a merge would not depend on the number of pipes contributing IDs. It is possible to simply let all the processes hash into the same table simultaneously. Although race conditions can leave some duplicates in the hash table, the number of duplicates will be relatively small, and there is no real harm in accidentally rendering a few triangles multiple times. However, at some point triangles that are no longer in the user's view must be cleared out; otherwise they will continue to accumulate until they exceed the interactive pipe's ability to render them. One could simply clear the hash table periodically, but this would cause an unnecessary loss of recently viewed polygons. Locality of reference applies here: a polygon viewed recently is likely to be viewed again, so as long as the interactive pipe can handle the polygon load, there is no reason to clear out a recently viewed polygon. The solution developed here is to associate a timestamp with each ID in the hash table; then, instead of completely clearing the table, it is pruned periodically according to an LRU rule. Exactly when this pruning occurs can be based on the current frame rate in the interactive renderer: when the frame rate dips below a specified threshold, the pruning kicks in, which boosts the frame rate back up to acceptable levels.
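The timestamp-and-prune idea can be sketched as a small Python class. The real table lives in shared memory and is updated by multiple processes; here a plain dictionary stands in for it, and the class name, threshold, and age values are illustrative assumptions.

```python
import time

class IDCache:
    """Triangle-ID table with per-entry timestamps. Pruning follows
    an LRU rule and fires only when the interactive frame rate dips
    below a threshold (numbers here are hypothetical)."""

    def __init__(self):
        self.stamps = {}  # triangle ID -> time it was last seen

    def insert(self, tri_id, now=None):
        # Re-inserting a visible ID refreshes its timestamp.
        self.stamps[tri_id] = time.monotonic() if now is None else now

    def prune(self, max_age, now=None):
        """Drop entries not refreshed within max_age seconds (LRU)."""
        now = time.monotonic() if now is None else now
        self.stamps = {t: s for t, s in self.stamps.items()
                       if now - s <= max_age}

    def maybe_prune(self, fps, fps_threshold=20.0, max_age=2.0):
        # Prune only when the interactive frame rate falls too low,
        # so recently viewed polygons survive as long as the load allows.
        if fps < fps_threshold:
            self.prune(max_age)

    def visible_ids(self):
        return list(self.stamps)
```

Because stale entries are dropped by age rather than by a wholesale clear, a polygon the user glanced at a moment ago stays renderable without a fresh reference frame.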
The polygon-ID rendering acceleration technique yields very promising results; nevertheless, it is not without its share of problems. One problem is undersampling. When the scene contains many small polygons, it is very easy for some of them to entirely miss the pixel grid in the reference renderers, especially when the small polygons are nearly edge-on to the viewer. The accumulate-and-prune technique just described tends to eventually fill in the missing polygons, but some polygons can still stay missing for the time it takes to render several reference frames.
We have implemented several of the ideas mentioned above and have plans to implement several more (see Future Work).
The test bed developed for experimenting with strategies contains a number of parameters that are common to all and can be set with the GUI, shown below.
The key parameters are the
One observation is that off-center references are not visible much of the time, yet they cost just as much to render as the one at the center of attention. They are rendered speculatively in case the user suddenly decides to steer the walkthrough in that direction. In order to speed up the interactive frame rate, however, we can optimize for the common case and render a lower-resolution reference image for the off-center angles. If we only wish to use two reference images, then we can point them both in the same direction, but render one with a lower resolution and wider FOV. The image below demonstrates that technique.
The viewer was turning right as this image was captured, so a band of lower-resolution samples can be seen along the right edge. The blocky pixels at the top of the window are an artifact of the technique used to composite the low- and high-resolution reference views. We still get the edge coverage of a large reference FOV without taking the hit of rendering 2-3 full-resolution point sets. To composite the low-resolution and high-resolution images, which cover many of the same pixels, we give preference to high-resolution samples by using the stencil buffer. This generally works well, but causes some problems with detailed elements that appear in front of the background, as is the case with the window in the above image. In that example the problem can be remedied by adding a backdrop image behind the window, which is also desirable for increasing the realism of the experience. A promising variation, which would eliminate all "edge of the world" phenomena, would be to take the wide FOV to its extreme and use a low-resolution environment map pasted onto the sides of a cube to provide coverage of the entire solid angle surrounding the user.
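The compositing rule itself is simple, and can be sketched in Python. The hardware version uses the stencil buffer; this sketch mimics it with an explicit mask, assuming the high-resolution pass is drawn first. Buffer layout and names are illustrative, not from the implementation.

```python
def composite(width, height, hi_samples, lo_samples):
    """hi_samples / lo_samples: lists of (x, y, color) splats.
    High-resolution samples always win, mimicking the stencil test."""
    frame = [[None] * width for _ in range(height)]
    stencil = [[False] * width for _ in range(height)]
    # Pass 1: splat the high-resolution reference and mark the stencil.
    for x, y, color in hi_samples:
        frame[y][x] = color
        stencil[y][x] = True
    # Pass 2: low-resolution, wide-FOV samples only fill pixels
    # the stencil left untouched.
    for x, y, color in lo_samples:
        if not stencil[y][x]:
            frame[y][x] = color
    return frame
```

Giving the low-resolution pass no say wherever the stencil is set is exactly what produces the blocky artifacts around foreground detail noted above: a coarse sample that should be occluded can still win at pixels the high-resolution view missed.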
In this work, references can be generated by multiple pipes, so the Selection Strategy can speculatively render a range of potential future viewing directions and select the best of the bunch when the time comes. This is not necessarily an efficient use of the parallel hardware, but at least with a real head-tracked viewer the number of potential future viewing directions is fairly limited and does not depend on the problem size. A small number (fewer than 10) of reference renderers should give good results regardless of the model size.
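One plausible selection rule, sketched here in Python, is to pick the speculative reference whose view direction best matches the current head-tracked view (largest dot product of unit vectors). The function name and the dot-product criterion are assumptions for illustration; the actual Selection Strategy may weigh other factors such as reference age.

```python
def pick_reference(view_dir, ref_dirs):
    """Return the index of the reference direction closest to the
    current view direction. All directions are unit 3-vectors."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(range(len(ref_dirs)), key=lambda i: dot(view_dir, ref_dirs[i]))
```

Because the candidate set is small and fixed (under ten directions), this selection costs essentially nothing per frame regardless of model size.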
The tests were conducted on a radiositized model of a few rooms of Dr. Brooks's house, instanced three times for a total of 390K triangles. The model was stored in a flat triangle list and could be rendered without any culling at about 4 frames per second (FPS). This rate serves both as the baseline against which to compare acceleration and as the rate at which a reference pipe can deliver new reference frames.
Using the point-based technique, rendering time depends on the number of points displayed, which works against the fan strategy. Point rendering should be much faster than triangle rendering, since it requires no edge setup, but unfortunately current hardware is not well optimized for point rendering. Storing the colors for the points in a texture and accessing them via projective texturing almost doubles the frame rate. A grid of 512x512 points colored with projective texturing can be rendered at 17 FPS; with two 512x512 references, 10 FPS. The combined low-res/high-res technique used one 512x512 reference and one 128x128, and rendered at ~15 FPS.
Using the polygon-ID-based technique yielded frame rates as high as 60 FPS, probably capped by the display's refresh rate. Excellent results were obtained by setting the minimum frame rate that triggers pruning to about 20 FPS. On a 640x480 stereo display (1280x480 total framebuffer), the interactive pipe was able to run at a consistent 30 FPS.
While both techniques look promising, for models with significantly fewer than one polygon per pixel in an average view, and until point-rendering hardware catches up to triangle hardware, rendering polygon IDs is probably the best choice.
[Rask97] R. Raskar, "Visibility Culling Using Frame Coherence", UNC Course Project, May 1997. (link withheld by author's request)
[DPLEX] Mark Schwenden, "Onyx2 (TM) DPLEX Option Hardware User's Guide", Document Number 007-3849-001, Silicon Graphics, Inc., 1999.
[Alia99] Daniel G. Aliaga, Anselmo A. Lastra, "Automatic Image Placement to Provide a Guaranteed Frame Rate", to appear in SIGGRAPH '99, 1999.
[Alia99A] D. Aliaga, J. Cohen, A. Wilson, E. Baker, H. Zhang, C. Erikson, K. Hoff, T. Hudson, W. Stuerzlinger, R. Bastos, M. Whitton, F. Brooks, D. Manocha, "MMR: An Integrated Massive Model Rendering System Using Geometric and Image-Based Acceleration", Symposium on Interactive 3D Graphics (I3D), April, 1999.
[Shade99] J. Shade, S. Gortler, L. He, R. Szeliski, "Layered Depth Images", Proceedings of SIGGRAPH '98, 1998.
[Alia97] Daniel G. Aliaga, Anselmo A. Lastra, "Architectural Walkthroughs Using Portal Textures", IEEE Visualization '97, pp. 355-362, Oct 19-24, 1997.
Matthew M. Rafferty, Daniel G. Aliaga, Voicu Popescu, Anselmo A. Lastra, "Images for Accelerating Architectural Walkthroughs", Computer Graphics & Applications, November/December, 1998.
Voicu Popescu, Anselmo Lastra, Daniel Aliaga, Manuel Oliveira Neto, "Efficient Warping for Architectural Walkthroughs using Layered Depth Images", IEEE Visualization '98, October 18-23, 1998.
Matthew M. Rafferty, Daniel G. Aliaga, Anselmo A. Lastra, "3D Image Warping in Architectural Walkthroughs", VRAIS '98, pp. 228-233, March 14-18, 1998.
Daniel G. Aliaga, Anselmo A. Lastra, "Smooth Transitions in Texture-based Simplification", Computer & Graphics, Elsevier Science, Vol 22:1, pp. 71-81, 1998.
Daniel G. Aliaga, "Visualization of Complex Models Using Dynamic Texture-based Simplification", IEEE Visualization '96, pp. 101-106, Oct 27-Nov 1, 1996.
D. Aliaga, J. Cohen, A. Wilson, H. Zhang, C. Erikson, K. Hoff, T. Hudson, W. Stuerzlinger, E. Baker, R. Bastos, M. Whitton, F. Brooks, D. Manocha, "A Framework for the Real-time Walkthrough of Massive Models", UNC TR# 98-013, March, 1998.
D. Aliaga, "Automatically Reducing and Bounding Geometric Complexity by Using Images", Dissertation, Computer Science, UNC-Chapel Hill, October 1998.