TY - JOUR
T1 - Parallel Visualization Algorithms
T2 - Performance and Architectural Implications
AU - Singh, Jaswinder Pal
AU - Gupta, Anoop
AU - Levoy, Marc
N1 - Funding Information:
We thank Takashi Totsuka, Jim Christy, Jason Nieh, Philippe Lacroute, Maneesh Agrawala, and David Ofelt for implementing or helping implement the parallel versions. This research was funded under ARPA Contract No. N00039-91-C-0138. Anoop Gupta is also supported by an NSF Presidential Young Investigator Award.
PY - 1994/7
Y1 - 1994/7
N2 - Recently, a new class of scalable, shared-address-space multiprocessors has emerged. Like message-passing machines, these multiprocessors have a distributed interconnection network and physically distributed main memory. However, they provide hardware support for efficient implicit communication through a shared address space, and they automatically exploit temporal locality by caching both local and remote data in a processor's hardware cache. In this article, we show that these architectural characteristics make it much easier to obtain very good speedups on the best known visualization algorithms. Simple and natural parallelizations work very well, the sequential implementations do not have to be fundamentally restructured, and the high degree of temporal locality obviates the need for explicit data distribution and communication management. We demonstrate our claims through parallel versions of three state-of-the-art algorithms: a recent hierarchical radiosity algorithm by Hanrahan et al. (1991), a parallelized ray-casting volume renderer by Levoy (1992), and an optimized ray-tracer by Spach and Pulleyblank (1992). We also discuss a new shear-warp volume rendering algorithm that provides the first demonstration of interactive frame rates for a 256×256×256 voxel data set on a general-purpose multiprocessor.
UR - http://www.scopus.com/inward/record.url?scp=0028466452&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0028466452&partnerID=8YFLogxK
U2 - 10.1109/2.299410
DO - 10.1109/2.299410
M3 - Article
AN - SCOPUS:0028466452
SN - 0018-9162
VL - 27
SP - 45
EP - 55
JO - Computer
JF - Computer
ER -