TY - GEN
T1 - Evaluation of cache-based superscalar and cacheless vector architectures for scientific computations
AU - Oliker, Leonid
AU - Canning, Andrew
AU - Carter, Jonathan
AU - Shalf, John
AU - Skinner, David
AU - Ethier, Stéphane
AU - Biswas, Rupak
AU - Djomehri, Jahed
AU - Van Der Wijngaart, Rob
PY - 2003
Y1 - 2003
N2 - The growing gap between sustained and peak performance for scientific applications is a well-known problem in high end computing. The recent development of parallel vector systems offers the potential to bridge this gap for many computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX-6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of scientific computing areas. First, we present the performance of a microbenchmark suite that examines low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks. Finally, we evaluate the performance of several scientific computing codes. Results demonstrate that the SX-6 achieves high performance on a large fraction of our applications and often significantly outperforms the cache-based architectures. However, certain applications are not easily amenable to vectorization and would require extensive algorithm and implementation reengineering to utilize the SX-6 effectively.
AB - The growing gap between sustained and peak performance for scientific applications is a well-known problem in high end computing. The recent development of parallel vector systems offers the potential to bridge this gap for many computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX-6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of scientific computing areas. First, we present the performance of a microbenchmark suite that examines low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks. Finally, we evaluate the performance of several scientific computing codes. Results demonstrate that the SX-6 achieves high performance on a large fraction of our applications and often significantly outperforms the cache-based architectures. However, certain applications are not easily amenable to vectorization and would require extensive algorithm and implementation reengineering to utilize the SX-6 effectively.
UR - https://www.scopus.com/pages/publications/84877067309
UR - https://www.scopus.com/inward/citedby.url?scp=84877067309&partnerID=8YFLogxK
U2 - 10.1145/1048935.1050213
DO - 10.1145/1048935.1050213
M3 - Conference contribution
AN - SCOPUS:84877067309
SN - 1581136951
SN - 9781581136951
T3 - Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, SC 2003
BT - Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, SC 2003
T2 - 2003 ACM/IEEE Conference on Supercomputing, SC 2003
Y2 - 15 November 2003 through 21 November 2003
ER -