Abstract
The growing gap between sustained and peak performance for scientific applications is a well-known problem in high-performance computing. The recent development of parallel vector systems offers the potential to reduce this gap for many computational science codes and to deliver a substantial increase in computing capability. This paper examines the intranode performance of the NEC SX-6 vector processor and compares it against the cache-based IBM Power3 and Power4 superscalar architectures across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines many low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks. Finally, we evaluate the performance of several scientific computing codes. Overall, the results demonstrate that the SX-6 achieves high performance on a large fraction of our application suite and often significantly outperforms the cache-based architectures. However, certain classes of applications are not easily amenable to vectorization and would require extensive algorithm and implementation reengineering to use the SX-6 effectively.
| Field | Value |
|---|---|
| Original language | English (US) |
| Pages (from-to) | 69-93 |
| Number of pages | 25 |
| Journal | Concurrency and Computation: Practice and Experience |
| Volume | 17 |
| Issue number | 1 |
| State | Published - Jan 2005 |
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Software
- Computer Science Applications
- Computer Networks and Communications
- Computational Theory and Mathematics
Keywords
- Microbenchmarks
- NAS Parallel Benchmarks
- Scientific applications
- Superscalar performance
- Vectorization