Abstract
A number of researchers have presented architectural techniques for scaling a cache-coherent shared address space to much larger processor counts. The present paper examines the extent to which applications can achieve reasonable performance on such large-scale, cache-coherent, distributed shared address space machines, by determining the problem sizes needed to achieve a reasonable level of efficiency. It also looks at how much programming effort and optimization is needed to achieve high efficiency, beyond that needed at small processor counts. For each application, the paper discusses the main architectural bottlenecks that prevent smaller problem sizes or less optimized programs from achieving good efficiency.
Original language | English (US) |
---|---|
Pages (from-to) | 134-145 |
Number of pages | 12 |
Journal | Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA |
State | Published - 1996 |
Externally published | Yes |
Event | Proceedings of the 1996 23rd Annual International Symposium on Computer Architecture - Philadelphia, PA, USA |
Duration | May 22 1996 → May 24 1996 |
All Science Journal Classification (ASJC) codes
- Hardware and Architecture