TY - JOUR
T1 - Evaluation of a commercial CC-NUMA architecture - the CONVEX Exemplar SPP1200
AU - Thekkath, Radhika
AU - Singh, Amit Pal
AU - Singh, Jaswinder Pal
AU - John, Susan
AU - Hennessy, John
PY - 1997/1/1
Y1 - 1997/1/1
N2 - Studies done with academic CC-NUMA machines and simulators indicate a good potential for application performance. Our goal, therefore, is to investigate whether the CONVEX Exemplar, a commercial distributed shared memory machine, lives up to the expected potential of CC-NUMA machines. If not, we would like to understand what architectural or implementation decisions make it less efficient. On evaluating the delivered performance on the Exemplar, we find that, while a moderate-scale Exemplar machine works well for several applications, it does not for some important classes. Further, performance was affected by four fundamental characteristics of the machine, all of which are due to basic implementation and design choices made on the Exemplar. These are: the effect of processor clustering together with limited node-to-network bandwidth, the effect of tertiary caches, the limited user control over data placement, the sequential memory consistency model together with a cache-based cache coherence protocol, and, lastly, longer remote latencies.
AB - Studies done with academic CC-NUMA machines and simulators indicate a good potential for application performance. Our goal, therefore, is to investigate whether the CONVEX Exemplar, a commercial distributed shared memory machine, lives up to the expected potential of CC-NUMA machines. If not, we would like to understand what architectural or implementation decisions make it less efficient. On evaluating the delivered performance on the Exemplar, we find that, while a moderate-scale Exemplar machine works well for several applications, it does not for some important classes. Further, performance was affected by four fundamental characteristics of the machine, all of which are due to basic implementation and design choices made on the Exemplar. These are: the effect of processor clustering together with limited node-to-network bandwidth, the effect of tertiary caches, the limited user control over data placement, the sequential memory consistency model together with a cache-based cache coherence protocol, and, lastly, longer remote latencies.
UR - http://www.scopus.com/inward/record.url?scp=0030671818&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0030671818&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:0030671818
SN - 1063-7133
SP - 8
EP - 17
JO - Proceedings of the International Parallel Processing Symposium, IPPS
JF - Proceedings of the International Parallel Processing Symposium, IPPS
ER -