TY - JOUR
T1 - Latency, occupancy, and bandwidth in DSM multiprocessors
T2 - A performance evaluation
AU - Chaudhuri, Mainak
AU - Heinrich, Mark
AU - Holt, Chris
AU - Singh, Jaswinder Pal
AU - Rothberg, Edward
AU - Hennessy, John
N1 - Funding Information:
He received the master's and PhD degrees in computer science from the State University of New York at Stony Brook in 1975 and 1977, respectively. Since September 1977, he has been a faculty member at Stanford University, where he is currently a professor of Electrical Engineering and Computer Science. Prior to becoming President of Stanford University, Professor Hennessy served as the University Provost, the Dean of the School of Engineering, and chairman of the Computer Science Department. He is the recipient of the 1983 John J. Gallen Memorial Award, awarded by Villanova University to the most outstanding young engineering alumnus. He is the recipient of a 1984 US National Science Foundation Presidential Young Investigator Award and, in 1987, was named the Willard and Inez K. Bell Professor of Electrical Engineering and Computer Science. In 1991, he received the Distinguished Alumnus Award from the State University of New York at Stony Brook. He is a fellow of the IEEE, a member of the National Academy of Sciences, a member of the National Academy of Engineering, a fellow of the American Academy of Arts and Sciences, and a fellow of the ACM. He is the recipient of the 1994 IEEE Piore Award, the 2000 ASEE R. Lamme Medal, the 2000 John von Neumann Medal, the 2001 Eckert-Mauchly Award, and the 2001 Seymour Cray Award. In 2001, he received an honorary doctorate from Villanova and an honorary degree of science from SUNY Stony Brook.
PY - 2003/7
Y1 - 2003/7
AB - While the desire to use commodity parts in the communication architecture of a DSM multiprocessor offers advantages in cost and design time, the impact on application performance is unclear. We study this performance impact through detailed simulation, analytical modeling, and experiments on a flexible DSM prototype, using a range of parallel applications. We adapt the LogP model to characterize the communication architectures of DSM machines. The l (network latency) and o (controller occupancy) parameters are the keys to performance in these machines, with the g (node-to-network bandwidth) parameter becoming important only for the fastest controllers. We show that, of all the LogP parameters, controller occupancy has the greatest impact on application performance. Of the two contributions of occupancy to performance degradation (the latency it adds and the contention it induces), it is the contention component that governs performance regardless of network latency, showing a quadratic dependence on o. As expected, techniques to reduce the impact of latency make controller occupancy a greater bottleneck. Surprisingly, the performance impact of occupancy is substantial, even for highly tuned applications and even in the absence of latency-hiding techniques. Scaling the problem size is often used to overcome limitations in communication latency and bandwidth. Through experiments on a DSM prototype, we show that there are important classes of applications for which the performance lost by using higher-occupancy controllers cannot be regained easily, if at all, by scaling the problem size.
KW - Bandwidth
KW - Communication controller
KW - Distributed shared memory multiprocessors
KW - Flexible node controller
KW - Latency
KW - Occupancy
KW - Queuing model
UR - http://www.scopus.com/inward/record.url?scp=0042850575&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0042850575&partnerID=8YFLogxK
U2 - 10.1109/TC.2003.1214336
DO - 10.1109/TC.2003.1214336
M3 - Article
AN - SCOPUS:0042850575
SN - 0018-9340
VL - 52
SP - 862
EP - 880
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 7
ER -