TY - GEN
T1 - The performance impact of flexibility in the Stanford FLASH multiprocessor
AU - Heinrich, Mark
AU - Kuskin, Jeffrey
AU - Ofelt, David
AU - Heinlein, John
AU - Baxter, Joel
AU - Singh, Jaswinder Pal
AU - Simoni, Richard
AU - Gharachorloo, Kourosh
AU - Nakahira, David
AU - Horowitz, Mark
AU - Gupta, Anoop
AU - Rosenblum, Mendel
AU - Hennessy, John
N1 - Funding Information:
We would like to acknowledge the cooperation of Intel Corporation, Supercomputer Systems Division. This work was supported by ARPA contract N00039-91-C-0138. Mark Heinrich and Joel Baxter are supported by National Science Foundation Fellowships. John Heinlein is supported by an Air Force Laboratory Graduate Fellowship. Kourosh Gharachorloo is supported by Digital Equipment Corporation’s Western Research Laboratory. Mendel Rosenblum is supported by a National Science Foundation Young Investigator Award.
PY - 1994/11/1
Y1 - 1994/11/1
N2 - A flexible communication mechanism is a desirable feature in multiprocessors because it allows support for multiple communication protocols, expands performance monitoring capabilities, and leads to a simpler design and debug process. In the Stanford FLASH multiprocessor, flexibility is obtained by requiring all transactions in a node to pass through a programmable node controller, called MAGIC. In this paper, we evaluate the performance costs of flexibility by comparing the performance of FLASH to that of an idealized hardwired machine on representative parallel applications and a multiprogramming workload. To measure the performance of FLASH, we use a detailed simulator of the FLASH and MAGIC designs, together with the code sequences that implement the cache-coherence protocol. We find that for a range of optimized parallel applications the performance differences between the idealized machine and FLASH are small. For these programs, either the miss rates are small or the latency of the programmable protocol can be hidden behind the memory access time. For applications that incur a large number of remote misses or exhibit substantial hot-spotting, performance is poor for both machines, though the increased remote access latencies or the occupancy of MAGIC lead to lower performance for the flexible design. In most cases, however, FLASH is only 2%-12% slower than the idealized machine.
AB - A flexible communication mechanism is a desirable feature in multiprocessors because it allows support for multiple communication protocols, expands performance monitoring capabilities, and leads to a simpler design and debug process. In the Stanford FLASH multiprocessor, flexibility is obtained by requiring all transactions in a node to pass through a programmable node controller, called MAGIC. In this paper, we evaluate the performance costs of flexibility by comparing the performance of FLASH to that of an idealized hardwired machine on representative parallel applications and a multiprogramming workload. To measure the performance of FLASH, we use a detailed simulator of the FLASH and MAGIC designs, together with the code sequences that implement the cache-coherence protocol. We find that for a range of optimized parallel applications the performance differences between the idealized machine and FLASH are small. For these programs, either the miss rates are small or the latency of the programmable protocol can be hidden behind the memory access time. For applications that incur a large number of remote misses or exhibit substantial hot-spotting, performance is poor for both machines, though the increased remote access latencies or the occupancy of MAGIC lead to lower performance for the flexible design. In most cases, however, FLASH is only 2%-12% slower than the idealized machine.
UR - http://www.scopus.com/inward/record.url?scp=0003085676&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0003085676&partnerID=8YFLogxK
U2 - 10.1145/195473.195569
DO - 10.1145/195473.195569
M3 - Conference contribution
AN - SCOPUS:0003085676
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 274
EP - 285
BT - Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 1994
PB - Association for Computing Machinery
T2 - 6th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 1994
Y2 - 4 October 1994 through 7 October 1994
ER -