TY - GEN
T1 - Implementing application-specific cache-coherence protocols in configurable hardware
AU - Brooks, David
AU - Martonosi, Margaret Rose
N1 - Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 1999.
PY - 1999
Y1 - 1999
N2 - Streamlining communication is key to achieving good performance in shared-memory parallel programs. While full hardware support for cache coherence generally offers the best performance, not all parallel machines provide it. Instead, software layers using Shared Virtual Memory (SVM) can be built to enforce coherence at a higher level. In prior work, researchers have studied application-specific cache coherence protocols implemented either in SVM systems or as handlers run by programmable protocol processors. Since the protocols are specialized to the needs of a single application, they can be particularly helpful in reducing the long latencies and processing overhead that sometimes degrade performance in SVM systems. This paper studies implementing application-specific protocols in hardware, but not via an instruction-based protocol processor as is typical. Instead, we consider configurable implementations based on Field-Programmable Gate Arrays (FPGAs). This approach can be faster than software-based techniques and less expensive than some hardware-based techniques. We study one application, appbt, in detail, including a VHDL-level design of the configurable protocol design. We sketch out approaches for other applications as well. Implementing protocol operations in configurable hardware improves communication performance by roughly 11X for a 32-node system.
AB - Streamlining communication is key to achieving good performance in shared-memory parallel programs. While full hardware support for cache coherence generally offers the best performance, not all parallel machines provide it. Instead, software layers using Shared Virtual Memory (SVM) can be built to enforce coherence at a higher level. In prior work, researchers have studied application-specific cache coherence protocols implemented either in SVM systems or as handlers run by programmable protocol processors. Since the protocols are specialized to the needs of a single application, they can be particularly helpful in reducing the long latencies and processing overhead that sometimes degrade performance in SVM systems. This paper studies implementing application-specific protocols in hardware, but not via an instruction-based protocol processor as is typical. Instead, we consider configurable implementations based on Field-Programmable Gate Arrays (FPGAs). This approach can be faster than software-based techniques and less expensive than some hardware-based techniques. We study one application, appbt, in detail, including a VHDL-level design of the configurable protocol design. We sketch out approaches for other applications as well. Implementing protocol operations in configurable hardware improves communication performance by roughly 11X for a 32-node system.
UR - http://www.scopus.com/inward/record.url?scp=84947938437&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84947938437&partnerID=8YFLogxK
U2 - 10.1007/10704826_13
DO - 10.1007/10704826_13
M3 - Conference contribution
AN - SCOPUS:84947938437
SN - 3540659153
SN - 9783540659150
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 181
EP - 195
BT - Network-Based Parallel Computing
A2 - Sivasubramaniam, Anand
A2 - Lauria, Mario
PB - Springer Verlag
T2 - 3rd International Workshop on Communication, Architecture and Applications for Network-based Parallel Computing , CANPC 1999
Y2 - 9 January 1999 through 9 January 1999
ER -