TY - GEN
T1 - In-Network Snoop Ordering (INSO)
T2 - IEEE 15th International Symposium on High Performance Computer Architecture, HPCA 2009
AU - Agarwal, Niket
AU - Peh, Li-Shiuan
AU - Jha, Niraj K.
PY - 2009
Y1 - 2009
AB - Realizing scalable cache coherence in the many-core era comes with a whole new set of constraints and opportunities. It is widely believed that multi-hop, unordered on-chip networks would be needed in many-core chip multiprocessors (CMPs) to provide scalable on-chip communication. However, providing ordering among coherence transactions on unordered interconnects is a challenge. Traditional approaches for tackling coherence either have to use ordered interconnects (snoop-based protocols), which lead to scalability problems, or rely on an ordering point (directory-based protocols), which adds indirection latency. In this paper, we propose In-Network Snoop Ordering (INSO), in which coherence requests from a snoop-based protocol are inserted into the interconnect fabric and the network orders the requests in a distributed manner, creating a global ordering among requests. Essentially, when coherence requests enter the network, they grab snoop-orders at the injection router before being broadcast. A snoop-order specifies the global ordering of the particular request with respect to other requests. Before requests reach their destinations, they get ordered along the way, at intermediate routers and destination network interfaces. Our logical ordering scheme can be mapped onto any unordered interconnect. This enables a cache coherence protocol which exploits the low-latency nature of unordered interconnects without adding indirection to coherence transactions. Our full-system evaluations compare INSO against a directory protocol and a broadcast-based Token Coherence protocol. INSO outperforms these protocols by up to 30% and 8.5%, respectively, on a wide range of scientific and emerging applications.
UR - http://www.scopus.com/inward/record.url?scp=65349166228&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=65349166228&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2009.4798238
DO - 10.1109/HPCA.2009.4798238
M3 - Conference contribution
AN - SCOPUS:65349166228
SN - 9781424429325
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 67
EP - 78
BT - Proceedings - 15th International Symposium on High-Performance Computer Architecture, HPCA - 15 2009
PB - IEEE Computer Society
Y2 - 14 February 2009 through 18 February 2009
ER -