TY - GEN
T1 - A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS
AU - Kumar, Amit
AU - Kundu, Partha
AU - Singh, Arvind P.
AU - Peh, Li Shiuan
AU - Jha, Niraj K.
PY - 2007
Y1 - 2007
N2 - As chip multiprocessors (CMPs) become the only viable way to scale up and utilize the abundant transistors made available in current microprocessors, the design of on-chip networks is becoming critically important. These networks face unique design constraints and are required to provide extremely fast and high bandwidth communication, yet meet tight power and area budgets. In this paper, we present a detailed design of our on-chip network router targeted at a 36-core shared-memory CMP system in 65nm technology. Our design targets an aggressive clock frequency of 3.6GHz, thus posing tough design challenges that led to several unique circuit and microarchitectural innovations and design choices, including a novel high throughput and low latency switch allocation mechanism, a non-speculative single-cycle router pipeline which uses advanced bundles to remove control setup overhead, a low-complexity virtual channel allocator and a dynamically-managed shared buffer design which uses prefetching to minimize critical path delay. Our router takes up 1.19 mm² area and expends 551 mW power at 10% activity, delivering a single-cycle no-load latency at 3.6GHz clock frequency while achieving a peak switching data rate in excess of 4.6Tbits/s per router node.
AB - As chip multiprocessors (CMPs) become the only viable way to scale up and utilize the abundant transistors made available in current microprocessors, the design of on-chip networks is becoming critically important. These networks face unique design constraints and are required to provide extremely fast and high bandwidth communication, yet meet tight power and area budgets. In this paper, we present a detailed design of our on-chip network router targeted at a 36-core shared-memory CMP system in 65nm technology. Our design targets an aggressive clock frequency of 3.6GHz, thus posing tough design challenges that led to several unique circuit and microarchitectural innovations and design choices, including a novel high throughput and low latency switch allocation mechanism, a non-speculative single-cycle router pipeline which uses advanced bundles to remove control setup overhead, a low-complexity virtual channel allocator and a dynamically-managed shared buffer design which uses prefetching to minimize critical path delay. Our router takes up 1.19 mm² area and expends 551 mW power at 10% activity, delivering a single-cycle no-load latency at 3.6GHz clock frequency while achieving a peak switching data rate in excess of 4.6Tbits/s per router node.
UR - http://www.scopus.com/inward/record.url?scp=52949114554&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=52949114554&partnerID=8YFLogxK
U2 - 10.1109/ICCD.2007.4601881
DO - 10.1109/ICCD.2007.4601881
M3 - Conference contribution
AN - SCOPUS:52949114554
SN - 1424412587
SN - 9781424412587
T3 - 2007 IEEE International Conference on Computer Design, ICCD 2007
SP - 63
EP - 70
BT - 2007 IEEE International Conference on Computer Design, ICCD 2007
T2 - 2007 IEEE International Conference on Computer Design, ICCD 2007
Y2 - 7 October 2007 through 10 October 2007
ER -