TY - GEN
T1 - MITTS: Memory Inter-arrival Time Traffic Shaping
T2 - 43rd International Symposium on Computer Architecture, ISCA 2016
AU - Zhou, Yanqi
AU - Wentzlaff, David
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/8/24
Y1 - 2016/8/24
N2 - Memory bandwidth severely limits the scalability and performance of multicore and manycore systems. Application performance can be very sensitive to both the delivered memory bandwidth and latency. In multicore systems, a memory channel is usually shared by multiple cores. Having the ability to precisely provision, schedule, and isolate memory bandwidth and latency on a per-core basis is particularly important when different memory guarantees are needed on a per-customer, per-application, or per-core basis. Infrastructure as a Service (IaaS) Cloud systems, and even general purpose multicores optimized for application throughput or fairness all benefit from the ability to control and schedule memory access on a fine-grain basis. In this paper, we propose MITTS (Memory Inter-arrival Time Traffic Shaping), a simple, distributed hardware mechanism which limits memory traffic at the source (Core or LLC). MITTS shapes memory traffic based on memory request inter-arrival time, enabling fine-grain bandwidth allocation. In an IaaS system, MITTS enables Cloud customers to express their memory distribution needs and pay commensurately. For instance, MITTS enables charging customers that have bursty memory traffic more than customers with uniform memory traffic for the same aggregate bandwidth. Beyond IaaS systems, MITTS can also be used to optimize for throughput or fairness in a general purpose multi-program workload. MITTS uses an online genetic algorithm to configure hardware bins, which can adapt for program phases and variable input sets. We have implemented MITTS in Verilog and have taped-out the design in a 25-core 32nm processor and find that MITTS requires less than 0.9% of core area. We evaluate across SPECint, PARSEC, Apache, and bhm Mail Server workloads, and find that MITTS achieves an average 1.18x performance gain compared to the best static bandwidth allocation, a 2.69x average performance/cost advantage in an IaaS setting, and up to 1.17x better throughput and 1.52x better fairness when compared to conventional memory bandwidth provisioning techniques.
KW - IaaS Clouds
KW - Memory Architecture
KW - Memory Management
KW - Multicore
UR - http://www.scopus.com/inward/record.url?scp=84988428067&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84988428067&partnerID=8YFLogxK
U2 - 10.1109/ISCA.2016.53
DO - 10.1109/ISCA.2016.53
M3 - Conference contribution
AN - SCOPUS:84988428067
T3 - Proceedings - 2016 43rd International Symposium on Computer Architecture, ISCA 2016
SP - 532
EP - 544
BT - Proceedings - 2016 43rd International Symposium on Computer Architecture, ISCA 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 June 2016 through 22 June 2016
ER -