TLB Improvements for chip multiprocessors: Inter-core cooperative prefetchers and shared last-level TLBs

Daniel Lustig, Abhishek Bhattacharjee, Margaret Martonosi

Research output: Contribution to journal › Article › peer-review

51 Scopus citations


Translation Lookaside Buffers (TLBs) are critical to overall system performance. Much past research has addressed uniprocessor TLBs, lowering access times and miss rates. However, as Chip MultiProcessors (CMPs) become ubiquitous, TLB design and performance must be reevaluated. Our article begins by performing a thorough TLB performance evaluation of sequential and parallel benchmarks running on a real-world, modern CMP system using hardware performance counters. This analysis demonstrates the need for further improvement of TLB hit rates for both classes of application, and it also points out that the data TLB has a significantly higher miss rate than the instruction TLB in both cases. In response to the characterization data, we propose and evaluate both Inter-Core Cooperative (ICC) TLB prefetchers and Shared Last-Level (SLL) TLBs as alternatives to the commercial norm of private, per-core L2 TLBs. ICC prefetchers eliminate 19% to 90% of Data TLB (D-TLB) misses across parallel workloads while requiring only modest changes in hardware. SLL TLBs eliminate 7% to 79% of D-TLB misses for parallel workloads and 35% to 95% of D-TLB misses for multiprogrammed sequential workloads. This corresponds to 27% and 21% increases in hit rates as compared to private, per-core L2 TLBs, respectively, and is achieved using even more modest hardware requirements. Because of their benefits for parallel applications, their applicability to sequential workloads, and their readily implementable hardware, SLL TLBs and ICC TLB prefetchers hold great promise for CMPs.
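
As a rough illustration of the contrast the abstract draws between private, per-core L2 TLBs and a shared last-level TLB, the following toy model counts misses for both organizations at equal total capacity. It is a minimal sketch under stated assumptions, not the authors' simulator or methodology: the `LruTlb` class, the `count_misses` helper, the LRU replacement policy, the fully associative lookup, and the synthetic trace (four cores walking the same eight pages) are all hypothetical choices made for illustration, and the resulting counts say nothing about the percentages reported above.

```python
from collections import OrderedDict


class LruTlb:
    """Hypothetical fully associative TLB with LRU replacement."""

    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> present
        self.misses = 0

    def access(self, vpn):
        """Look up a virtual page number; return True on a hit."""
        if vpn in self.map:
            self.map.move_to_end(vpn)  # mark as most recently used
            return True
        self.misses += 1
        self.map[vpn] = True
        if len(self.map) > self.entries:
            self.map.popitem(last=False)  # evict least recently used
        return False


def count_misses(trace, num_cores, entries_per_core, shared):
    """Replay a list of (core, vpn) accesses and return total misses.

    shared=False: one private TLB per core.
    shared=True:  one TLB with the same total capacity, used by all cores.
    """
    if shared:
        one = LruTlb(num_cores * entries_per_core)
        tlbs = [one] * num_cores
        distinct = [one]
    else:
        tlbs = [LruTlb(entries_per_core) for _ in range(num_cores)]
        distinct = tlbs
    for core, vpn in trace:
        tlbs[core].access(vpn)
    return sum(t.misses for t in distinct)


# Assumed synthetic workload: 4 cores each touch the same 8 pages, twice.
NUM_CORES, PAGES, ENTRIES_PER_CORE = 4, 8, 8
trace = [
    (core, vpn)
    for _ in range(2)
    for vpn in range(PAGES)
    for core in range(NUM_CORES)
]

private_misses = count_misses(trace, NUM_CORES, ENTRIES_PER_CORE, shared=False)
shared_misses = count_misses(trace, NUM_CORES, ENTRIES_PER_CORE, shared=True)
print(private_misses, shared_misses)  # 32 8
```

In this contrived trace every core misses once on each page when the TLBs are private, whereas with a shared structure only the first core to touch a page misses. Real workloads share pages far less uniformly, so the gap here is only an upper-end intuition.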

Original language: English (US)
Article number: 2
Journal: Transactions on Architecture and Code Optimization
Issue number: 1
State: Published - Apr 2013

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture

Keywords


  • Performance evaluation
  • Shared last-level TLB
  • Simulation
  • TLB prefetching
  • Translation lookaside buffer

