Parallelization libraries: Characterizing and reducing overheads

Abhishek Bhattacharjee, Gilberto Contreras, Margaret Martonosi

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

Creating efficient, scalable dynamic parallel runtime systems for chip multiprocessors (CMPs) requires understanding the overheads that manifest at high core counts and small task sizes. In this article, we assess these overheads in Intel's Threading Building Blocks (TBB) and OpenMP. First, we use real hardware and simulations to detail various scheduler and synchronization overheads. We find that these overheads can account for up to 47% of TBB benchmark runtime and 80% of OpenMP benchmark runtime. Second, we propose load-balancing techniques, such as occupancy-based and criticality-guided task stealing, to boost performance.
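The abstract names occupancy-based task stealing but does not define it. As a rough illustration only, the Python sketch below assumes that "occupancy-based" means a thief chooses the victim whose task queue currently holds the most tasks, rather than choosing a victim at random as in conventional randomized work stealing. The `Worker` class and the `random_victim`, `occupancy_victim`, and `steal` functions are hypothetical names; this is not the authors' TBB or OpenMP implementation, and it omits the locking or lock-free deque logic a real runtime needs.

```python
import random
from collections import deque


class Worker:
    """Hypothetical worker holding a double-ended task queue (single-threaded model)."""

    def __init__(self, wid):
        self.wid = wid
        self.tasks = deque()

    def push(self, task):
        # The owner pushes and pops at the tail (LIFO), favoring cache locality.
        self.tasks.append(task)

    def pop(self):
        # The owner takes its most recently pushed task, or None if empty.
        return self.tasks.pop() if self.tasks else None


def random_victim(thief, workers, rng):
    # Baseline randomized stealing: any other worker, even one with an empty queue.
    others = [w for w in workers if w is not thief]
    return rng.choice(others)


def occupancy_victim(thief, workers):
    # Assumed occupancy-based policy: the other worker with the most queued tasks.
    others = [w for w in workers if w is not thief]
    best = max(others, key=lambda w: len(w.tasks))
    return best if best.tasks else None


def steal(thief, workers):
    # Thieves take from the head (the oldest task) of the chosen victim's queue.
    victim = occupancy_victim(thief, workers)
    if victim is None:
        return None
    return victim.tasks.popleft()
```

For example, with four workers where worker 1 holds five tasks and worker 2 holds two, `steal(workers[0], workers)` takes the oldest task from worker 1. The design point being illustrated is that a thief guided by queue occupancy avoids wasted steal attempts on empty or nearly empty queues, which a uniformly random choice can incur.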

Original language: English (US)
Article number: 5
Journal: Transactions on Architecture and Code Optimization
Volume: 8
Issue number: 1
State: Published - Apr 1 2011

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture

Keywords

  • Intel Threading Building Blocks
  • OpenMP
  • Parallel libraries
  • Performance
  • Task stealing

