TY - GEN
T1 - Characterizing and improving the performance of Intel Threading Building Blocks
AU - Contreras, Gilberto
AU - Martonosi, Margaret Rose
PY - 2008
Y1 - 2008
N2 - The Intel Threading Building Blocks (TBB) runtime library [1] is a popular C++ parallelization environment [2][3] that offers a set of methods and templates for creating parallel applications. Through support of parallel tasks rather than parallel threads, the TBB runtime library offers improved performance scalability by dynamically redistributing parallel tasks across available processors. This not only creates more scalable, portable parallel applications, but also increases programming productivity by allowing programmers to focus their efforts on identifying concurrency rather than worrying about its management. While many applications benefit from dynamic management of parallelism, dynamic management carries parallelization over-head that increases with increasing core counts and decreasing task sizes. Understanding the sources of these overheads and their implications on application performance can help programmers make more efficient use of available parallelism. Clearly understanding the behavior of these overheads is the first step in creating efficient, scalable parallelization environments targeted at future CMP systems. In this paper we study and characterize some of the overheads of the Intel Threading Building Blocks through the use of real-hardware and simulation performance measurements. Our results show that synchronization overheads within TBB can have a significant and detrimental effect on parallelism performance. Random stealing, while simple and effective at low core counts, becomes less effective as application heterogeneity and core counts increase. Overall, our study provides valuable insights that can be used to create more robust, scalable runtime libraries.
AB - The Intel Threading Building Blocks (TBB) runtime library [1] is a popular C++ parallelization environment [2][3] that offers a set of methods and templates for creating parallel applications. Through support of parallel tasks rather than parallel threads, the TBB runtime library offers improved performance scalability by dynamically redistributing parallel tasks across available processors. This not only creates more scalable, portable parallel applications, but also increases programming productivity by allowing programmers to focus their efforts on identifying concurrency rather than worrying about its management. While many applications benefit from dynamic management of parallelism, dynamic management carries parallelization over-head that increases with increasing core counts and decreasing task sizes. Understanding the sources of these overheads and their implications on application performance can help programmers make more efficient use of available parallelism. Clearly understanding the behavior of these overheads is the first step in creating efficient, scalable parallelization environments targeted at future CMP systems. In this paper we study and characterize some of the overheads of the Intel Threading Building Blocks through the use of real-hardware and simulation performance measurements. Our results show that synchronization overheads within TBB can have a significant and detrimental effect on parallelism performance. Random stealing, while simple and effective at low core counts, becomes less effective as application heterogeneity and core counts increase. Overall, our study provides valuable insights that can be used to create more robust, scalable runtime libraries.
UR - http://www.scopus.com/inward/record.url?scp=56449089553&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=56449089553&partnerID=8YFLogxK
U2 - 10.1109/IISWC.2008.4636091
DO - 10.1109/IISWC.2008.4636091
M3 - Conference contribution
AN - SCOPUS:56449089553
SN - 9781424427789
T3 - 2008 IEEE International Symposium on Workload Characterization, IISWC'08
SP - 57
EP - 66
BT - 2008 IEEE International Symposium on Workload Characterization, IISWC'08
T2 - 2008 IEEE International Symposium on Workload Characterization, IISWC'08
Y2 - 14 September 2008 through 16 September 2008
ER -