TY - GEN
T1 - Global multi-threaded instruction scheduling
AU - Ottoni, Guilherme
AU - August, David I.
PY - 2007
Y1 - 2007
N2 - Recently, the microprocessor industry has moved toward chip multiprocessor (CMP) designs as a means of utilizing the increasing transistor counts in the face of physical and micro-architectural limitations. Despite this move, CMPs do not directly improve the performance of single-threaded codes, a characteristic of most applications. In order to support parallelization of general-purpose applications, computer architects have proposed CMPs with light-weight scalar communication mechanisms [21, 23, 29]. Despite such support, most existing compiler multi-threading techniques have generally demonstrated little effectiveness in extracting parallelism from non-scientific applications [14, 15, 22]. The main reason for this is that such techniques are mostly restricted to extracting parallelism within straight-line regions of code. In this paper, we first propose a framework that enables global multi-threaded instruction scheduling in general. We then describe GREMIO, a scheduler built using this framework. GREMIO operates at a global scope, at the procedure level, and uses control dependence analysis to extract non-speculative thread-level parallelism from sequential codes. Using a fully automatic compiler implementation of GREMIO and a validated processor model, this paper demonstrates gains for a dual-core CMP model running a variety of codes. Our experiments demonstrate the advantage of exploiting global scheduling for multi-threaded architectures, and present gains in a detailed comparison with the Decoupled Software Pipelining (DSWP) multi-threading technique [18]. Furthermore, our experiments show that adding GREMIO to a compiler with DSWP improves the average speedup from 16.5% to 32.8% for important benchmark functions when utilizing two cores, indicating the importance of this technique in making compilers extract threads effectively.
UR - http://www.scopus.com/inward/record.url?scp=47349095930&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47349095930&partnerID=8YFLogxK
U2 - 10.1109/MICRO.2007.32
DO - 10.1109/MICRO.2007.32
M3 - Conference contribution
AN - SCOPUS:47349095930
SN - 0769530478
SN - 9780769530475
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 56
EP - 68
BT - Proceedings of the 40th IEEE/ACM International Symposium on Microarchitecture, MICRO 2007
T2 - 40th IEEE/ACM International Symposium on Microarchitecture, MICRO 2007
Y2 - 1 December 2007 through 5 December 2007
ER -