TY - GEN
T1 - Speculative decoupled software pipelining
AU - Vachharajani, Neil
AU - Rangan, Ram
AU - Raman, Easwaran
AU - Bridges, Matthew J.
AU - Ottoni, Guilherme
AU - August, David I.
PY - 2007
Y1 - 2007
N2 - In recent years, microprocessor manufacturers have shifted their focus from single-core to multi-core processors. To avoid burdening programmers with the responsibility of parallelizing their applications, some researchers have advocated automatic thread extraction. A recently proposed technique, Decoupled Software Pipelining (DSWP), has demonstrated promise by partitioning loops into long-running, fine-grained threads organized into a pipeline. Using a pipeline organization and execution decoupled by inter-core communication queues, DSWP offers increased execution efficiency that is largely independent of inter-core communication latency. This paper proposes adding speculation to DSWP and evaluates an automatic approach for its implementation. By speculating past infrequent dependences, the benefit of DSWP is increased by making it applicable to more loops, facilitating better balanced threads, and enabling parallelized loops to be run on more cores. Unlike prior speculative threading proposals, speculative DSWP focuses on breaking dependence recurrences. By speculatively breaking these recurrences, instructions that were formerly restricted to a single thread to ensure decoupling are now free to span multiple threads. Using an initial automatic compiler implementation and a validated processor model, this paper demonstrates significant gains using speculation for 4-core chip multiprocessor models running a variety of codes.
AB - In recent years, microprocessor manufacturers have shifted their focus from single-core to multi-core processors. To avoid burdening programmers with the responsibility of parallelizing their applications, some researchers have advocated automatic thread extraction. A recently proposed technique, Decoupled Software Pipelining (DSWP), has demonstrated promise by partitioning loops into long-running, fine-grained threads organized into a pipeline. Using a pipeline organization and execution decoupled by inter-core communication queues, DSWP offers increased execution efficiency that is largely independent of inter-core communication latency. This paper proposes adding speculation to DSWP and evaluates an automatic approach for its implementation. By speculating past infrequent dependences, the benefit of DSWP is increased by making it applicable to more loops, facilitating better balanced threads, and enabling parallelized loops to be run on more cores. Unlike prior speculative threading proposals, speculative DSWP focuses on breaking dependence recurrences. By speculatively breaking these recurrences, instructions that were formerly restricted to a single thread to ensure decoupling are now free to span multiple threads. Using an initial automatic compiler implementation and a validated processor model, this paper demonstrates significant gains using speculation for 4-core chip multiprocessor models running a variety of codes.
UR - http://www.scopus.com/inward/record.url?scp=41349089872&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=41349089872&partnerID=8YFLogxK
U2 - 10.1109/PACT.2007.34
DO - 10.1109/PACT.2007.34
M3 - Conference contribution
AN - SCOPUS:41349089872
SN - 0769529445
SN - 9780769529448
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 49
EP - 59
BT - 16th International Conference on Parallel Architecture and Compilation Techniques, PACT 2007
T2 - 16th International Conference on Parallel Architecture and Compilation Techniques, PACT 2007
Y2 - 15 September 2007 through 19 September 2007
ER -