TY - GEN
T1 - Full correlation matrix analysis of fMRI data on Intel Xeon Phi coprocessors
AU - Wang, Yida
AU - Anderson, Michael J.
AU - Cohen, Jonathan D.
AU - Heinecke, Alexander
AU - Li, Kai
AU - Satish, Nadathur
AU - Sundaram, Narayanan
AU - Turk-Browne, Nicholas B.
AU - Willke, Theodore L.
N1 - Funding Information:
We would like to thank our shepherd Mihai Anitescu and the anonymous reviewers of the SC program committee for their valuable comments, which substantially improved the paper. We are also grateful to Yungang Bao, Guangming Tan, and Linpeng Tang for helpful discussions. This work was supported in part by Intel Corporation, the J. Insley Blair Pyne fund at Princeton University, the John Templeton Foundation, the National Science Foundation (MRI BCS1229597), and the National Institutes of Health (R01 EY021755).
Publisher Copyright:
© 2015 ACM.
PY - 2015/11/15
Y1 - 2015/11/15
N2 - Full correlation matrix analysis (FCMA) is an unbiased approach for exhaustively studying interactions among brain regions in functional magnetic resonance imaging (fMRI) data from human participants. To answer neuroscientific questions efficiently, we are developing a closed-loop analysis system with FCMA on a cluster of nodes with Intel Xeon Phi coprocessors. Here we propose several data-driven algorithmic modifications to improve performance on the coprocessor. Our experiments with real datasets show that the optimized single-node code runs 5x-16x faster than the baseline implementation using the well-known Intel MKL and LibSVM libraries, and that the cluster implementation achieves near-linear speedup on 5760 cores.
KW - Intel® Xeon Phi™ coprocessor
KW - fMRI data
UR - http://www.scopus.com/inward/record.url?scp=84966495075&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84966495075&partnerID=8YFLogxK
U2 - 10.1145/2807591.2807631
DO - 10.1145/2807591.2807631
M3 - Conference contribution
AN - SCOPUS:84966495075
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2015
PB - IEEE Computer Society
T2 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015
Y2 - 15 November 2015 through 20 November 2015
ER -