TY - JOUR
T1 - All-pairs
T2 - An abstraction for data-intensive computing on campus grids
AU - Moretti, Christopher
AU - Bui, Hoang
AU - Hollingsworth, Karen
AU - Rich, Brandon
AU - Flynn, Patrick
AU - Thain, Douglas
N1 - Funding Information:
This work was supported in part by the US National Science Foundation grants CCF-06-21434, CNS-06-43229, and CNS-01-30839. The authors thank David Cieslak, Tim Faltemier, Tanya Peters, and Robert McKeon for testing early versions of this work.
PY - 2010/1
Y1 - 2010/1
N2 - Today, campus grids provide users with easy access to thousands of CPUs. However, it is not always easy for nonexpert users to harness these systems effectively. A large workload composed in what seems to be the obvious way by a naive user may accidentally abuse shared resources and achieve very poor performance. To address this problem, we argue that campus grids should provide end users with high-level abstractions that allow for the easy expression and efficient execution of data-intensive workloads. We present one example of an abstraction, All-Pairs, that fits the needs of several applications in biometrics, bioinformatics, and data mining. We demonstrate that an optimized All-Pairs abstraction is easier to use than the underlying system, achieves performance orders of magnitude better than the obvious but naive approach, and is both faster and more efficient than a tuned conventional approach. This abstraction has been in production use for one year on a 500-CPU campus grid at the University of Notre Dame and has been used to carry out a groundbreaking analysis of biometric data.
AB - Today, campus grids provide users with easy access to thousands of CPUs. However, it is not always easy for nonexpert users to harness these systems effectively. A large workload composed in what seems to be the obvious way by a naive user may accidentally abuse shared resources and achieve very poor performance. To address this problem, we argue that campus grids should provide end users with high-level abstractions that allow for the easy expression and efficient execution of data-intensive workloads. We present one example of an abstraction, All-Pairs, that fits the needs of several applications in biometrics, bioinformatics, and data mining. We demonstrate that an optimized All-Pairs abstraction is easier to use than the underlying system, achieves performance orders of magnitude better than the obvious but naive approach, and is both faster and more efficient than a tuned conventional approach. This abstraction has been in production use for one year on a 500-CPU campus grid at the University of Notre Dame and has been used to carry out a groundbreaking analysis of biometric data.
KW - All-pairs
KW - Biometrics
KW - Cloud computing
KW - Data intensive computing
KW - Grid computing
UR - http://www.scopus.com/inward/record.url?scp=72649098345&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=72649098345&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2009.49
DO - 10.1109/TPDS.2009.49
M3 - Article
AN - SCOPUS:72649098345
SN - 1045-9219
VL - 21
SP - 33
EP - 46
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 1
M1 - 4803834
ER -