TY - JOUR
T1 - Attribute-distributed learning
T2 - Models, limits, and algorithms
AU - Zheng, Haipeng
AU - Kulkarni, Sanjeev R.
AU - Poor, H. Vincent
N1 - Funding Information:
Manuscript received April 09, 2010; accepted September 22, 2010. Date of publication October 18, 2010; date of current version December 17, 2010. This research was supported in part by the U.S. Office of Naval Research under Grants N00014-07-1-0555 and N00014-09-1-0342, the U.S. Army Research Office under Grant W911NF-07-1-0185, and the National Science Foundation under Grant CNS-09-05398 and under Science & Technology Center Grant CCF-0939370. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Mathini Sellathurai.
PY - 2011
Y1 - 2011
AB - This paper introduces a framework for distributed learning (regression) on attribute-distributed data. First, the convergence properties of attribute-distributed regression with an additive model and a fusion center are discussed, and the convergence rate and uniqueness of the limit are shown for some special cases. Then, taking residual refitting (or boosting) as a prototype algorithm, three different schemes are proposed and compared: Simple Iterative Projection, a greedy algorithm, and a parallel algorithm (with its derivatives). The first two are sequential and have low communication overhead, but are susceptible to overtraining. The parallel algorithm has the best performance, but also has significant communication requirements. Instead of refitting the ensemble residual sequentially, the parallel algorithm redistributes the residual to each agent in proportion to the coefficients of the optimal linear combination of the current individual estimators. Designing residual redistribution schemes also improves the ability to eliminate irrelevant attributes. The performance of the algorithms is compared via extensive simulations. Communication issues are also considered: the amounts of data exchanged by the three algorithms are compared, and the three methods are generalized to scenarios without a fusion center.
KW - Distributed information systems
KW - distributed processing
KW - statistical learning
UR - http://www.scopus.com/inward/record.url?scp=84858229684&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84858229684&partnerID=8YFLogxK
U2 - 10.1109/TSP.2010.2088393
DO - 10.1109/TSP.2010.2088393
M3 - Article
AN - SCOPUS:84858229684
SN - 1053-587X
VL - 59
SP - 386
EP - 398
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
IS - 1
M1 - 5605268
ER -