TY - GEN
T1 - An empirical bayes approach to partially labeled and shuffled data sets
AU - Dytso, Alex
AU - Poor, H. Vincent
N1 - Publisher Copyright:
© 2020 Institute of Electrical and Electronics Engineers Inc.. All rights reserved.
PY - 2020/5
Y1 - 2020/5
N2 - This work outlines a method for an application of empirical Bayes in the setting of semi-supervised learning. That is, we consider a scenario in which the training set is partially or entirely unlabeled. In addition to the missing labels, we also consider a scenario where the available training data might be shuffled (i.e., the features and labels are not matched). Specifically, we propose to train model-based empirical Bayes separately on the set of features and the set of labels and combine/mix the two models based on the proportion of unlabeled pairs. The method then can be used to recover the missing labels (i.e., create pseudo-labels) of the data set and, in addition, if the data is shuffled, recover the correct permutation of the data. The technique is evaluated for a multivariate Gaussian model and is shown to consistently outperform a maximum likelihood approach. Moreover, the procedure is shown to be a consistent estimator for a multivariate Gaussian model with an arbitrary (non-degenerate) covariance matrix.
AB - This work outlines a method for an application of empirical Bayes in the setting of semi-supervised learning. That is, we consider a scenario in which the training set is partially or entirely unlabeled. In addition to the missing labels, we also consider a scenario where the available training data might be shuffled (i.e., the features and labels are not matched). Specifically, we propose to train model-based empirical Bayes separately on the set of features and the set of labels and combine/mix the two models based on the proportion of unlabeled pairs. The method then can be used to recover the missing labels (i.e., create pseudo-labels) of the data set and, in addition, if the data is shuffled, recover the correct permutation of the data. The technique is evaluated for a multivariate Gaussian model and is shown to consistently outperform a maximum likelihood approach. Moreover, the procedure is shown to be a consistent estimator for a multivariate Gaussian model with an arbitrary (non-degenerate) covariance matrix.
UR - http://www.scopus.com/inward/record.url?scp=85091282658&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091282658&partnerID=8YFLogxK
U2 - 10.1109/ICASSP40776.2020.9054638
DO - 10.1109/ICASSP40776.2020.9054638
M3 - Conference contribution
AN - SCOPUS:85091282658
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 8886
EP - 8890
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -