TY - GEN
T1 - Feature noising for log-linear structured prediction
AU - Wang, Sida I.
AU - Wang, Mengqiu
AU - Wager, Stefan
AU - Liang, Percy
AU - Manning, Christopher D.
N1 - Publisher Copyright:
© 2013 Association for Computational Linguistics.
PY - 2013
Y1 - 2013
AB - NLP models have many sparse features, and regularization is key for balancing model overfitting versus underfitting. A recently re-popularized form of regularization is to generate fake training data by repeatedly adding noise to real data. We reinterpret this noising as an explicit regularizer, and approximate it with a second-order formula that can be used during training without actually generating fake data. We show how to apply this method to structured prediction using multinomial logistic regression and linear-chain CRFs. We tackle the key challenge of developing a dynamic program to compute the gradient of the regularizer efficiently. The regularizer is a sum over inputs, so we can estimate it more accurately via a semi-supervised or transductive extension. Applied to text classification and NER, our method provides a >1% absolute performance gain over standard L2 regularization.
UR - http://www.scopus.com/inward/record.url?scp=84926377572&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84926377572&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84926377572
T3 - EMNLP 2013 - 2013 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 1170
EP - 1179
BT - EMNLP 2013 - 2013 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013
Y2 - 18 October 2013 through 21 October 2013
ER -