Feature noising for log-linear structured prediction

Sida I. Wang, Mengqiu Wang, Stefan Wager, Percy Liang, Christopher D. Manning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Scopus citations

Abstract

NLP models have many sparse features, and regularization is key for balancing model overfitting versus underfitting. A recently re-popularized form of regularization is to generate fake training data by repeatedly adding noise to real data. We reinterpret this noising as an explicit regularizer, and approximate it with a second-order formula that can be used during training without actually generating fake data. We show how to apply this method to structured prediction using multinomial logistic regression and linear-chain CRFs. We tackle the key challenge of developing a dynamic program to compute the gradient of the regularizer efficiently. The regularizer is a sum over inputs, so we can estimate it more accurately via a semi-supervised or transductive extension. Applied to text classification and NER, our method provides a >1% absolute performance gain over use of standard L2 regularization.
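A minimal sketch of the non-structured case described above (multinomial logistic regression), assuming unbiased dropout noise on the input features: x̃_j = x_j z_j / (1 − δ) with z_j ~ Bernoulli(1 − δ). Writing the class scores as s_y = θ_y · x and the log-partition function as A(s) = log Σ_y exp(s_y), unbiasedness means the expected noised log-loss exceeds the clean loss by R = E[A(s̃)] − A(s). A second-order Taylor expansion of A around s gives R ≈ ½ tr(∇²A(s) Cov(s̃)) with ∇²A(s) = diag(p) − ppᵀ, which for independent feature noise reduces to ½ Σ_j Var(x̃_j) Var_{y∼p(y|x)}[θ_{y,j}]. The function names, the dropout noise model, and the brute-force exact check are illustrative assumptions, not the authors' code; the linear-chain CRF case, which needs a dynamic program for the gradient, is not covered.

```python
import itertools
import math


def log_sum_exp(scores):
    """Numerically stable log-partition A(s) = log sum_y exp(s_y)."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def class_scores(theta, x):
    """s_y = theta_y . x for every class y; theta is a list of per-class weight lists."""
    return [sum(w * xj for w, xj in zip(theta_y, x)) for theta_y in theta]


def softmax(scores):
    """Model distribution p(y | x) from the class scores."""
    a = log_sum_exp(scores)
    return [math.exp(s - a) for s in scores]


def quadratic_noising_penalty(theta, x, delta):
    """Second-order approximation of the dropout-noising regularizer (0 <= delta < 1).

    R(theta, x) ~= 1/2 * sum_j Var(x~_j) * Var_{y ~ p(y|x)}[theta_{y,j}],
    where Var(x~_j) = delta / (1 - delta) * x_j**2 under unbiased dropout.
    """
    p = softmax(class_scores(theta, x))
    scale = delta / (1.0 - delta)
    total = 0.0
    for j, xj in enumerate(x):
        # Variance of the weight theta_{y,j} when the label y is drawn from the model.
        mean = sum(py * theta_y[j] for py, theta_y in zip(p, theta))
        second = sum(py * theta_y[j] ** 2 for py, theta_y in zip(p, theta))
        total += scale * xj ** 2 * (second - mean ** 2)
    return 0.5 * total


def exact_noising_penalty(theta, x, delta):
    """E[A(s~)] - A(s), enumerating all 2^d dropout masks (only feasible for small d)."""
    base = log_sum_exp(class_scores(theta, x))
    expected = 0.0
    for mask in itertools.product([0, 1], repeat=len(x)):
        prob = 1.0
        noised = []
        for keep, xj in zip(mask, x):
            prob *= (1.0 - delta) if keep else delta
            noised.append(xj / (1.0 - delta) if keep else 0.0)
        expected += prob * log_sum_exp(class_scores(theta, noised))
    return expected - base


theta = [[1.0], [-1.0]]  # two classes, one feature
x = [1.0]
print(quadratic_noising_penalty(theta, x, 0.5))  # about 0.210
print(exact_noising_penalty(theta, x, 0.5))      # about 0.229
```

Note that neither penalty uses the gold label y, only the input x and the model's own distribution p(y | x); this is consistent with the abstract's point that the regularizer is a sum over inputs, so unlabeled inputs can also contribute to estimating it.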

Original language: English (US)
Title of host publication: EMNLP 2013 - 2013 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 1170-1179
Number of pages: 10
ISBN (Electronic): 9781937284978
State: Published - 2013
Event: 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013 - Seattle, United States
Duration: Oct 18 2013 → Oct 21 2013

Publication series

Name: EMNLP 2013 - 2013 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Other

Other: 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013
Country/Territory: United States
City: Seattle
Period: 10/18/13 → 10/21/13

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Information Systems
  • Computer Vision and Pattern Recognition
