Feature noising for log-linear structured prediction

Sida I. Wang, Mengqiu Wang, Stefan Wager, Percy Liang, Christopher D. Manning

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

14 Scopus citations

Abstract

NLP models have many and sparse features, and regularization is key for balancing model overfitting versus underfitting. A recently re-popularized form of regularization is to generate fake training data by repeatedly adding noise to real data. We reinterpret this noising as an explicit regularizer, and approximate it with a second-order formula that can be used during training without actually generating fake data. We show how to apply this method to structured prediction using multinomial logistic regression and linear-chain CRFs. We tackle the key challenge of developing a dynamic program to compute the gradient of the regularizer efficiently. The regularizer is a sum over inputs, so we can estimate it more accurately via a semi-supervised or transductive extension. Applied to text classification and NER, our method provides a >1% absolute performance gain over use of standard L2 regularization.
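The abstract describes the second-order approximation only in words. The sketch below illustrates the idea under assumptions that this record does not state: dropout noise with drop probability `delta`, a multinomial logistic regression model, and a second-order Taylor expansion of the log-partition function around the clean scores. All function and parameter names are hypothetical, and this is not the authors' implementation. The quadratic form is compared against a Monte Carlo estimate that does generate noised copies of the input.

```python
import numpy as np


def log_partition(scores):
    """Numerically stable log-sum-exp over the last axis."""
    m = np.max(scores, axis=-1, keepdims=True)
    return (m + np.log(np.sum(np.exp(scores - m), axis=-1, keepdims=True))).squeeze(-1)


def softmax(scores):
    """Class probabilities for a single score vector."""
    z = np.exp(scores - np.max(scores))
    return z / z.sum()


def quadratic_noise_regularizer(theta, x, delta):
    """Second-order approximation of E[A(theta x_noised)] - A(theta x).

    theta: (K, D) weights, x: (D,) features, delta: drop probability in [0, 1).
    Assumes unbiased dropout: each feature is zeroed with probability delta
    and otherwise rescaled by 1 / (1 - delta).
    """
    var_x = x ** 2 * delta / (1.0 - delta)      # per-feature variance under dropout
    cov_s = (theta * var_x) @ theta.T           # (K, K) covariance of the noised scores
    p = softmax(theta @ x)
    hessian = np.diag(p) - np.outer(p, p)       # Hessian of the log-partition function
    return 0.5 * np.trace(hessian @ cov_s)


def monte_carlo_noise_regularizer(theta, x, delta, n_samples, rng):
    """Estimate the same quantity by actually sampling noised inputs."""
    keep = rng.random((n_samples, x.size)) >= delta
    x_noised = keep * x / (1.0 - delta)
    return log_partition(x_noised @ theta.T).mean() - log_partition(theta @ x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = 0.3 * rng.standard_normal((3, 20))  # hypothetical 3-class model
    x = np.ones(20)                             # hypothetical dense feature vector
    print("quadratic  :", quadratic_noise_regularizer(theta, x, 0.1))
    print("monte carlo:", monte_carlo_noise_regularizer(theta, x, 0.1, 20000, rng))
```

Because the quadratic form depends only on the input `x` and not on its label, it can in principle also be evaluated on unlabeled inputs, which is the property the abstract relies on for the semi-supervised extension.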

Original language: English (US)
Title of host publication: EMNLP 2013 - 2013 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 1170-1179
Number of pages: 10
ISBN (Electronic): 9781937284978
State: Published - Jan 1 2013
Event: 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013 - Seattle, United States
Duration: Oct 18 2013 to Oct 21 2013

Publication series

Name: EMNLP 2013 - 2013 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Other

Other: 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013
Country: United States
City: Seattle
Period: 10/18/13 to 10/21/13

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Information Systems
  • Computer Vision and Pattern Recognition


Cite this

Wang, S. I., Wang, M., Wager, S., Liang, P., & Manning, C. D. (2013). Feature noising for log-linear structured prediction. In EMNLP 2013 - 2013 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 1170-1179). (EMNLP 2013 - 2013 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference). Association for Computational Linguistics (ACL).