TY - GEN
T1 - Baselines and bigrams
T2 - 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012
AU - Wang, Sida
AU - Manning, Christopher D.
PY - 2012
Y1 - 2012
N2 - Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used and task/dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.
AB - Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used and task/dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.
UR - http://www.scopus.com/inward/record.url?scp=84875872773&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84875872773&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84875872773
SN - 9781937284251
T3 - 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference
SP - 90
EP - 94
BT - 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference
Y2 - 8 July 2012 through 14 July 2012
ER -