Abstract
The popular social networking and microblogging service Twitter contains language that is very different from what is considered proper. This paper quantifies those linguistic differences between printed English and Tweetspeak using information-theoretic concepts. Letter-based n-gram entropies are calculated and compared to analogous data from two corpora of printed English to demonstrate that 1) Twitter's entropy is overall higher than that of printed English, and 2) individual users' entropies are on average higher the less conventional their language use is. The implications for digitally mediated communication in general are also discussed.
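As a rough illustration of the quantity the abstract refers to, the sketch below computes the Shannon entropy of the empirical letter n-gram distribution of a string. It is a minimal sketch only: the function name `ngram_entropy`, the plug-in (maximum-likelihood) estimator, and the absence of any text normalization are assumptions made here for illustration, since the abstract does not specify the estimator, preprocessing, or corpora the paper actually used.

```python
import math
from collections import Counter


def ngram_entropy(text: str, n: int) -> float:
    """Plug-in Shannon entropy, in bits, of the letter n-grams in `text`.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of length n over the text, one character at a time.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        raise ValueError("text is shorter than n")
    counts = Counter(grams)
    total = len(grams)
    # H = -sum p * log2(p), with p estimated by relative frequency.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under these assumptions, a string whose n-grams are spread over more distinct, evenly used values yields a higher number, which is the sense in which less conventional spelling could raise the measured entropy. Plug-in estimates are biased downward on short samples, so comparisons are only meaningful between texts of similar length.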
Original language | English (US) |
---|---|
Title of host publication | 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings |
Pages | 3069-3072 |
Number of pages | 4 |
State | Published - Oct 23 2012 |
Event | 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto, Japan Duration: Mar 25 2012 → Mar 30 2012 |
Other
Other | 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 |
---|---|
Country/Territory | Japan |
City | Kyoto |
Period | 3/25/12 → 3/30/12 |
All Science Journal Classification (ASJC) codes
- Software
- Signal Processing
- Electrical and Electronic Engineering