TY - GEN
T1 - Novelty detection based on sentence level patterns
AU - Li, Xiaoyan
AU - Croft, W. Bruce
PY - 2005
Y1 - 2005
N2 - The detection of new information in a document stream is an important component of many potential applications. In this paper, a new novelty detection approach based on the identification of sentence level patterns is proposed. Given a user's information need, some patterns in sentences, such as combinations of query words, named entities, and phrases, may contain more important and relevant information than single words. Therefore, the proposed novelty detection approach focuses on the identification of previously unseen query-related patterns in sentences. Specifically, a query is preprocessed and represented with patterns that include both query words and required answer types. These patterns are used to retrieve sentences, which are then determined to be novel if it is likely that a new answer is present. An analysis of patterns in sentences was performed with data from the TREC 2002 novelty track, and experiments on novelty detection were carried out on data from the TREC 2003 and 2004 novelty tracks. The experimental results show that the proposed pattern-based approach significantly outperforms all three baselines in terms of precision at top ranks.
AB - The detection of new information in a document stream is an important component of many potential applications. In this paper, a new novelty detection approach based on the identification of sentence level patterns is proposed. Given a user's information need, some patterns in sentences, such as combinations of query words, named entities, and phrases, may contain more important and relevant information than single words. Therefore, the proposed novelty detection approach focuses on the identification of previously unseen query-related patterns in sentences. Specifically, a query is preprocessed and represented with patterns that include both query words and required answer types. These patterns are used to retrieve sentences, which are then determined to be novel if it is likely that a new answer is present. An analysis of patterns in sentences was performed with data from the TREC 2002 novelty track, and experiments on novelty detection were carried out on data from the TREC 2003 and 2004 novelty tracks. The experimental results show that the proposed pattern-based approach significantly outperforms all three baselines in terms of precision at top ranks.
KW - Information patterns
KW - Named entities
KW - Novelty detection
UR - http://www.scopus.com/inward/record.url?scp=33745772650&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745772650&partnerID=8YFLogxK
U2 - 10.1145/1099554.1099734
DO - 10.1145/1099554.1099734
M3 - Conference contribution
AN - SCOPUS:33745772650
SN - 1595931406
SN - 9781595931405
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 744
EP - 751
BT - CIKM'05 - Proceedings of the 14th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - CIKM'05 - Proceedings of the 14th ACM International Conference on Information and Knowledge Management
Y2 - 31 October 2005 through 5 November 2005
ER -