TY - GEN
T1 - Improving novelty detection for general topics using sentence level information patterns
AU - Li, Xiaoyan
AU - Croft, W. Bruce
N1 - Copyright:
Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2006
Y1 - 2006
N2 - The detection of new information in a document stream is an important component of many potential applications. In this work, a new novelty detection approach based on the identification of sentence level information patterns is proposed. First, the information-pattern concept for novelty detection is presented with the emphasis on new information patterns for general topics (queries) that cannot be simply turned into specific questions whose answers are specific named entities (NEs). Then we elaborate a thorough analysis of sentence level information patterns on data from the TREC novelty tracks, including sentence lengths, named entities, sentence level opinion patterns. This analysis provides guidelines in applying those patterns in novelty detection particularly for the general topics. Finally, a unified pattern-based approach is presented to novelty detection for both general and specific topics. The new method for dealing with general topics will be the focus. Experimental results show that the proposed approach significantly improves the performance of novelty detection for general topics as well as the overall performance for all topics from the 2002-2004 TREC novelty tracks.
AB - The detection of new information in a document stream is an important component of many potential applications. In this work, a new novelty detection approach based on the identification of sentence level information patterns is proposed. First, the information-pattern concept for novelty detection is presented with the emphasis on new information patterns for general topics (queries) that cannot be simply turned into specific questions whose answers are specific named entities (NEs). Then we elaborate a thorough analysis of sentence level information patterns on data from the TREC novelty tracks, including sentence lengths, named entities, sentence level opinion patterns. This analysis provides guidelines in applying those patterns in novelty detection particularly for the general topics. Finally, a unified pattern-based approach is presented to novelty detection for both general and specific topics. The new method for dealing with general topics will be the focus. Experimental results show that the proposed approach significantly improves the performance of novelty detection for general topics as well as the overall performance for all topics from the 2002-2004 TREC novelty tracks.
KW - Information patterns
KW - Named entities
KW - Novelty detection
UR - http://www.scopus.com/inward/record.url?scp=34547643555&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547643555&partnerID=8YFLogxK
U2 - 10.1145/1183614.1183652
DO - 10.1145/1183614.1183652
M3 - Conference contribution
AN - SCOPUS:34547643555
SN - 1595934332
SN - 9781595934338
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 238
EP - 247
BT - Proceedings of the 15th ACM Conference on Information and Knowledge Management, CIKM 2006
T2 - 15th ACM Conference on Information and Knowledge Management, CIKM 2006
Y2 - 6 November 2006 through 11 November 2006
ER -