TY - JOUR
T1 - A probabilistic model of meetings that combines words and discourse features
AU - Dowman, Mike
AU - Savova, Virginia
AU - Griffiths, Thomas L.
AU - Körding, Konrad P.
AU - Tenenbaum, Joshua B.
AU - Purver, Matthew
N1 - Funding Information:
Manuscript received June 18, 2007; revised April 30, 2008. This work was supported by the CALO project (DARPA Grant NBCH-D-03-0010) and the work of M. Dowman was supported by a Japan Society for the Promotion of Science postdoctoral fellowship. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Mark Johnson. M. Dowman is with the Department of General System Studies, University of Tokyo, Tokyo 153-8902, Japan (e-mail: mike@sacral.c.u-tokyo.ac.jp). V. Savova and J. B. Tenenbaum are with the Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: Savova@mit.edu; jbt@mit.edu). T. L. Griffiths is with the Department of Psychology, University of California at Berkeley, Berkeley, CA 94720 USA (e-mail: tom_griffiths@berkeley.edu). K. P. Körding is with Physical Medicine and Rehabilitation, Northwestern University, Chicago, IL 60611 USA (e-mail: konrad@koerding.com). M. Purver is with the Center for the Study of Language and Information, Stanford University, Stanford, CA 94305 USA (e-mail: mpurver@stanford.edu). Digital Object Identifier 10.1109/TASL.2008.925867
PY - 2008/9
Y1 - 2008/9
N2 - In order to determine the points at which meeting discourse changes from one topic to another, probabilistic models were used to approximate the process through which meeting transcripts were produced. Gibbs sampling was used to estimate the values of random variables in the models, including the locations of topic boundaries. This paper shows how discourse features were integrated into the Bayesian model and reports empirical evaluations of the benefit obtained through the inclusion of each feature and of the suitability of alternative models of the placement of topic boundaries. It demonstrates how multiple cues to segmentation can be combined in a principled way, and empirical tests show a clear improvement over previous work.
AB - In order to determine the points at which meeting discourse changes from one topic to another, probabilistic models were used to approximate the process through which meeting transcripts were produced. Gibbs sampling was used to estimate the values of random variables in the models, including the locations of topic boundaries. This paper shows how discourse features were integrated into the Bayesian model and reports empirical evaluations of the benefit obtained through the inclusion of each feature and of the suitability of alternative models of the placement of topic boundaries. It demonstrates how multiple cues to segmentation can be combined in a principled way, and empirical tests show a clear improvement over previous work.
KW - Gibbs sampling
KW - Hierarchical Bayesian models
KW - Latent Dirichlet allocation
KW - Markov chain Monte Carlo
KW - Topical segmentation
UR - http://www.scopus.com/inward/record.url?scp=67649561065&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67649561065&partnerID=8YFLogxK
U2 - 10.1109/TASL.2008.925867
DO - 10.1109/TASL.2008.925867
M3 - Article
AN - SCOPUS:67649561065
SN - 1558-7916
VL - 16
SP - 1238
EP - 1248
JO - IEEE Transactions on Audio, Speech and Language Processing
JF - IEEE Transactions on Audio, Speech and Language Processing
IS - 7
ER -