Equipping educational applications with domain knowledge

Tarek Sakakini, Hongyu Gong, Jong Yoon Lee, Robert Schloss, Jinjun Xiong, Suma Bhat

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

One of the challenges of building natural language processing (NLP) applications for education is finding a large domain-specific corpus for the subject of interest (e.g., history or science). To address this challenge, we propose a tool, Dexter, that extracts a subject-specific corpus from a heterogeneous corpus, such as Wikipedia, by relying on a small seed corpus and distributed document representations. We empirically show the impact of the generated corpus on language modeling, estimating word embeddings, and consequently, distractor generation, resulting in better performance than when using a general-domain corpus, a heuristically constructed domain-specific corpus, or a corpus generated by a popular system, BootCaT.
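The abstract does not state how Dexter turns a seed corpus and document representations into a selection rule. One plausible reading, sketched here purely as an assumption, is to average the seed documents' vectors into a centroid and keep the heterogeneous-corpus documents whose vectors lie closest to it by cosine similarity. The function names, the top-k cutoff, and the toy 3-dimensional vectors below are hypothetical stand-ins (in practice the vectors would come from a distributed document representation model such as doc2vec); this is not presented as the authors' algorithm.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of the seed-document vectors."""
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def select_domain_documents(
    seed: List[Vector], candidates: Dict[str, Vector], top_k: int
) -> List[str]:
    """Hypothetical selection rule: rank candidate documents by cosine
    similarity to the seed centroid and keep the top_k most similar."""
    center = centroid(seed)
    ranked = sorted(
        candidates, key=lambda doc_id: cosine(center, candidates[doc_id]), reverse=True
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy vectors standing in for learned document embeddings.
    seed_docs = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]]  # e.g. two history seed articles
    pool = {
        "battle_of_hastings": [0.95, 0.05, 0.05],
        "photosynthesis": [0.0, 1.0, 0.2],
        "french_revolution": [0.8, 0.0, 0.3],
    }
    print(select_domain_documents(seed_docs, pool, top_k=2))
    # prints ['battle_of_hastings', 'french_revolution']
```

A fixed top-k is used here only for simplicity; a similarity threshold would be an equally reasonable assumption, and the abstract does not say which (if either) the tool uses.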

Original language: English (US)
Title of host publication: ACL 2019 - Innovative Use of NLP for Building Educational Applications, BEA 2019 - Proceedings of the 14th Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 472-477
Number of pages: 6
ISBN (Electronic): 9781950737345
State: Published - 2019
Externally published: Yes
Event: 14th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2019, collocated with ACL 2019 - Florence, Italy
Duration: Aug 2 2019 → …

Publication series

Name: ACL 2019 - Innovative Use of NLP for Building Educational Applications, BEA 2019 - Proceedings of the 14th Workshop

Conference

Conference: 14th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2019, collocated with ACL 2019
Country/Territory: Italy
City: Florence
Period: 8/2/19 → …

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
  • Software
