TY - GEN
T1 - Context-aware automatic text simplification of health materials in low-resource domains
AU - Sakakini, Tarek
AU - Lee, Jong Yoon
AU - Duri, Aditya
AU - Azevedo, Renato F.L.
AU - Gu, Kuangxiao
AU - Bhat, Suma
AU - Morrow, Dan
AU - Hasegawa-Johnson, Mark
AU - Huang, Thomas
AU - Sadauskas, Victor
AU - Graumlich, James
AU - Walayat, Saqib
AU - Willemsen-Dunlap, Ann
AU - Halpin, Donald
N1 - Publisher Copyright: © 2020 Association for Computational Linguistics
PY - 2020
Y1 - 2020
N2 - Healthcare systems have increased patients' exposure to their own health materials to enhance patients' health levels, but this has been impeded by patients' lack of understanding of their health material. We address potential barriers to their comprehension by developing a context-aware text simplification system for health material. Given the scarcity of annotated parallel corpora in healthcare domains, we design our system to be independent of a parallel corpus, complementing the availability of data-driven neural methods when such corpora are available. Our system compensates for the lack of direct supervision using a biomedical lexical database: Unified Medical Language System (UMLS). Compared to a competitive prior approach that uses a tool for identifying biomedical concepts and a consumer-directed vocabulary list, we empirically show the enhanced accuracy of our system due to improved handling of ambiguous terms. We also show the enhanced accuracy of our system over directly supervised neural methods in this low-resource setting. Finally, we show the direct impact of our system on laypeople's comprehension of health material via a human-subjects study (n = 160).
UR - http://www.scopus.com/inward/record.url?scp=85118692952&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118692952&partnerID=8YFLogxK
U2 - 10.18653/v1/2020.louhi-1.13
DO - 10.18653/v1/2020.louhi-1.13
M3 - Conference contribution
AN - SCOPUS:85118692952
T3 - EMNLP 2020 - 11th International Workshop on Health Text Mining and Information Analysis, LOUHI 2020, Proceedings of the Workshop
SP - 115
EP - 126
BT - EMNLP 2020 - 11th International Workshop on Health Text Mining and Information Analysis, LOUHI 2020, Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
T2 - 11th International Workshop on Health Text Mining and Information Analysis, LOUHI 2020, co-located with EMNLP 2020
Y2 - 20 November 2020
ER -