TY - GEN
T1 - SoK
T2 - 34th USENIX Security Symposium, USENIX Security 2025
AU - Xiao, Madelyne
AU - Mayer, Jonathan
N1 - Publisher Copyright:
© 2025 by The USENIX Association. All Rights Reserved.
PY - 2025
Y1 - 2025
AB - We examine the disconnect between scholarship and practice in applying machine learning to trust and safety problems, using misinformation detection as a case study. We survey literature on automated detection of misinformation across a corpus of 248 well-cited papers in the field. We then examine subsets of papers for data and code availability, design missteps, reproducibility, and generalizability. Our paper corpus includes published work in security, natural language processing, and computational social science. Across these disparate disciplines, we identify common errors in dataset and method design. In general, detection tasks are often meaningfully distinct from the challenges that online services actually face. Datasets and model evaluation are often non-representative of real-world contexts, and evaluation frequently is not independent of model training. We demonstrate the limitations of current detection methods in a series of three representative replication studies. Based on the results of these analyses and our literature survey, we conclude that the current state-of-the-art in fully-automated misinformation detection has limited efficacy in detecting human-generated misinformation. We offer recommendations for evaluating applications of machine learning to trust and safety problems and recommend future directions for research.
UR - https://www.scopus.com/pages/publications/105021332045
M3 - Conference contribution
AN - SCOPUS:105021332045
T3 - Proceedings of the 34th USENIX Security Symposium
SP - 5247
EP - 5266
BT - Proceedings of the 34th USENIX Security Symposium
PB - USENIX Association
Y2 - 13 August 2025 through 15 August 2025
ER -