TY - JOUR
T1 - Prevalence and Prevention of Large Language Model Use in Crowd Work
AU - Veselovsky, Veniamin
AU - Ribeiro, Manoel Horta
AU - Cozzolino, Philip J.
AU - Gordon, Andrew
AU - Rothschild, David
AU - West, Robert
N1 - Publisher Copyright:
© 2024 ACM. Copyright held by the owner/author(s).
PY - 2025/3/1
Y1 - 2025/3/1
N2 - Crowd work platforms, such as Prolific and Amazon Mechanical Turk, play an important part in academia and industry, empowering the creation, annotation, and summarization of data [11], as well as surveys and experiments [21]. At the same time, large language models (LLMs), such as ChatGPT, Gemini, and Claude, promise similar capabilities. They are remarkable data annotators [10] and can, in some cases, accurately simulate human behavior, enabling in-silico experiments and surveys that yield human-like results [2]. Yet, if crowd workers were to start using LLMs, this could threaten the validity of data generated using crowd work platforms. Sometimes, researchers seek to observe unaided human responses (even if LLMs could provide a good proxy), and LLMs still often fail to accurately simulate human behavior [22]. Further, LLM-generated data may degrade subsequent models trained on it [23]. Here, we investigate the extent to which crowd workers use LLMs in a text-production task and whether targeted mitigation strategies can prevent LLM use.
AB - Crowd work platforms, such as Prolific and Amazon Mechanical Turk, play an important part in academia and industry, empowering the creation, annotation, and summarization of data [11], as well as surveys and experiments [21]. At the same time, large language models (LLMs), such as ChatGPT, Gemini, and Claude, promise similar capabilities. They are remarkable data annotators [10] and can, in some cases, accurately simulate human behavior, enabling in-silico experiments and surveys that yield human-like results [2]. Yet, if crowd workers were to start using LLMs, this could threaten the validity of data generated using crowd work platforms. Sometimes, researchers seek to observe unaided human responses (even if LLMs could provide a good proxy), and LLMs still often fail to accurately simulate human behavior [22]. Further, LLM-generated data may degrade subsequent models trained on it [23]. Here, we investigate the extent to which crowd workers use LLMs in a text-production task and whether targeted mitigation strategies can prevent LLM use.
UR - https://www.scopus.com/pages/publications/105003186808
U2 - 10.1145/3685527
DO - 10.1145/3685527
M3 - Article
AN - SCOPUS:105003186808
SN - 0001-0782
VL - 68
SP - 42
EP - 47
JO - Communications of the ACM
JF - Communications of the ACM
IS - 3
ER -