TY - CONF
T1 - Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
T2 - Findings of the Association for Computational Linguistics: EMNLP 2023
AU - Deshpande, Ameet
AU - Murahari, Vishvak
AU - Rajpurohit, Tanmay
AU - Kalyan, Ashwin
AU - Narasimhan, Karthik
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - Large language models (LLMs) have shown incredible capabilities and transcended the natural language processing (NLP) community, with adoption throughout many services like healthcare, therapy, education, and customer service. Since users include people with critical information needs like students or patients engaging with chatbots, the safety of these systems is of prime importance. Legislation has recognized its significance and recently drafted a “Blueprint For An AI Bill Of Rights” which calls for domain experts to identify risks and potential impact of AI systems. To this end, we systematically evaluate toxicity in over half a million generations of ChatGPT, a popular dialogue-based LLM. We find that setting the system parameter of ChatGPT by assigning it a persona, say that of the boxer Muhammad Ali, significantly increases the toxicity of generations. Depending on the persona assigned to ChatGPT, its toxicity can increase up to 6×, with outputs engaging in incorrect stereotypes, harmful dialogue, and hurtful opinions. Furthermore, we find concerning patterns where specific entities (e.g., certain races) are targeted more than others (3× more) irrespective of the assigned persona, reflecting discriminatory biases in the model. Our findings show that multiple provisions in the legislative blueprint are being violated, and we hope that the broader AI community rethinks the efficacy of current safety guardrails and develops better techniques that lead to robust, safe, and trustworthy AI.
UR - http://www.scopus.com/inward/record.url?scp=85183301218&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85183301218&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85183301218
T3 - Findings of the Association for Computational Linguistics: EMNLP 2023
SP - 1236
EP - 1270
BT - Findings of the Association for Computational Linguistics: EMNLP 2023
PB - Association for Computational Linguistics (ACL)
Y2 - 6 December 2023 through 10 December 2023
ER -