TY - GEN
T1 - Data Governance in the Age of Large-Scale Data-Driven Language Technology
AU - Jernite, Yacine
AU - Nguyen, Huu
AU - Biderman, Stella
AU - Rogers, Anna
AU - Masoud, Maraim
AU - Danchev, Valentin
AU - Tan, Samson
AU - Luccioni, Alexandra Sasha
AU - Subramani, Nishant
AU - Johnson, Isaac
AU - Dupont, Gerard
AU - Dodge, Jesse
AU - Lo, Kyle
AU - Talat, Zeerak
AU - Radev, Dragomir
AU - Gokaslan, Aaron
AU - Nikpoor, Somaieh
AU - Henderson, Peter
AU - Bommasani, Rishi
AU - Mitchell, Margaret
N1 - Publisher Copyright: © 2022 Owner/Author.
PY - 2022/6/21
Y1 - 2022/6/21
AB - The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language data. This work proposes an approach to global language data governance that attempts to organize data management amongst stakeholders, values, and rights. Our proposal is informed by prior work on distributed governance that accounts for human values and grounded in an international research collaboration that brings together researchers and practitioners from 60 countries. The framework we present is a multi-party international governance structure focused on language data and incorporating the technical and organizational tools needed to support its work.
KW - data rights
KW - datasets
KW - language data
KW - technology governance
UR - http://www.scopus.com/inward/record.url?scp=85132971169&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85132971169&partnerID=8YFLogxK
U2 - 10.1145/3531146.3534637
DO - 10.1145/3531146.3534637
M3 - Conference contribution
AN - SCOPUS:85132971169
T3 - ACM International Conference Proceeding Series
SP - 2206
EP - 2222
BT - Proceedings of 2022 5th ACM Conference on Fairness, Accountability, and Transparency, FAccT 2022
PB - Association for Computing Machinery
T2 - 5th ACM Conference on Fairness, Accountability, and Transparency, FAccT 2022
Y2 - 21 June 2022 through 24 June 2022
ER -