TY - JOUR
T1 - The Responsible Foundation Model Development Cheatsheet
T2 - A Review of Tools & Resources
AU - Longpre, Shayne
AU - Biderman, Stella
AU - Albalak, Alon
AU - Schoelkopf, Hailey
AU - McDuff, Daniel
AU - Kapoor, Sayash
AU - Klyman, Kevin
AU - Lo, Kyle
AU - Ilharco, Gabriel
AU - San, Nay
AU - Rauh, Maribeth
AU - Skowron, Aviya
AU - Vidgen, Bertie
AU - Weidinger, Laura
AU - Narayanan, Arvind
AU - Sanh, Victor
AU - Adelani, David
AU - Liang, Percy
AU - Bommasani, Rishi
AU - Henderson, Peter
AU - Luccioni, Sasha
AU - Jernite, Yacine
AU - Soldaini, Luca
N1 - Publisher Copyright: © 2024, Transactions on Machine Learning Research. All rights reserved.
PY - 2024
Y1 - 2024
AB - Foundation model development attracts a rapidly expanding body of contributors, scientists, and applications. To help shape responsible development practices, we introduce the Foundation Model Development Cheatsheet: a growing collection of 250+ tools and resources spanning text, vision, and speech modalities. We draw on a large body of prior work to survey resources (e.g. software, documentation, frameworks, guides, and practical tools) that support informed data selection, processing, and understanding; precise and limitation-aware artifact documentation; efficient model training; advance awareness of the environmental impact from training; careful model evaluation of capabilities, risks, and claims; and responsible model release, licensing, and deployment practices. The process of curating this list enabled us to review the AI development ecosystem, revealing what tools are critically missing, misused, or over-used in existing practices. We find that (i) tools for data sourcing, model evaluation, and monitoring are critically under-serving ethical and real-world needs, (ii) evaluations for model safety, capabilities, and environmental impact all lack reproducibility and transparency, (iii) text and particularly English-centric analyses continue to dominate over multilingual and multi-modal analyses, and (iv) evaluation of systems, rather than just models, is needed for capabilities to be assessed in context.
UR - https://www.scopus.com/pages/publications/105000348934
UR - https://www.scopus.com/inward/citedby.url?scp=105000348934&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:105000348934
SN - 2835-8856
VL - 2024
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -