Rethinking Machine Learning Benchmarks in the Context of Professional Codes of Conduct

Peter Henderson, Jieru Hu, Mona Diab, Joelle Pineau

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Scopus citations

Abstract

Benchmarking efforts for machine learning have often mimicked (or even explicitly used) professional licensing exams to assess capabilities in a given area, focusing primarily on accuracy as the metric of choice. However, this approach neglects a variety of essential skills required in professional settings. We propose that professional codes of conduct and rules can guide machine learning researchers to address potential gaps in benchmark construction. These guidelines frequently account for situations professionals may encounter and must handle with care. A model may excel on an exam but still fall short in critical scenarios, deemed unacceptable under professional codes or rules. To motivate this idea, we conduct a case study and comparative examination of machine translation in legal settings. We point out several areas where standard deployments and benchmarks do not assess key requirements under professional rules. We suggest further refinements that would bring the two closer together, including requiring a measurement of uncertainty so that models opt out of uncertain translations. We then share broader insights on constructing and deploying foundation models, particularly in critical domains like law and legal translation.

Original language: English (US)
Title of host publication: CSLAW 2024 - Proceedings of the 3rd Symposium on Computer Science and Law
Publisher: Association for Computing Machinery, Inc
Pages: 109-120
Number of pages: 12
ISBN (Electronic): 9798400703331
State: Published - Mar 12 2024
Event: 3rd Symposium on Computer Science and Law, CSLAW 2024 - Boston, United States
Duration: Mar 12 2024 - Mar 13 2024

Publication series

Name: CSLAW 2024 - Proceedings of the 3rd Symposium on Computer Science and Law

Conference

Conference: 3rd Symposium on Computer Science and Law, CSLAW 2024
Country/Territory: United States
City: Boston
Period: 3/12/24 - 3/13/24

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Law
  • Communication

Keywords

  • AI & Law
  • AI & Society
  • Benchmarking
  • Evaluation
  • Machine Translation
