TY - JOUR
T1 - The ICML 2023 Ranking Experiment
T2 - Examining Author Self-Assessment in ML/AI Peer Review
AU - Su, Buxin
AU - Zhang, Jiayao
AU - Collina, Natalie
AU - Yan, Yuling
AU - Li, Didong
AU - Cho, Kyunghyun
AU - Fan, Jianqing
AU - Roth, Aaron
AU - Su, Weijie
N1 - Publisher Copyright:
© 2025 American Statistical Association.
PY - 2025
Y1 - 2025
N2 - We conducted an experiment during the review process of the 2023 International Conference on Machine Learning (ICML), asking authors with multiple submissions to rank their papers based on perceived quality. In total, we received 1342 rankings, each from a different author, covering 2592 submissions. In this article, we present an empirical analysis of how author-provided rankings could be leveraged to improve peer review processes at machine learning conferences. We focus on the Isotonic Mechanism, which calibrates raw review scores using the author-provided rankings. Our analysis shows that these ranking-calibrated scores outperform the raw review scores in estimating the ground truth “expected review scores” in terms of both squared and absolute error metrics. Furthermore, we propose several cautious, low-risk applications of the Isotonic Mechanism and author-provided rankings in peer review, including supporting senior area chairs in overseeing area chairs’ recommendations, assisting in the selection of paper awards, and guiding the recruitment of emergency reviewers. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
KW - Isotonic mechanism
KW - ML/AI conference
KW - Mechanism design
KW - Peer review
UR - https://www.scopus.com/pages/publications/105010840880
DO - 10.1080/01621459.2025.2510006
M3 - Article
AN - SCOPUS:105010840880
SN - 0162-1459
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
ER -