CTRL: Clustering Training Losses for Label Error Detection

Chang Yue, Niraj K. Jha

Research output: Contribution to journal › Article › peer-review

Abstract

In supervised machine learning, the use of correct labels is extremely important for ensuring high accuracy. Unfortunately, most datasets contain corrupted labels, and machine learning models trained on such datasets do not generalize well. Thus, detecting label errors can significantly increase their efficacy. We propose a novel framework, called CTRL (Clustering TRaining Losses for label error detection), to detect label errors in multiclass datasets; CTRL is open-source: https://github.com/chang-yue/ctrl. It detects label errors in two steps, based on the observation that models learn clean and noisy labels in different ways. First, we train a neural network (NN) on the noisy training dataset and obtain the loss curve for each sample. Then, we apply clustering algorithms to the training losses to group samples into two categories: cleanly labeled and noisily labeled. After label error detection, we remove samples with noisy labels and retrain the model. Our experimental results demonstrate state-of-the-art error detection accuracy on both image and tabular datasets under labeling noise. We also present a theoretical analysis that provides insight into why CTRL performs so well.
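The two-step procedure in the abstract can be sketched in a few lines. This is an illustrative stand-in, not the authors' implementation (which is at https://github.com/chang-yue/ctrl): the loss curves below are synthetic, and a minimal 2-means clustering replaces whatever clustering algorithm CTRL actually uses. It shows the key idea that cleanly labeled samples tend to have fast-converging loss curves while noisily labeled samples are memorized late and retain higher loss, so clustering the loss curves separates the two groups.

```python
# Hypothetical sketch of CTRL's clustering step: group per-sample
# training-loss curves into "clean" and "noisy" clusters.
# Synthetic data and a toy 2-means stand in for the real pipeline.
import numpy as np

rng = np.random.default_rng(0)
epochs = np.arange(20)

# Synthetic loss curves (one row per sample): clean labels converge
# quickly; noisy labels are learned slowly and keep a higher loss.
clean = np.exp(-0.5 * epochs) + 0.05 * rng.random((80, 20))
noisy = np.exp(-0.1 * epochs) + 0.5 + 0.05 * rng.random((20, 20))
losses = np.vstack([clean, noisy])          # shape (100 samples, 20 epochs)

def two_means(X, iters=50):
    """Minimal 2-means clustering over loss-curve vectors."""
    centers = X[[0, -1]].copy()             # initialize from two samples
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in (0, 1):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels

labels = two_means(losses)
# The cluster with the higher mean loss is flagged as noisily labeled;
# those samples would be removed before retraining.
noisy_cluster = int(losses[labels == 1].mean() > losses[labels == 0].mean())
flagged = np.flatnonzero(labels == noisy_cluster)
```

In this toy setup the flagged indices coincide with the 20 synthetic noisy samples; on real data the separation is softer, which is why the paper applies proper clustering algorithms to the full loss curves rather than a single threshold.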

Original language: English (US)
Pages (from-to): 4121-4135
Number of pages: 15
Journal: IEEE Transactions on Artificial Intelligence
Volume: 5
Issue number: 8
State: Published - 2024

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Artificial Intelligence

Keywords

  • Label error
  • memorization effects
  • neural networks (NNs)
  • noisy labels
  • robust learning
