Outlier Detection in Large Radiological Datasets Using UMAP

Mohammad Tariqul Islam, Jason W. Fleischer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The success of machine learning algorithms heavily relies on the quality of samples and the accuracy of their corresponding labels. However, building and maintaining large, high-quality datasets is an enormous task. This is especially true for biomedical data and for meta-sets that are compiled from smaller ones, as variations in image quality, labeling, reports, and archiving can lead to errors, inconsistencies, and repeated samples. Here, we show that the uniform manifold approximation and projection (UMAP) algorithm can find these anomalies essentially by forming independent clusters that are distinct from the main (“good”) data but similar to other points with the same error type. As a representative example, we apply UMAP to discover outliers in the publicly available ChestX-ray14, CheXpert, and MURA datasets. While the results are archival and retrospective and focus on radiological images, the graph-based methods work for any data type and will prove equally beneficial for curation at the time of dataset creation.
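According to the abstract, erroneous samples show up as independent clusters that sit apart from the main data but close to other samples with the same error. The grouping idea can be illustrated with a small sketch under stated assumptions. It is not the authors' pipeline and does not run UMAP itself. It uses only the Python standard library to build a symmetrized k-nearest-neighbor graph, the kind of neighbor graph UMAP constructs before it optimizes a layout, and reports every connected component other than the largest as a candidate outlier group. The function names, the choice of `k`, and the rule that the largest component is the "good" data are hypothetical simplifications. In practice one would more likely embed image features with the `umap-learn` package and then inspect or cluster the resulting embedding.

```python
import math
from collections import deque


def knn_graph(points, k):
    """Symmetrized k-nearest-neighbor graph as an adjacency list (hypothetical helper).

    An undirected edge i-j exists if j is among the k nearest neighbors of i
    or i is among the k nearest neighbors of j. This unweighted union is only
    a loose analogue of UMAP's fuzzy union of directed neighbor graphs.
    """
    n = len(points)
    adjacency = [set() for _ in range(n)]
    for i in range(n):
        # Sort the other points by (distance, index) so ties break deterministically.
        others = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in others[:k]:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def connected_components(adjacency):
    """Return the connected components of the graph as a list of index sets (BFS)."""
    seen = set()
    components = []
    for start in range(len(adjacency)):
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def candidate_outlier_groups(points, k=5):
    """Assume the largest component is the main data; return the others, largest first."""
    components = connected_components(knn_graph(points, k))
    components.sort(key=len, reverse=True)
    return components[1:]


# Hypothetical 2-D feature vectors: a 5x4 grid of "good" samples plus four
# mutually similar samples standing in for one (made-up) error type.
main = [(x, y) for x in range(5) for y in range(4)]
errors = [(100, 100), (101, 100), (100, 101), (101, 101)]
groups = candidate_outlier_groups(main + errors, k=3)
print([sorted(g) for g in groups])  # [[20, 21, 22, 23]]
```

Because every point links to its `k` nearest neighbors, an error group separates from the main component only when it contains at least `k + 1` mutually similar samples; a lone anomaly is pulled into the main graph. The sketch therefore targets groups of samples sharing the same error type rather than isolated anomalies.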

Original language: English (US)
Title of host publication: Topology- and Graph-Informed Imaging Informatics - 1st International Workshop, TGI3 2024, Held in Conjunction with MICCAI 2024, Proceedings
Editors: Chao Chen, Yash Singh, Xiaoling Hu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 111-121
Number of pages: 11
ISBN (Print): 9783031739668
State: Published - 2025
Externally published: Yes
Event: 1st Workshop on Topology- and Graph-Informed Imaging Informatics, TGI3 2024, held in conjunction with the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2024 - Marrakesh, Morocco
Duration: Oct 10 2024 → Oct 10 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15239 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st Workshop on Topology- and Graph-Informed Imaging Informatics, TGI3 2024, held in conjunction with the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2024
Country/Territory: Morocco
City: Marrakesh
Period: 10/10/24 → 10/10/24

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

Keywords

  • data curation
  • data visualization
  • neighbor embedding
  • x-ray
