Abstract
Reference libraries of tandem mass spectra (MS/MS) are widely used to identify metabolites in untargeted metabolomics and to train machine-learning models for metabolite annotation. However, public spectral libraries are scattered across disparate databases and contain spectra that are of low resolution or quality, that lack critical metadata, or that carry chemically incoherent annotations. Addressing these issues requires extensive preprocessing and considerable expertise in mass spectrometry, which presents a significant barrier to investigators interested in developing their own machine-learning models. Here, we present Spectraverse, a comprehensive and extensively curated library of public MS/MS spectra from small molecules. We assembled reference spectra from both major repositories and previously overlooked resources, and then developed a preprocessing pipeline to harmonize metadata, standardize chemical structures, and remove low-quality or redundant spectra. These efforts led us to identify previously undocumented pitfalls in existing public libraries that may have confounded prior comparisons of machine-learning models or, conversely, caused valid MS/MS spectra to be discarded from the training sets of these models. The resulting resource affords the most comprehensive coverage of chemical space of any machine-learning-ready library of MS/MS spectra to date while also expanding the coverage of adducts and ionization modes encountered in metabolomics experiments. We intend to maintain and expand Spectraverse to encompass the growing number of publicly available reference MS/MS spectra that are expected to accumulate.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 3934-3943 |
| Number of pages | 10 |
| Journal | Analytical Chemistry |
| Volume | 98 |
| Issue number | 5 |
| DOIs | |
| State | Published - Feb 10 2026 |
All Science Journal Classification (ASJC) codes
- Analytical Chemistry
Article title: Comprehensive Curation and Harmonization of Small-Molecule MS/MS Libraries in Spectraverse