Comparative analysis of chemical similarity methods for modular natural products with a hypothetical structure enumeration algorithm

Michael A. Skinnider, Chris A. Dejong, Brian C. Franczak, Paul D. McNicholas, Nathan A. Magarvey

Research output: Contribution to journal › Article › peer-review

37 Scopus citations


Natural products represent a prominent source of pharmaceutically and industrially important agents. Calculating the chemical similarity of two molecules is a central task in cheminformatics, with applications at multiple stages of the drug discovery pipeline. Quantifying the similarity of natural products is a particularly important problem, as the biological activities of these molecules have been extensively optimized by natural selection. The large and structurally complex scaffolds of natural products distinguish their physical and chemical properties from those of synthetic compounds. However, no analysis of the performance of existing methods for molecular similarity calculation specific to natural products has been reported to date. Here, we present LEMONS, an algorithm for the enumeration of hypothetical modular natural product structures. We leverage this algorithm to conduct a comparative analysis of molecular similarity methods within the unique chemical space occupied by modular natural products using controlled synthetic data, and comprehensively investigate the impact of diverse biosynthetic parameters on similarity search. We additionally investigate a recently described algorithm for natural product retrobiosynthesis and alignment, and find that when rule-based retrobiosynthesis can be applied, this approach outperforms conventional two-dimensional fingerprints, suggesting it may represent a valuable approach for the targeted exploration of natural product chemical space and microbial genome mining. Our open-source algorithm is an extensible method of enumerating hypothetical natural product structures with diverse potential applications in bioinformatics.
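As a minimal sketch of the kind of fingerprint-based similarity search the abstract refers to, the example below ranks a small library against a query using the Tanimoto coefficient, a commonly used measure for comparing two-dimensional fingerprints. The abstract does not state which coefficient or fingerprints were evaluated, so the metric, the toy bit sets, and the names `tanimoto` and `rank_by_similarity` are illustrative assumptions and not part of LEMONS.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query: set, library: dict) -> list:
    """Return library entry names ordered from most to least similar to the query."""
    return sorted(
        library,
        key=lambda name: tanimoto(query, library[name]),
        reverse=True,
    )


# Toy fingerprints (hypothetical on-bit indices, not derived from real molecules).
query = {1, 4, 7, 9}
library = {
    "candidate_a": {1, 4, 7, 8},
    "candidate_b": {2, 3, 9},
    "candidate_c": {1, 4, 7, 9, 12},
}

ranking = rank_by_similarity(query, library)
print(ranking)
```

In practice the bit sets would come from a cheminformatics toolkit's fingerprint routine; the ranking step is independent of how the fingerprints are produced.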

Original language: English (US)
Article number: 46
Journal: Journal of Cheminformatics
Issue number: 1
State: Published - Aug 16 2017
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Library and Information Sciences
  • Computer Science Applications
  • Physical and Theoretical Chemistry
  • Computer Graphics and Computer-Aided Design


Keywords

  • Chemical fingerprints
  • Chemical similarity
  • Chemical structure enumeration
  • Natural products

