Enhancing molecular discovery using descriptor-free rearrangement clustering techniques for sparse data sets

Peter A. DiMaggio, Scott R. McAllister, Christodoulos A. Floudas, Xiao Jiang Feng, Genyuan Li, Joshua D. Rabinowitz, Herschel Albert Rabitz

Research output: Contribution to journal › Article › peer-review

5 Scopus citations


This article presents a descriptor-free method for estimating library compounds with desired properties from the synthesis and assay of a minimal portion of the library space. The method works by identifying the optimal substituent ordering (i.e., the optimal assignment of an encoding integer to each functional group at every substituent site of the molecular scaffold) based on a global pairwise difference metric intended to capture the smoothness of the compound library. The reordering can be accomplished via (i) a mixed-integer linear programming (MILP) model, (ii) a genetic-algorithm-based approach, or (iii) a heuristic approach. We present performance comparisons between these techniques as well as an independent analysis of the characteristics of the MILP model. Two sparsely sampled data matrices provided by Pfizer are analyzed to validate the proposed approach, and we show that rearranging these matrices leads to regular property landscapes that enable reliable property estimation/interpolation over the full library space. An iterative strategy for compound synthesis is also introduced that uses the results of the reordered data to direct synthesis toward desirable compounds. We demonstrate in a simulated experiment using held-out subsets of the data that the proposed iterative technique is effective in identifying compounds with desired physical properties.
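The reordering idea can be illustrated with a small sketch. This is not the article's MILP, genetic algorithm, or heuristic; it is an exhaustive search on a toy matrix, under the assumption that the "global pairwise difference metric" can be approximated by the sum of absolute differences between adjacent observed entries. The function names `adjacent_cost` and `best_order`, the use of NaN for unassayed compounds, and the toy data are all hypothetical.

```python
import itertools

import numpy as np


def adjacent_cost(matrix, order, axis):
    """Sum of |differences| between neighbouring entries along `axis`
    after reordering that axis. Pairs involving a missing (NaN) entry
    are skipped. This metric is an assumed stand-in, not the paper's."""
    reordered = np.take(matrix, list(order), axis=axis)
    diffs = np.abs(np.diff(reordered, axis=axis))
    return float(np.nansum(diffs))


def best_order(matrix, axis):
    """Exhaustively search all orderings of one axis (toy sizes only)."""
    n = matrix.shape[axis]
    return min(
        itertools.permutations(range(n)),
        key=lambda perm: adjacent_cost(matrix, perm, axis),
    )


# Toy "library": rows and columns are substituents at two sites.
# The underlying property is smooth, but the substituents are listed
# in an arbitrary order and some compounds are unassayed (NaN).
full = np.add.outer([2.0, 0.0, 3.0, 1.0], [10.0, 0.0, 20.0])
sparse = full.copy()
sparse[0, 1] = np.nan
sparse[2, 2] = np.nan

# With this adjacent-difference metric, the row cost does not depend
# on the column order (and vice versa), so each axis is searched
# independently.
row_order = best_order(sparse, axis=0)
col_order = best_order(sparse, axis=1)
reordered = sparse[np.ix_(row_order, col_order)]
print(row_order, col_order)
print(reordered)
```

Exhaustive search is only feasible for a handful of substituents per site, which is why approaches such as MILP formulations, genetic algorithms, or heuristics are needed at realistic library sizes.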

Original language: English (US)
Pages (from-to): 405-418
Number of pages: 14
Journal: AIChE Journal
Issue number: 2
State: Published - Feb 2010

All Science Journal Classification (ASJC) codes

  • Biotechnology
  • Environmental Engineering
  • General Chemical Engineering


Keywords

  • Mathematical modeling
  • Optimization

