A scalable algorithm for molecular property estimation in high dimensional scaffold-based libraries

Sofia Izmailov, Xiao Jiang Feng, Genyuan Li, Herschel Rabitz

Research output: Contribution to journal › Article › peer-review

4 Scopus citations


An algorithm is presented for the estimation of molecular properties over a library built around a scaffold, which has $N$ sites for functionalization with $M_i$ moieties at the $i$th scaffold site, corresponding to a library of $\prod_{i=1}^{N} M_i$ molecules. The algorithm relies on a series of operations involving (i) synthesis and property measurement of a minimal number $T$ of randomly sampled members of the library, (ii) expression of the observed property in terms of a high-dimensional model representation (HDMR) of the moiety → property map, (iii) optimization of the ordered sequence of moieties on each site to regularize the HDMR map and (iv) interpolation using the map to estimate the properties of as yet unsynthesized compounds. The set of operations is performed iteratively, aiming to reach convergence of the predictive HDMR map with as few synthesized samples as possible. Through simulation, the number $T$ of required random molecular samples is shown to scale very favorably, with $T \ll \prod_{i=1}^{N} M_i$ for cases up to $N = 20$ and $M_i = 20$. For example, high estimation quality was attained for simulated libraries with $T \sim 5{,}000$ sampled compounds for a library of $20^{12}$ members and $T \sim 12{,}500$ sampled compounds for a library of $20^{20}$ members. The algorithm is based on the assumption that a systematic pattern exists in the moiety → property map provided that the moieties are optimally ordered on the scaffold sites within the context of HDMR. The overall procedure is referred to as the substituent reordering HDMR algorithm (SR-HDMR). The technique was also successfully tested with laboratory data for estimating $^{13}$C-NMR shifts in a tri-substituted benzene library and for lac operon repression binding.
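The four operations described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SR-HDMR implementation. It assumes a truncation of the HDMR expansion at first order, estimates each component as a plain sample mean, and replaces the paper's ordering optimization with a simple sort of the moieties on each site by their estimated first-order contribution. All function names (`library_size`, `sample_library`, `fit_first_order`, `reorder`, `predict`) are hypothetical and chosen only for illustration.

```python
import random


def library_size(moieties_per_site):
    """Number of molecules in the library: the product of M_i over all sites."""
    size = 1
    for m in moieties_per_site:
        size *= m
    return size


def sample_library(moieties_per_site, t, seed=0):
    """Draw t distinct library members at random without enumerating the library.

    Each member is a tuple of moiety indices, one index per scaffold site.
    """
    if t > library_size(moieties_per_site):
        raise ValueError("cannot sample more members than the library holds")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < t:
        chosen.add(tuple(rng.randrange(m) for m in moieties_per_site))
    return sorted(chosen)


def fit_first_order(samples, values, moieties_per_site):
    """Estimate f0 and the first-order components f_i(moiety) as sample means.

    Assumption: the expansion is cut off after first order, so cooperative
    effects between sites are ignored in this sketch.
    """
    f0 = sum(values) / len(values)
    components = []
    for site, m in enumerate(moieties_per_site):
        sums = [0.0] * m
        counts = [0] * m
        for x, y in zip(samples, values):
            sums[x[site]] += y - f0
            counts[x[site]] += 1
        # A moiety that was never sampled gets a zero contribution.
        components.append([s / c if c else 0.0 for s, c in zip(sums, counts)])
    return f0, components


def reorder(components):
    """Order the moieties on each site by increasing first-order contribution.

    Assumption: this sort is a stand-in for the ordering optimization named
    in the abstract, not a reproduction of it.
    """
    return [sorted(range(len(c)), key=c.__getitem__) for c in components]


def predict(f0, components, x):
    """Estimate the property of a (possibly unsynthesized) library member x."""
    return f0 + sum(components[site][m] for site, m in enumerate(x))
```

In use, one would measure the property for the members returned by `sample_library`, pass the measurements to `fit_first_order`, and call `predict` on members that were not sampled. Whether a first-order truncation is adequate depends on the property. The abstract does not state the expansion order or the ordering criterion used by the authors.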

Original language: English (US)
Pages (from-to): 1765-1790
Number of pages: 26
Journal: Journal of Mathematical Chemistry
Issue number: 7
State: Published - Aug 2012

All Science Journal Classification (ASJC) codes

  • General Chemistry
  • Applied Mathematics


Keywords

  • HDMR
  • Property prediction
  • QSAR
  • Substituent reordering

