### Abstract

An algorithm is presented for estimating molecular properties over a library built around a scaffold that has $N$ sites for functionalization, with $M_i$ moieties at the $i$th scaffold site, corresponding to a library of $\prod_{i=1}^{N} M_i$ molecules. The algorithm relies on a series of operations involving (i) synthesis and property measurement of a minimal number $T$ of randomly sampled members of the library, (ii) expression of the observed property in terms of a high-dimensional model representation (HDMR) of the moiety → property map, (iii) optimization of the ordered sequence of moieties on each site to regularize the HDMR map and (iv) interpolation using the map to estimate the properties of as yet unsynthesized compounds. The set of operations is performed iteratively, aiming to reach convergence of the predictive HDMR map with as few synthesized samples as possible. Through simulation, the number $T$ of required random molecular samples is shown to scale very favorably, with $T \ll \prod_{i=1}^{N} M_i$ for cases up to $N = 20$ and $M_i = 20$. For example, high estimation quality was attained for simulated libraries with $T \sim 5{,}000$ sampled compounds for a library of $20^{12}$ members and $T \sim 12{,}500$ sampled compounds for a library of $20^{20}$ members. The algorithm is based on the assumption that a systematic pattern exists in the moiety → property map provided that the moieties are optimally ordered on the scaffold sites within the context of HDMR. The overall procedure is referred to as the substituent reordering HDMR algorithm (SR-HDMR). The technique was also successfully tested with laboratory data for estimating $^{13}$C-NMR shifts in a tri-substituted benzene library and for lac operon repression binding.
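Steps (i) to (iii) can be illustrated with a small sketch. It is not the SR-HDMR implementation: the abstract does not specify the reordering optimization, so the sketch assumes a first-order-only HDMR estimate and simply sorts the moieties on each site by their estimated first-order contribution. The names `library_size`, `first_order_components`, `reorder_moieties` and `toy_property`, as well as the toy library dimensions, are hypothetical and chosen only for illustration.

```python
import math
import random


def library_size(moieties_per_site):
    """Number of library members: the product of M_i over all N sites."""
    return math.prod(moieties_per_site)


def first_order_components(samples, values, n_sites, n_moieties):
    """Estimate zeroth- and first-order HDMR terms from random samples.

    f0 is the sample mean. f_i(m) is the mean property over samples that
    carry moiety m at site i, minus f0 (0.0 if m was never sampled there).
    """
    f0 = sum(values) / len(values)
    sums = [[0.0] * n_moieties for _ in range(n_sites)]
    counts = [[0] * n_moieties for _ in range(n_sites)]
    for molecule, y in zip(samples, values):
        for site, moiety in enumerate(molecule):
            sums[site][moiety] += y
            counts[site][moiety] += 1
    comps = [
        [(sums[i][m] / counts[i][m] - f0) if counts[i][m] else 0.0
         for m in range(n_moieties)]
        for i in range(n_sites)
    ]
    return f0, comps


def reorder_moieties(comps):
    """Assumed stand-in for step (iii): sort each site's moiety labels by
    their first-order component so the component becomes monotone."""
    return [sorted(range(len(c)), key=lambda m: c[m]) for c in comps]


# Toy library (hypothetical sizes): 4 sites, 6 moieties each, 400 samples.
rng = random.Random(0)
N_SITES, N_MOIETIES, T = 4, 6, 400
hidden = [[rng.uniform(-1, 1) for _ in range(N_MOIETIES)]
          for _ in range(N_SITES)]


def toy_property(molecule):
    """Synthetic additive property standing in for a lab measurement."""
    return sum(hidden[i][m] for i, m in enumerate(molecule))


# Step (i): random sampling and "measurement".
samples = [tuple(rng.randrange(N_MOIETIES) for _ in range(N_SITES))
           for _ in range(T)]
values = [toy_property(mol) for mol in samples]

# Steps (ii) and (iii): first-order HDMR terms, then reordering per site.
f0, comps = first_order_components(samples, values, N_SITES, N_MOIETIES)
orderings = reorder_moieties(comps)

# Sampled fraction quoted in the abstract for the 20^12 library.
fraction = 5000 / library_size([20] * 12)
```

After reordering, each site's first-order component is non-decreasing along the new moiety sequence, which is the kind of regularity that makes interpolation in step (iv) plausible. The full method described in the abstract also iterates these steps until the predictive map converges, which this sketch omits.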

Original language | English (US) |
---|---|
Pages (from-to) | 1765-1790 |
Number of pages | 26 |
Journal | Journal of Mathematical Chemistry |
Volume | 50 |
Issue number | 7 |
DOIs | https://doi.org/10.1007/s10910-012-0005-y |
State | Published - Aug 2012 |

### All Science Journal Classification (ASJC) codes

- Chemistry(all)
- Applied Mathematics

### Keywords

- HDMR
- Property prediction
- QSAR
- Substituent reordering

## Fingerprint

Dive into the research topics of 'A scalable algorithm for molecular property estimation in high dimensional scaffold-based libraries'. Together they form a unique fingerprint.

## Cite this

*Journal of Mathematical Chemistry*, *50*(7), 1765-1790. https://doi.org/10.1007/s10910-012-0005-y