We develop a novel deep learning workflow that combines expensive but accurate molecular dynamics (MD) based binding free energy (BFE) calculations with fast machine learning models to predict the binding affinity of compounds. In this approach, candidates are sampled from a large, billion-compound, synthetically accessible chemical space (such as Enamine REAL) or from a de novo molecule generator. Sampling is designed to train an inexpensive surrogate model rapidly and parsimoniously from BFE calculations, creating a tool capable of accurately scoring vast libraries of molecules. This is achieved by using an active learning approach to guide the choice of which BFE calculations to perform. Development of the workflow will be driven by applications and datasets provided by partners in both the pharmaceutical and healthcare sectors. Common to both is the need to respond to the evolution of resistance to selective pharmaceutical compounds, which undermines the treatment of infections (both bacterial and viral) and human cancers. The dramatic reduction in the cost of genome sequencing has made it possible to identify genes in which variation potentially confers resistance, thereby creating the possibility of screening individual samples of infections or tumors to ascertain which drugs will be most effective. Building on progress made in the INSPIRE project (which focused on tyrosine kinase resistance), we develop a platform that brings together physical modeling and machine learning in a representative workflow to identify novel, potent, and selective compounds.
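The active learning idea described above can be illustrated with a minimal sketch. It is not the project's implementation: the function `toy_bfe` is a hypothetical stand-in for an MD-based BFE calculation, the two-dimensional descriptors are invented, and the surrogate is a small bootstrap ensemble of ridge regressors rather than a deep network. Each round labels the candidates on which the ensemble disagrees most (plain uncertainty sampling, chosen here only for simplicity).

```python
import numpy as np

def toy_bfe(X):
    """Hypothetical stand-in for an expensive MD-based BFE calculation."""
    return np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2

def features(X):
    """Quadratic feature map for 2-D descriptors (an assumption of this sketch)."""
    return np.column_stack([np.ones(len(X)), X, X ** 2, X[:, 0] * X[:, 1]])

def fit_ensemble(X, y, rng, n_models=10, ridge=1e-3):
    """Fit ridge regressors on bootstrap resamples; returns (n_models, n_features)."""
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample
        F = features(X[idx])
        A = F.T @ F + ridge * np.eye(F.shape[1])
        weights.append(np.linalg.solve(A, F.T @ y[idx]))
    return np.stack(weights)

def predict(weights, X):
    """Ensemble mean and standard deviation for each candidate."""
    preds = features(X) @ weights.T
    return preds.mean(axis=1), preds.std(axis=1)

def active_learning(pool, oracle, n_init=8, batch=4, rounds=5, seed=0):
    """Label the most uncertain candidates each round and refit the surrogate."""
    rng = np.random.default_rng(seed)
    labelled = list(rng.choice(len(pool), size=n_init, replace=False))
    y = list(oracle(pool[labelled]))
    for _ in range(rounds):
        weights = fit_ensemble(pool[labelled], np.array(y), rng)
        _, std = predict(weights, pool)
        std[labelled] = -np.inf             # never re-select labelled candidates
        new = np.argsort(std)[-batch:]      # largest ensemble disagreement
        labelled.extend(new)
        y.extend(oracle(pool[new]))         # run the "expensive" calculation
    weights = fit_ensemble(pool[labelled], np.array(y), rng)
    return weights, np.array(labelled)

# Toy candidate pool standing in for a compound library.
pool = np.random.default_rng(1).uniform(-1.0, 1.0, size=(200, 2))
weights, labelled = active_learning(pool, toy_bfe)
mean, std = predict(weights, pool)
```

In this sketch only 28 of the 200 candidates are ever passed to the oracle; the rest are scored by the surrogate alone. A practical acquisition rule would likely balance uncertainty against predicted affinity rather than use uncertainty alone.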
Researchers should cite this work as follows: