An Adaptive Computational Pipeline to Accelerate Drug Discovery

This project will deliver a smart, adaptive chemical screening pipeline that uses virtual screening software to speed the discovery of drug-like compounds with therapeutically relevant activity, at a fraction of the normal screening cost. Instead of screening tens to hundreds of thousands of compounds, it may be possible to guide the search using computational methods and machine learning. This computationally driven approach is expected to revolutionize the chemical screening capabilities at UW-Madison.

Modern society is driven by algorithms that attempt to learn our preferences in books, movies and music. It is possible to define a few key data points that offer a great deal of information about a person’s tastes. For example, one can imagine asking a user to rate a few well-known songs from various musical genres. Depending on the user’s ratings, a recommender algorithm might tend toward suggesting jazz to one user and country music to another. This project applies a similar approach to drug discovery. An initial inexpensive screen of “high information” compounds will be used to seed algorithms that learn the ligand binding preferences of a given protein target. Additional compounds will be chosen using both data-driven and physics-driven models, and the process will be iterated until it homes in on a set of compounds preferred by the target. A recommender system will be used to deduce the small set of highly informative compounds against which all new targets will be screened; the results of this initial screen will help classify the target’s likely binding preferences for other compounds.
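
The iterative loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the project's actual method: compounds are random two-dimensional feature vectors, `simulated_assay` is a hypothetical stand-in for an experimental screen, `predict` is a simple distance-weighted average standing in for the data-driven model, and the selection rule is purely greedy. The physics-driven models and the recommender-chosen seed set are not represented here; the seed is simply drawn at random.

```python
import random


def simulated_assay(compound):
    """Hypothetical stand-in for a wet-lab screen (not a real assay).

    A compound counts as active if it lies close to a hidden optimum.
    """
    target = (0.7, 0.2)
    dist2 = sum((c - t) ** 2 for c, t in zip(compound, target))
    return 1.0 if dist2 < 0.02 else 0.0


def predict(compound, screened):
    """Toy data-driven model: distance-weighted mean of measured activities."""
    num = 0.0
    den = 0.0
    for other, activity in screened.items():
        d2 = sum((a - b) ** 2 for a, b in zip(compound, other))
        weight = 1.0 / (d2 + 1e-6)
        num += weight * activity
        den += weight
    return num / den


def adaptive_screen(library, seed_size, batch_size, rounds, rng):
    """Screen a seed set, then repeatedly screen the top-predicted compounds."""
    seed = rng.sample(library, seed_size)
    screened = {c: simulated_assay(c) for c in seed}
    for _ in range(rounds):
        remaining = [c for c in library if c not in screened]
        if not remaining:
            break
        # Greedy choice: rank unscreened compounds by predicted activity.
        remaining.sort(key=lambda c: predict(c, screened), reverse=True)
        for c in remaining[:batch_size]:
            screened[c] = simulated_assay(c)
    return screened


rng = random.Random(0)
library = [(rng.random(), rng.random()) for _ in range(2000)]
screened = adaptive_screen(library, seed_size=50, batch_size=25, rounds=6, rng=rng)
hits = sum(screened.values())
print(f"screened {len(screened)} of {len(library)} compounds, found {hits:.0f} hits")
```

In this sketch only 200 of the 2,000 library compounds are ever "assayed"; each round's measurements feed back into the model that picks the next batch. A practical pipeline would likely also balance exploration against exploitation rather than always choosing the top-ranked compounds.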

Several distinct computational models will be developed to aid the experimental search. While computational models have been used to inform drug discovery for decades, and computational models have been built from experimental results, no existing discovery protocol can adaptively learn from experimental data generated in response to its own initial predictions. This proposal draws upon the unique strengths of UW-Madison in both data science and high-throughput computing to address the lack of an adaptive, cost-effective drug-screening pipeline, a gap of great practical importance both on campus and in the broader research community.

Principal Investigator

Julie Mitchell
Professor of Biochemistry and Mathematics

Co-Principal Investigators

Anthony Gitter
Assistant Professor of Biostatistics and Medical Informatics
Michael Hoffman
Professor of Oncology

Co-Investigators

Michael Newton
Professor of Statistics, Biostatistics and Medical Informatics
Robert Nowak
Professor of Electrical and Computer Engineering
Stephen Wright
Professor of Computer Sciences and Industrial Engineering

Collaborators

Spencer Ericksen
Assistant Scientist at the Small Molecule Screening Facility
Scott Wildman
Associate Scientist at the Small Molecule Screening Facility