Machine Learning for Drug Function Classification: A Hands-On Tutorial
Overview: This two-part workshop introduces machine learning concepts and tools for generating molecular descriptors and classifying drug function. You will receive hands-on instruction in generating and exploring small-molecule (drug-like) chemical structures, computing chemical descriptors, and building and analyzing machine learning classification models. The workshop uses open-source chemoinformatics software and the scikit-learn library to compute key pharma-relevant descriptors and to build and analyze drug classification models.
Part 1: a 30-minute presentation followed by a 20-minute hands-on code/tools review. This includes:
- Introduction to ML concepts for creating molecular structures and extracting features (chemical descriptors)
- How to generate and analyze molecular fingerprint descriptors
- How to use the following two tools to explore data (chemical) analysis and feature generation:
Part 2: a 30-minute presentation followed by a 20-minute hands-on tools review. We will extend the concepts demonstrated in Part 1 to build machine learning classification models that predict the function of small (drug-like) molecules (e.g., CNS agent, GI agent). Tools include:
- Scikit-learn for creating Random Forest classification models
- A modeling workflow that includes data collection/curation, featurization (fingerprints), classification modeling using ensemble-based methods, and analysis, based on lessons learned from the AMPL publication
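
To give a feel for the Part 2 workflow, here is a minimal sketch of the featurize, train, and analyze steps with a scikit-learn Random Forest. It is not the workshop's actual notebook. The fingerprints are random bit vectors standing in for real molecular fingerprints, and the two-class labelling rule is hypothetical, chosen only so that the toy problem is learnable. The fingerprint length, forest size, and split ratio are likewise illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for curated data.
# 200 "molecules", each described by a 64-bit fingerprint of random bits.
n_molecules, n_bits = 200, 64
X = rng.integers(0, 2, size=(n_molecules, n_bits))

# ASSUMPTION: hypothetical labelling rule, not real drug-function labels.
# Class 1 if at least two of the first three bits are set, else class 0.
y = (X[:, :3].sum(axis=1) >= 2).astype(int)

# Hold out 25% of the molecules for evaluation, preserving class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Ensemble-based classifier: a forest of 200 decision trees.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Analysis: held-out accuracy and the fingerprint bits the forest relies on most.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
top_bits = np.argsort(model.feature_importances_)[::-1][:3]

print(f"held-out accuracy: {accuracy:.2f}")
print("most important bits:", sorted(top_bits.tolist()))
```

With real data, the random matrix would be replaced by fingerprints computed from curated structures, and the labels by annotated drug-function classes. The split, fit, and evaluation steps keep the same shape.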
Instructor: Sarangan Ravichandran, PhD, PMP [C], Data Scientist, Frederick National Laboratory for Cancer Research and Adjunct Professor in Bioinformatics, Hood College
Questions? Contact the NCI Data Science Learning Exchange