The presentation and video recording are now available.

Overview: This two-part workshop introduces Machine Learning concepts and tools for generating molecular descriptors and classifying drug function. You will receive hands-on instruction to generate and explore small-molecule (drug-like) chemical structures, compute chemical descriptors, and create and analyze Machine Learning classification models. The workshop will use open source cheminformatics software and the scikit-learn library to compute key pharma-relevant descriptors and to generate and analyze drug classification models.

Part 1: a 30-minute presentation followed by a 20-minute hands-on code/tools review. This includes:

  • Introduction to ML concepts to create molecular structures and extract features or chemical descriptors.
  • How to generate and analyze molecular fingerprint descriptors
  • How to use the following tools to explore chemical data analysis and feature generation:
    • RDKit, an open source cheminformatics toolkit with Python bindings
    • Mordred and other open source software to generate molecular features
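
As a rough illustration of the descriptor and fingerprint steps listed above, the sketch below uses RDKit to parse one molecule, compute two common descriptors, and generate a Morgan (circular) fingerprint. The molecule (aspirin), the radius of 2, and the 2048-bit length are assumptions chosen for the example, not settings taken from the workshop materials.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, rdFingerprintGenerator

# Example input: aspirin as a SMILES string (an arbitrary choice for illustration).
smiles = "CC(=O)Oc1ccccc1C(=O)O"
mol = Chem.MolFromSmiles(smiles)
if mol is None:
    raise ValueError(f"Could not parse SMILES: {smiles}")

# Two widely used whole-molecule descriptors.
mol_weight = Descriptors.MolWt(mol)   # average molecular weight
logp = Descriptors.MolLogP(mol)       # Crippen logP estimate

# Morgan fingerprint as a fixed-length bit vector.
# radius=2 and fpSize=2048 are assumed values for this sketch.
generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fingerprint = generator.GetFingerprint(mol)

print(f"MolWt: {mol_weight:.2f}")
print(f"LogP: {logp:.2f}")
print(f"Bits set: {fingerprint.GetNumOnBits()} of {fingerprint.GetNumBits()}")
```

The same pattern extends to a list of SMILES strings, giving one fingerprint row per molecule for use as model input.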

Part 2: a 30-minute presentation followed by a 20-minute hands-on tools review. We will extend the concepts demonstrated in Part 1 to build machine learning classification models for predicting small-molecule (drug-like) function (e.g., CNS agent, GI agent). Tools include:

  • Scikit-learn for creating Random Forest classification models
  • A modeling workflow that includes data collection/curation, featurization (fingerprints), classification modeling using ensemble-based methods, and analysis, based on the lessons learned from the AMPL publication
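
The modeling steps above can be sketched with scikit-learn as follows. This is a minimal example under stated assumptions: the fingerprint matrix and class labels are synthetic stand-ins generated with NumPy (not workshop data), and the hyperparameters are illustrative defaults rather than the settings used in the workshop or in AMPL.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a fingerprint matrix: 200 "molecules" x 64 bits.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 64))

# Synthetic labels: class 1 when at least two of the first three bits are set.
# In a real workflow these would be curated drug-function classes.
y = (X[:, :3].sum(axis=1) >= 2).astype(int)

# Hold out 25% of the data for evaluation, preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Random Forest: an ensemble of decision trees trained on bootstrap samples.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Rank bits by importance to see which features drive the predictions.
top_bits = np.argsort(model.feature_importances_)[::-1][:3]

print(f"Test accuracy: {accuracy:.2f}")
print(f"Most important bits: {top_bits.tolist()}")
```

With real data, the synthetic matrix would be replaced by fingerprints computed from curated structures, and accuracy would typically be supplemented with per-class metrics.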

Date: Thursday, July 16, 2020
Time: 1:00 – 3:00 p.m.
Location: WebEx
Supporting Link: GitHub

Instructor: Sarangan Ravichandran, PhD, PMP [C], Data Scientist, Frederick National Laboratory for Cancer Research and Adjunct Professor in Bioinformatics, Hood College

Questions? Contact the NCI Data Science Learning Exchange

