NCI Hub - Group: Computational Approaches for Cancer Workshop ~ Ninth Computational Approaches for Cancer Workshop

Computational Approaches for Cancer Workshop

CAFCW23 Program and Schedule

Program

Sunday, November 12, 2023, 2:00 p.m. – 5:30 p.m. Mountain Standard Time

(All times listed are Mountain Standard Time – MST)

2:00 pm – 2:10 pm	Welcome – Ninth Computational Approaches for Cancer Workshop (CAFCW) Sally R. Ellingson, PhD, University of Kentucky Eric A. Stahlberg, PhD, Frederick National Laboratory for Cancer Research
2:10 pm – 2:40 pm	Overcoming the Challenges to Democratizing Precision Medicine: HPC Infrastructure, Health Equity Training Sets, Training a Diverse Workforce, and Mitigating Fears Featured Speaker — James W. Lillard, PhD, MBA, Morehouse School of Medicine Introduced by Eric A. Stahlberg, PhD, Frederick National Laboratory for Cancer Research
2:40 pm - 2:55 pm	Session 1 Patient Scale Computational Approaches for Cancer *AI/ML-Derived Whole-Genome Predictor Prospectively and Clinically Predicts Survival and Response to Treatment in Brain Cancer* Presenter: Orly Alter, PhD, University of Utah Authors: Sri Priya Ponnapalli, PhD, University of Utah; Penelope Miron, PhD, Case Western Reserve University School of Medicine; Kristy L. S. Miskimen, PhD, Case Western Reserve University School of Medicine; Kristin A. Waite, PhD, National Cancer Institute; Nadiya Sosonkina, PhD, HudsonAlpha Clinical Services Lab LLC; Sara E. Coppens, B.Sc., The Abigail Wexner Research Institute at Nationwide Children’s Hospital; Anthony C. Bryan, PhD, The Abigail Wexner Research Institute at Nationwide Children’s Hospital; Estevan P. Kiernan, M.Sc., Illumina, Inc.; Huanming Yang, PhD, Beijing Genomics Institute (BGI) -Shenzhen; Jay Bowen, M.Sc., The Abigail Wexner Research Institute at Nationwide Children’s Hospital; Ghunwa A. Nakouzi, PhD, HudsonAlpha Clinical Services Lab LLC; Jill S. Barnholtz-Sloan, PhD, National Cancer Institute; Andrew E. Sloan, MD, Case Western Reserve University School of Medicine; Tiffany R. Hodges, MD, Case Western Reserve University School of Medicine; Orly Alter, PhD, University of Utah. Abstract: Cancer is complex, with contributing factors distributed across the entire genome affecting every aspect of the disease. But typical artificial intelligence and machine learning (AI/ML) would require 3B-patient training sets to generate predictive models from the whole 3B-nucleotide genome. As a result, tests remain limited to one to a few hundred genes. Prediction continues to rely mostly on such factors as a tumor’s grade and the patient’s age. And the understanding and management of cancer continue to involve guesswork. 
A genome-wide pattern in glioblastoma brain cancer tumors was experimentally validated in a retrospective clinical trial as the most accurate and precise predictor of life expectancy and response to standard of care [1]. Applicable to the general population, this predictor, the first to encompass the whole genome, and predictors in lung, nerve, ovarian, and uterine cancers, were mathematically (re)discovered and computationally (re)validated in open-source datasets from as few as 50–100 patients by using our AI/ML [2,3]. Data-agnostic, our algorithms, multi-tensor comparative spectral decompositions, extend the mathematics that underlies quantum mechanics to overcome typical AI/ML obstacles by not requiring large amounts of data, balanced data, or feature engineering. All other attempts to connect a glioblastoma patient’s outcome with the tumor’s DNA copy numbers failed. For 70 years, the best indicator has been age. At 75–95% accuracy, our predictor is more accurate than and independent of age and all other indicators. Platform- and reference genome-agnostic, the predictor’s >99% precision is greater than the community consensus of <70% reproducibility based upon one to a few hundred genes. It describes mechanisms for transformation and identifies drug targets and combinations of targets to sensitize tumors to treatment. Now, in follow-up results from the trial we, first, show correct prospective prediction of the outcomes of the five of the 79 patients who were alive four years earlier, at the time of first results. Two patients, who were predicted to have shorter survival, lived less than five years from diagnosis, whereas of the three patients predicted to have longer survival, one lived more than five, and the remaining two are alive >11.5, years from diagnosis. Second, we demonstrate 100%-precise clinical prediction for 59 of the 79 patients with remaining tumor DNA by using whole-genome sequencing in a regulated laboratory. 
Third, we establish that the risk that a tumor’s whole genome confers upon outcome, as is reflected by the predictor, is surpassed only by the patient’s access to radiotherapy. This is a proof of principle that our AI/ML is uniquely suited for personalized medicine. This also demonstrates that the inclusion of complete genomes, and the normal diversity within, is, beyond fair AI/ML, a scientific, engineering, and medical necessity, because a patient’s survival and response to treatment are the outcome of their tumor’s whole genome. We conclude that our AI/ML-derived whole-genome predictors can take the guesswork out of cancer. [1] Ponnapalli et al., APL Bioeng 4, 026106 (2020); https://doi.org/10.1063/1.5142559 [2] Bradley et al., APL Bioeng 3, 036104 (2019); https://doi.org/10.1063/1.5099268 [3] Alter et al., PNAS 100, 3351 (2003); https://doi.org/10.1073/pnas.053025810
2:55 pm - 3:00 pm	*CAFCW Announcements* Presenter: Eric A. Stahlberg, PhD, Frederick National Laboratory for Cancer Research
3:00 pm – 3:30 pm	Break – Virtual Poster Session Visits *Posters may be viewed starting November 12, 2023* https://cafcw23.virtualpostersession.org/
3:30 pm - 4:00 pm	Panel – Diversity, Equity, and Inclusion – from Data to Workforce Sally R. Ellingson, PhD, University of Kentucky, Moderator Panelists: Todd Burus, MAS, University of Kentucky Silvia Crivelli, PhD, Lawrence Berkeley National Laboratory James W. Lillard, PhD, MBA, Morehouse School of Medicine Lavanya Vishwanatha, PhD, National Institutes of Health
4:00 pm – 4:15 pm	*Automated Whole-Body Tumor Segmentation and Prognosis of Cancer on PET/CT* Presenter/Author: Kevin H. Leung, PhD, Johns Hopkins University Abstract: Background: Cancer is the second leading cause of death in the United States [1]. Automatic characterization of malignant disease is an important clinical need to facilitate early detection and treatment of cancer [2]. Advances in machine learning (ML) and deep learning (DL) have shown significant promise for radiological and oncological applications [3]. Radiomic analysis extracts quantitative features from radiologic data about a cancerous tumor [4]. DL methods require large training datasets with sufficiently annotated images, which are difficult to obtain for radiological applications. The objective of this study was to develop a deep semi-supervised transfer learning approach for automated whole-body tumor segmentation and prognosis on positron emission tomography (PET)/computed tomography (CT) scans using limited annotations (Fig. 1a). Methods: Five datasets consisting of 1,019 prostate, lung, melanoma, lymphoma, head and neck, and breast cancer patients with prostate-specific membrane antigen (PSMA) and fluorodeoxyglucose (FDG) PET/CT scans were used in this study (Table 1). A nnUnet backbone was cross-validated on the tumor segmentation task via a 5-fold cross-validation. Predicted segmentations were iteratively improved using radiomic analysis. Transfer learning generalized the segmentation task across PSMA and FDG PET/CT. Segmentation accuracy was evaluated on true positive rate (TPR), positive predictive value (PPV), Dice similarity coefficient (DSC), false discovery rate (FDR), true negative rate (TNR), and negative predictive value (NPV). Imaging measures quantifying molecular tumor burden and uptake were extracted from the predicted segmentations. 
A risk stratification model was developed for prostate cancer by combining the extracted imaging measures and was evaluated on follow-up prostate-specific antigen (PSA) levels. A risk stratification model was developed for head and neck cancer patients by combining imaging measures and American Joint Committee on Cancer (AJCC) staging and was evaluated via Kaplan-Meier survival analysis. A prognostic model was developed to predict pathological response of breast cancer patients to neoadjuvant chemotherapy using imaging measures from pre-therapy and post-therapy PET/CT scans. Prognostic models were evaluated on overall accuracy and area under the receiver operating characteristic (AUROC) curve. Statistically significant differences were inferred using a Wilcoxon rank-sum test. Results: Accuracy metrics and illustrative examples of predicted tumor segmentations are shown in Table 2 and Fig. 1b. The risk stratification model yielded an overall accuracy of 0.83 and an AUROC of 0.86 in stratifying prostate cancer patients (Fig. 1c). Median follow-up PSA levels in the low-intermediate and high risk groups were 1.19 ng/mL and 53.20 ng/mL (P < 0.05). Head and neck cancer patients were stratified into low, intermediate, and high risk groups with significantly different Kaplan-Meier survival curves by the log-rank test (Fig. 1d). A prognostic model using imaging measures from pre-therapy scans predicted pathological complete response (pCR) in breast cancer patients with an accuracy of 0.72 and an AUROC of 0.72. The model using imaging measures from both pre-therapy and post-therapy scans predicted pCR in breast cancer patients with an accuracy of 0.84 and an AUROC of 0.76. Conclusion: A deep semi-supervised transfer learning approach was developed and demonstrated accurate tumor segmentation, quantification, and prognosis on PET/CT of patients across six cancer types.
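The segmentation metrics named in the abstract above (TPR, PPV, DSC, FDR, TNR, NPV) all derive from voxel-wise counts of true and false positives and negatives. The following is a minimal sketch of those standard definitions, not the presenter's code; the function name `segmentation_metrics` and the toy masks are hypothetical, and masks are assumed to be flattened binary lists where 1 marks tumor.

```python
def segmentation_metrics(predicted, reference):
    """Voxel-wise metrics from two equal-length binary masks (1 = tumor)."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of voxels")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # tumor in both masks
    fp = sum(1 for p, r in pairs if p and not r)      # predicted only
    fn = sum(1 for p, r in pairs if not p and r)      # reference only
    tn = sum(1 for p, r in pairs if not p and not r)  # background in both

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),               # true positive rate
        "PPV": ratio(tp, tp + fp),               # positive predictive value
        "DSC": ratio(2 * tp, 2 * tp + fp + fn),  # Dice similarity coefficient
        "FDR": ratio(fp, tp + fp),               # false discovery rate
        "TNR": ratio(tn, tn + fp),               # true negative rate
        "NPV": ratio(tn, tn + fn),               # negative predictive value
    }


# Toy 8-voxel example (hypothetical data, not from the study).
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
reference = [1, 1, 0, 0, 0, 0, 1, 1]
metrics = segmentation_metrics(predicted, reference)
```

Note that FDR is the complement of PPV by definition, so reporting both serves mainly as a consistency check.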
4:15 pm — 4:30 pm	*Optimized Patient-Specific Catheter Placement for Convection-Enhanced Nanoparticle Delivery in Recurrent Glioblastoma* Presenter/Author: Chengyue Wu, PhD, University of Texas, Oden Institute Abstract: Introduction: Glioblastoma multiforme (GBM) is the most common and deadliest of all primary brain cancers. One promising treatment strategy for patients with recurrent GBM is convection-enhanced delivery (CED) of Rhenium-186 (186Re)-nanoliposomes (RNL) to provide delivery of large, localized doses of radiation. The success of treatment by CED relies on proper catheter placement for therapy delivery to maximize tumor coverage and minimize the leakage to healthy tissue. In this project, we are developing an image-guided physics-based model to optimize catheter placement for RNL delivery on a patient-specific basis. Methods: The mathematical model consists of 1) the steady-state flow field generated via the catheter infusion and the Darcy flow through the 3D brain domain, 2) the transport of RNL governed by an advection-diffusion equation, and 3) the point-spread function to transform the RNL distribution into the SPECT signal. Pre-delivery MRIs were used to assign patient-specific tissue geometries. Two scenarios were performed to personalize the model parameters: a) patient-specific calibration with longitudinal SPECT images monitoring RNL distributions, and b) population-based assignment with the leave-one-out cross-validation (LOOCV). The accuracy of model predictions was evaluated by the concordance correlation coefficients (CCC) between predicted and measured voxel-wise SPECT signals. Furthermore, in each patient, we used the image-guided model—with either calibrated or assigned parameters—to simulate RNL distributions for all possible locations of catheter tip(s), resulting in a ratio of the cumulative dose of RNL outside the tumor to that within the tumor, termed as “off-target ratio” (OTR). 
We minimized the OTR to optimize the placement of catheter(s) and compared OTRs obtained by the optimized and the original placements. Results: Fifteen patients with recurrent GBM from a Phase I/II clinical trial of RNL were included in the study. For scenario a) with the patient-specific calibrated parameters, our model achieved median CCCs of 0.91, 0.87, and 0.82 for predicting RNL distributions at the mid-delivery, end-of-delivery, and 24 h post-delivery, respectively. For scenario b) with the LOOCV assigned parameters, our model achieved median CCCs of 0.89, 0.84, and 0.79 for predicting RNL distributions at the mid-delivery, end-of-delivery, and 24 h post-delivery, respectively. Compared to the original catheter placements, the optimized placements with the patient-specifically calibrated model achieved a median (range) of 34.56% (14.70% – 61.12%) reduction in OTR at 24 h post-delivery. Similarly, the optimized placements with the LOOCV assigned model achieved a 34.56% (13.30% – 56.62%) reduction in OTR at 24 h post-delivery. Furthermore, the optimization provides insights into whether a patient is a proper candidate for CED of RNL, and whether a reduction of catheter number is possible for the patient. Conclusion: Our image-guided model, with either patient-specific calibrated parameters or LOOCV assigned parameters, achieved high accuracy for predicting RNL distributions up to 24 h after the RNL delivery. The placement of catheter(s) optimized via our modeling substantially reduced the off-target ratio of RNL delivery. These results proved the potential of our image-guided modeling to guide patient-specific optimization of catheter placement for convection-enhanced delivery of radiolabeled liposomes. Acknowledgments: NCI R01CA235800, U01CA253540, and R01CA260003. CPRIT RR160005.
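Two quantities carry the evaluation in the abstract above: the concordance correlation coefficient (CCC) between predicted and measured voxel signals, and the off-target ratio (OTR) that the placement search minimizes. The sketch below assumes Lin's standard CCC formula with population (1/n) moments and treats the placement search as a plain minimum over pre-simulated candidates. The function names, the candidate labels, and all numbers are hypothetical; the physics-based simulation itself is not reproduced here.

```python
def concordance_cc(x, y):
    """Lin's concordance correlation coefficient for two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


def off_target_ratio(dose, tumor_mask):
    """Cumulative dose outside the tumor divided by cumulative dose inside it."""
    inside = sum(d for d, m in zip(dose, tumor_mask) if m)
    outside = sum(d for d, m in zip(dose, tumor_mask) if not m)
    return outside / inside if inside else float("inf")


# Hypothetical voxel-wise doses for three candidate catheter tip locations.
tumor_mask = [1, 1, 0, 0]
candidates = {
    "original": [3.0, 2.0, 3.0, 2.0],
    "candidate_a": [4.0, 3.0, 2.0, 1.0],
    "candidate_b": [2.0, 2.0, 4.0, 2.0],
}
otr = {name: off_target_ratio(dose, tumor_mask) for name, dose in candidates.items()}
best = min(otr, key=otr.get)  # placement with the smallest off-target ratio
reduction = (otr["original"] - otr[best]) / otr["original"]  # relative OTR reduction

# CCC penalizes a constant offset even when the correlation is perfect.
ccc_shifted = concordance_cc([1, 2, 3, 4], [2, 3, 4, 5])
```

Unlike Pearson correlation, the CCC drops below 1 when predictions are systematically shifted or rescaled, which is why it suits voxel-wise agreement between predicted and measured signals.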
4:30 pm – 4:45 pm	CAFCW23 Student Presentation* *Environmental Factors and Lung Cancer: A Predictive Spatial Approach* Presenter: Wenhuan Tan, BS Authors: Wenhuan Tan, BS, Xiange Wang, BS, Silvia Crivelli, PhD, Xinlian Liu, PhD Abstract: Lung cancer has witnessed a substantial increase in prevalence over the past few decades. While studies have established that the environment is the primary cause of most lung cancer cases, the development of lung cancer may be the result of the combined impact of multiple environmental factors. Our objective is to investigate the relationship between lung cancer incidence and various physical ambient factors, including climatology, air quality, meteorological conditions, and soil vegetation. To address the issue of missing data on lung cancer cases at the county level in 2020, we use a Bayesian spatial and temporal modeling approach to map geographic variation in lung cancer mortality rates for subnational areas with R-INLA. Our predictive model is constructed using multiple independent variables obtained from various satellite sources, covering the period from 1960 to 2016. Climate data such as heatwaves and extreme temperatures are from the National Oceanic and Atmospheric Administration (NOAA) and PRISM climate group (PRISM). Air quality indicators such as PM2.5, NO2, and ozone levels are sourced from NASA's Earth Data. Observational meteorological data, encompassing temperature, dew point, wind direction, wind speed, cloud cover, cloud layers, ceiling height, visibility, current weather, and precipitation amount, are obtained from the EPA's high-resolution gridded dataset. Soil vegetation and cropland data are acquired from the United States Department of Agriculture (USDA) using satellite imagery. Furthermore, we explore additional geophysical data available through the Google Earth Engine platform. 
Our predictive model reveals an increasing positive association between multiple environmental factors and lung cancer incidence over the years. We apply a linear model with group fixed effects to 2012-2016 data, assessing lung cancer's relative risk and generating a 2017-2021 environmental vulnerability map. This work highlights AI and integrated data analysis' potential in interpreting and predicting complex health phenomena like lung cancer. *The CAFCW Student Presentation is dedicated to the memory of Petrina Chong Hollingsworth, who, as a member of the CAFCW Organizing Committee, was passionate about engaging students and who helped create the Student Track for CAFCW20. Sadly, we lost Petrina in July 2023 due to a rare form of cancer. As we continue the CAFCW tradition for the student presentation, we honor her memory, her enthusiasm, and her devotion.
	Session 2 Molecular Scale Computational Approaches for Cancer
4:45 pm – 5:00 pm	*Constructing a large-scale biomedical knowledge graph and its applications in drug discovery* Presenter/Author: Jinfeng Zhang, PhD Abstract: In the past few decades, the biomedical research community has acquired a wealth of knowledge, much of which is stored in scientific literature as unstructured text. Converting this text into structured form is crucial for developing new methodologies and applications that can fully utilize this knowledge. To achieve this goal, two basic problems must be addressed: named entity recognition (NER) and relation extraction (RE). NER involves identifying the concepts or entities in texts, such as diseases, genes/proteins, and chemical compounds. RE, on the other hand, aims to extract the relationships between these entities. The information extracted from NER and RE can be used to create knowledge graphs, where nodes represent entities in the text and edges represent their relationships. This presentation will discuss our team's work on the LitCoin NLP Challenge organized by NIH, for which we were awarded first place. Using pipelines developed for the challenge, we processed all PubMed articles and created a large-scale biomedical knowledge graph. The accuracy of this large-scale relation extraction is estimated using manual verification of a sample of the extracted data and found to be at the human annotation level. We also incorporated relation information from 40 public databases and relations inferred from publicly available genomics datasets. Our knowledge graph consists of over 11 million entities and more than 40 million relations. We have developed versatile query functions and knowledge discovery tools for accessing and mining structured data in the knowledge graph. Finally, we will discuss some drug discovery-related applications enabled by this large-scale knowledge graph.
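The structure described in the abstract above, with typed entities as nodes and extracted relations as edges, can be illustrated with a small in-memory graph. This is only a sketch under stated assumptions: the `KnowledgeGraph` class, its method names, and the relation labels are hypothetical, and the three toy triples are textbook examples chosen for illustration rather than output of the presenter's pipeline.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy graph: entities are typed nodes, relations are labeled directed edges."""

    def __init__(self):
        self.entity_types = {}          # entity name -> entity type
        self.edges = defaultdict(set)   # subject -> {(relation, object), ...}

    def add_relation(self, subject, subject_type, relation, obj, obj_type):
        """Store one (subject, relation, object) triple, as NER + RE would emit."""
        self.entity_types[subject] = subject_type
        self.entity_types[obj] = obj_type
        self.edges[subject].add((relation, obj))

    def neighbors(self, subject, relation=None, obj_type=None):
        """Objects linked from `subject`, optionally filtered by relation or type."""
        return sorted(
            obj
            for rel, obj in self.edges.get(subject, ())
            if (relation is None or rel == relation)
            and (obj_type is None or self.entity_types[obj] == obj_type)
        )


kg = KnowledgeGraph()
kg.add_relation("imatinib", "chemical", "inhibits", "ABL1", "gene")
kg.add_relation("imatinib", "chemical", "inhibits", "KIT", "gene")
kg.add_relation("ABL1", "gene", "associated_with", "chronic myeloid leukemia", "disease")

# Two-hop query: diseases reachable from a compound through the genes it inhibits.
diseases = sorted(
    {
        disease
        for gene in kg.neighbors("imatinib", relation="inhibits")
        for disease in kg.neighbors(gene, obj_type="disease")
    }
)
```

A graph with tens of millions of relations would need a dedicated graph store and indexing; the dictionary of sets here only shows the shape of the data and of a multi-hop query.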
5:00 pm – 5:15 pm	*Entropy-Based Regularization on Deep Learning Models for Anti-Cancer Drug Response Prediction* Presenter: Oleksandr Narykov, PhD Authors: Oleksandr Narykov, PhD, Argonne National Laboratory (ANL), Data Science and Learning Division; Yitan Zhu, PhD, Argonne National Laboratory (ANL), Data Science and Learning Division; Thomas Brettin, MS, Argonne National Laboratory (ANL), Data Science and Learning Division; Yvonne Evrard, PhD, Frederick National Laboratory for Cancer Research; Alexander Partin, PhD, Argonne National Laboratory (ANL), Data Science and Learning Division; Maulik Shukla, MS, Argonne National Laboratory (ANL), Data Science and Learning Division; Priyanka Vasanthakumari, PhD, Argonne National Laboratory (ANL), Data Science and Learning Division; James Doroshow, MD, National Cancer Institute; Rick Stevens, Argonne National Laboratory. Abstract: This work studies a particular setting for regression problems – tasks with complex combinatorial data space where samples can be divided into distinct groups. Anti-cancer drug response prediction is a perfect example of this setting, in which each sample includes cancer biological features and drug chemical information. Many existing works of pan-drug and pan-cancer response modeling treat different combinations of drugs and cancers as individual samples. A potential problem in these works is that a model may be heavily influenced and biased toward overrepresented drugs and cancers. In the drug response prediction field, the performance of pan-cancer pan-drug models is commonly evaluated on a holdout test set through cross-validation (CV) using performance metrics like the coefficient of determination (R2) and the mean squared error (MSE). However, drug response prediction can be viewed as a multi-objective optimization task, attempting to maximize the prediction performance over different drugs and cancers. 
We consider the performance for each drug as a separate objective and attempt to find a model on the corresponding Pareto front that provides balanced performances for all compounds. We propose adding an entropy-based regularization to the loss function for model training to reach this balanced state. The intuition behind it is straightforward – we minimize the MSE on all data points while encouraging the drug-specific model fitting error variability to stay as low as possible. We achieve this by grouping samples by their drug identity and computing the MSE for each group in the training batch. Then we calculate entropy over normalized group-specific losses. This value is plugged into the regularization term that incentivizes the loss function to maximize it. The maximum entropy can be achieved when the MSEs across all drugs take the same value. We investigate the regularization effect on response modeling using a drug screening dataset of Cancer Cell Line Encyclopedia (CCLE) and a state-of-the-art drug response prediction model DeepTTA [1]. We consider two CV strategies for model evaluation – random split and drug-blind split. In a random split, the testing set can share both cell lines and drugs with the training set, while in a drug-blind split, a drug cannot appear simultaneously in both the training and testing sets. We perform 10-fold CV analyses and evaluate the model performance using R2. For the random split, we see that adding the entropy-based regularization leads to a marginal improvement in prediction performance, which is 0.736 without regularization versus 0.746 with regularization and a p-value of 0.130 from the pairwise t-test. Importantly, we observe a substantial improvement in the more challenging setting of drug-blind split, where the prediction performance increases from -0.128 (without regularization) to 0.168 (with regularization) with a statistically significant p-value of 0.005 from the pairwise t-test. 
[1] Jiang, L., Jiang, C., Yu, X., Fu, R., Jin, S., and Liu, X.: ‘DeepTTA: a transformer-based model for predicting cancer drug response’, Briefings in Bioinformatics, 2022, 23, (3), pp. bbac100
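The regularizer described in the abstract above can be sketched in a few lines: group squared errors by drug, compute a per-drug MSE, normalize those values into shares that sum to one, and reward high Shannon entropy of the shares. The abstract does not give the exact form of the regularization term, so subtracting `weight * entropy` from the overall MSE is an assumption, as are the function name, the `weight` parameter, and the toy batch. This plain-Python version is not differentiable training code.

```python
import math
from collections import defaultdict


def entropy_regularized_loss(y_true, y_pred, drug_ids, weight=0.1):
    """Return (loss, entropy) for one batch; `weight` is a hypothetical knob."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    overall_mse = sum(squared_errors) / len(squared_errors)

    # Group squared errors by drug identity and compute a per-drug MSE.
    by_drug = defaultdict(list)
    for drug, err in zip(drug_ids, squared_errors):
        by_drug[drug].append(err)
    group_mse = [sum(errs) / len(errs) for errs in by_drug.values()]

    # Normalize the per-drug losses into shares that sum to 1.
    total = sum(group_mse)
    if total > 0:
        shares = [m / total for m in group_mse]
    else:
        shares = [1.0 / len(group_mse)] * len(group_mse)  # all errors are zero

    # Shannon entropy is largest when every drug has the same MSE.
    entropy = -sum(s * math.log(s) for s in shares if s > 0)

    # Assumed form: subtracting the entropy rewards balanced per-drug errors.
    return overall_mse - weight * entropy, entropy


# Two toy batches with the same overall MSE (1.0) but different balance.
drugs = ["drug_a", "drug_a", "drug_b", "drug_b"]
targets = [0.0, 0.0, 0.0, 0.0]
balanced_loss, balanced_entropy = entropy_regularized_loss(
    targets, [1.0, 1.0, 1.0, 1.0], drugs
)
skewed_loss, skewed_entropy = entropy_regularized_loss(
    targets, [0.0, 0.0, 2.0, 0.0], drugs
)
```

With equal overall MSE, the batch whose error is concentrated in one drug receives the higher loss, which is the balancing pressure the abstract describes.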
5:15 pm – 5:30 pm	Scalable Lead Prediction with Transformers using HPC resources Presenter: Archit Vasan, PhD, Argonne National Laboratory Authors: Archit Vasan, PhD, Argonne National Laboratory; Rick Stevens, Argonne National Laboratory; Arvind Ramanathan, PhD, Argonne National Laboratory; Venkatram Vishwanath, PhD, Argonne National Laboratory. Abstract: A promising direction in cancer drug discovery is high-throughput screening of extensive compound datasets to identify advantageous properties, including their ability to interact with relevant biomolecules such as proteins. However, traditional structural approaches for assessing binding affinity, such as free energy methods or molecular docking, pose significant computational bottlenecks when dealing with such vast datasets. To address this, we have developed a docking surrogate called the SMILES transformer (ST), which learns molecular features from the SMILES representation of compounds and approximates their binding affinity. SMILES data is first tokenized using a well-established SMILES-pair tokenizer and fed into a BERT-like Transformer model to generate vector embeddings for each molecule, effectively capturing the essential information. These extracted embeddings are then fed into a regression model to predict the binding affinity. Leveraging the high-performance computing resources at Argonne National Lab, we devised a workflow to scale model training and inference across multiple supercomputing nodes. To evaluate the performance and accuracy of our workflow, we conducted experiments using molecular docking binding affinity data on multiple receptors, comparing ST with another state-of-the-art docking surrogate. Impressively, both surrogates yielded comparable val-r2 measurements of between 70 and 90%, affirming the capability of ST to learn molecular features directly from language-based data. 
Furthermore, one significant advantage of the ST approach is its notably faster tokenization preprocessing compared to the alternative method, which requires generating molecular descriptors using Mordred. Our workflow facilitated screening of ~ 3 billion compounds on 48 nodes of the Polaris supercomputer in approximately an hour. In summary, our approach presents an efficient means to screen extensive compound databases for potential molecular properties that could serve as lead compounds targeting cancer. Looking ahead, an important future direction for our workflow involves integrating de-novo drug design, enabling us to scale our efforts to explore the limits of synthesizable compounds within chemical space.
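The screening figures quoted in the abstract above imply a per-node throughput that is easy to work out. The arithmetic below treats "~3 billion compounds" and "approximately an hour" as exact values, which is an assumption, so the results are order-of-magnitude estimates only; the variable names are illustrative.

```python
# Figures taken from the abstract, rounded to exact values for the estimate.
compounds_screened = 3_000_000_000
nodes = 48
wall_time_seconds = 3600  # "approximately an hour"

# Implied throughput, assuming work is spread evenly across nodes.
compounds_per_node = compounds_screened / nodes
compounds_per_node_per_second = compounds_per_node / wall_time_seconds
compounds_per_second_total = compounds_screened / wall_time_seconds

print(f"{compounds_per_node:,.0f} compounds per node")
print(f"{compounds_per_node_per_second:,.0f} compounds per node per second")
print(f"{compounds_per_second_total:,.0f} compounds per second overall")
```

Under these assumptions each node handles about 62.5 million compounds in the hour, or roughly 17,000 per second.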
5:30 pm	Workshop Concludes

Featured Speaker:

James W. Lillard, PhD, MBA, Morehouse School of Medicine, Senior Associate Dean, Research, Innovation and Commercialization

James Lillard is a Distinguished Fellow of AAI and Fellow of NAI and AAAS. His research involves dissecting the molecular mechanisms of cancer and inflammatory diseases, using clinically annotated NGS data and the implementation of precision medicine. Dr. Lillard is the Senior Associate Dean of Research, Innovation and Commercialization and Professor at Morehouse School of Medicine (MSM), one of Atlanta's top research institutions. This free-standing, community-based medical school is one of only four Historically Black Medical Schools with enrollment of nearly 1,000 graduate students in Medicine, Public Health, Biomedical Sciences, and Health Informatics and boasts the #1 Ranked online MS in Biotechnology program in the nation. With a mission that includes increasing the diversity of the health professional and scientific workforce, and health equity, MSM attracts, educates, trains, and fosters talent that oftentimes approaches scientific and healthcare hurdles with unique perspectives that ultimately lend to unique solutions that would otherwise be dismissed.

Virtual Posters

Posters can be viewed at https://cafcw23.virtualpostersession.org/

ATOM Summer Trainee, J. Jedediah Smith, MS

Building a Model of the Two Main Mechanisms Involved in CTL-mediated Tumor Cell Death, Darlington S. David, PhD

Building an Online Interactive Volumetric Surface Viewer To Visualize The Spatial Distribution of Brain Metastases, William Delery, BA

Chimeric Antigen Receptor T-Cell Treatment Outcome Prediction for Diffuse Large B-Cell Lymphoma Patients, Melike Sirlanci, PhD, David J. Albers, PhD, Clayton Smith, MD, Jennifer Kwak, MD, Tellen D. Bennett, MD, MS, Steven M. Bair, MD

ClinicalUnitMapping.Com Takes a Small Step Towards Machine Comprehension of Clinical Trial Data, Jacob Barhak, PhD, Joshua Schertz

Comparison of Neural Networks with Tree-based Machine Learning Approaches for Predictive Drug Response Models, Vineeth Gutta, MS, Satish Ranganathan, PhD, Sara Jones, MS, Mattew Beyers, MS, and Sunita Chandrasekaran, PhD

Data Modeling and Analytics Towards Patient Selection For Cancer EOL Care Study, Denise Davis, PhD

Diabetes and the Social Link to Cancer, Victoria Conerly

Enhancing Authenticity in Cancer-Related Information Retrieval Using Retrieval Augmented Generation LLM Framework, Ashish Mahabal, PhD, Asitang Mishra, Kristen Anton, Maureen Ryan Colbert, Sean Kelly, Heather Kincaid, Daniel Crichton

Evaluating Algorithmic Bias on Triple Negative Breast Cancer Data in Six SEER Registries, Jordan Tschida, Mayanaka Chandrashek, Zachary Fox, Alina Peluso, Charles Wiggins, Antoinette M. Stroup, Stephen M. Schwartz, Eric B. Durbin, Xiao-Cheng Wu, Heidi A. Hanson

Finding Novel Drug Discovery Experiments with QSAR, Daniel Salinas Duron, PhD, John Marinelli, BS, Thomas Passaro, Rida Saifullah

Fine tuning a large language model for cancer research, Aarti Venkat, PhD

Hyper-Parameter Optimization for deep-learning models predictive of anti-cancer drug responses, Rylie Weaver, Rohan Gnanaolivu, Chen Wang, Rajeev Jain, Oleksandr Narykov, Justin Wozniak

Improving scalability and data access equity with cloud, Marissa Powers, PhD, Aniket Deshpande, MS, Nelson Gonzalez, BA

Influencing factors on false positive rates when classifying tumor cell line response to drug treatment, Priyanka Vasanthakumari, PhD, Thomas Brettin, MS, Yitan Zhu, PhD, Hyunseung Yoo, Maulik Shukla, Alexander Partin, PhD, Fangfang Xia, PhD, Oleksandr Narykov, PhD, Rick L. Stevens, PhD

Interpretable Deep Learning Identifies a Constellation of Molecular Assemblies to Predict Resistance to Replication Stress, Akshat Singhal, Xiaoyu Zhao, Trey Ideker

Large-scale cross-study and learning curve analyses of anti-cancer drug response prediction models, Alexander Partin, PhD, Thomas Brettin, MS, Yitan Zhu, PhD, Andreas Wilke, Oleksandr Narykov, PhD, Priyanka Vasanthakumari, PhD, Jamie Overbeek, PhD, Sara Jones, MS, Justin Wozniak, PhD, Rajeev Jain, Jamaludin Mohd-Yusof, PhD, Cristina Garcia-Cardona, PhD, Michael Weil, PhD, Jeffrey Hildesheim, PhD, Rick Stevens

Pantranscriptome Correlation Network Analysis of High Grade Serous Ovarian Cancer Tumor Immune Microenvironment, Kaylin Carey, MS, Andrea Mariani, MD, MS, Corey Young, PhD, James Lillard, PhD, MBA

PieVal, an Open-Source, Efficient, Secure, Gamified, Rapid Document Classification Annotation Tool, Albert Riedl, MS, Aaron Rosenberg, MD, JP Graff, DO, Matthew Renquist, MS, Joseph Cawood, Nicholas Anderson

Predicting HOMO-LUMO Gap Values Using Advanced Machine Learning Platform for Drug Discovery: ATOM Modeling PipeLine (AMPL), Renate Toldo, Sarah Norris, Justin Overhulse, PhD

Prediction of key metastatic genes in head and neck squamous cell carcinoma using a deep learning based context-aware foundation model for network biology, Tarak Nandi, PhD, Christina Theoderis, MD, PhD, Ravi Madduri, Alex Rodriguez

Quantum-Assisted Prediction of Pharmacokinetic Parameters for Plant-Based Small Molecules Targeting Cancer Protein using ATOM Modeling PipeLine (AMPL), Amit Saxena, MS

The Reference Model for COVID-19 attempts to explain USA data, Jacob Barhak, PhD

Towards Physiology and Synthesis-Informed Generative Modeling in Drug Discovery, Nolan English, PhD, Belinda Akpa, PhD, Zach Fox, PhD

Transformer Based Reinforcement Learner for Dynamic Cancer Treatment, Sarang Gawane, Xinhua Zhang, PhD, Guadalupe Canahuate, PhD, G. Elisabeta Marai, PhD, Andrew Wentzel, PhD, Clifton Fuller, Mohamed Naser, Elisa Tardini, Lisanne Van Djik, Abdallah Mohammed

Thank you to our CAFCW23 Program Committee:

Boris Aguilar, PhD, Institute for Systems Biology

Orly Alter, PhD, University of Utah

Marian Anghel, PhD, Los Alamos National Laboratory

Jane Bai, PhD, Food and Drug Administration

Kristy Brock, The University of Texas MD Anderson Cancer Center

Jeffrey (Jeff) Buchsbaum, MD, PhD, National Cancer Institute

Caroline Chung, The University of Texas MD Anderson Cancer Center

Michael Difilippantonio, PhD, National Cancer Institute

Sally R. Ellingson, PhD, University of Kentucky

Fernanda Foertter, PhD, Voltron Data

James Glazier, PhD, Indiana University

Ryuji Hamamoto, PhD, National Cancer Center Japan/RIKEN, Tokyo

Sean E. Hanlon, PhD, National Cancer Institute

David Hormuth, PhD, The University of Texas at Austin

Florence Hudson, Northeast Big Data Innovation Hub at Columbia University in the City of New York

Ai Kagawa, PhD, Brookhaven National Laboratory

Patricia Kovatch, PhD, Icahn School of Medicine at Mt. Sinai

Ho-Joon Lee, PhD, Yale School of Medicine

Ernesto Lima, PhD, The University of Texas at Austin

Stephen Litster, PhD, AWS

Divya Nagaraj, MS, Stanford University

Amanda Paulson, PhD, University of California, San Francisco

William G. Richards, PhD, Brigham and Women’s Hospital, Dana-Farber/Harvard Cancer Center

Hsun-Hsien Shane Chang, PhD, Novartis

Gundolf Schenk, PhD, University of California, San Francisco

Ilya Shmulevich, PhD, Institute for Systems Biology

Amber Simpson, PhD, Queen’s University

Eric A. Stahlberg, PhD, Frederick National Laboratory for Cancer Research

Thomas Steinke, PhD, Zuse Institute Berlin

Georgia (Gina) Tourassi, PhD, Oak Ridge National Laboratory

Thomas Yankeelov, PhD, The University of Texas at Austin

CAFCW23 Organizing Committee:

Eric Stahlberg, PhD, Frederick National Laboratory for Cancer Research

Sean Hanlon, PhD, National Cancer Institute

Sally Ellingson, PhD, University of Kentucky

Patricia Kovatch, BS, Icahn School of Medicine, Mount Sinai

Petrina Hollingsworth, MA/MBA, Frederick National Laboratory for Cancer Research

Lynn Borkon, AB, Frederick National Laboratory for Cancer Research

Presenters

Sally R. Ellingson, PhD, University of Kentucky

Sally Ellingson is a computational scientist working at the intersection of computational biology and high performance computing. She has undergraduate degrees in computer science and mathematics from Florida Institute of Technology. She obtained her doctoral degree at the University of Tennessee and Oak Ridge National Laboratory under a fellowship funded by the National Science Foundation in computational biology. She is an assistant professor in the Division of Biomedical Informatics at the University of Kentucky College of Medicine. In her additional role as the manager for High Performance Computing Services for the Markey Cancer Center’s Cancer Research Informatics Shared Resource Facility, she facilitates high throughput genomics and big data processing for precision medicine resulting in targeted cancer therapies. Recently she has been combining simulations, big data, and machine learning to increase the accuracy of drug binding predictions. With her passion for high performance computing, her research goals lie in harnessing computational power for discoveries otherwise not possible in biomedical areas of high societal importance.

Dr. Ellingson engages in mentoring and outreach, especially for underrepresented groups in computational sciences. She has been actively engaged in the organization of the Students@SC program at Supercomputing since 2014. She has helped organize the Broader Engagement program, spent several years helping with the Student Volunteer program, run the Mentor-Protégé program, and organized Student Programming and School Tours. This year she is the Vice and Deputy Chair for Students@SC.

Eric A. Stahlberg, PhD, Frederick National Laboratory for Cancer Research

Eric A. Stahlberg, Ph.D., is director of Cancer Data Science Initiatives at the Frederick National Laboratory for Cancer Research (FNLCR) where he spearheads efforts to advance the application of predictive modeling for health and wellbeing with a particular emphasis on digital twin approaches. In October 2023, he co-organized the Virtual Human Global Summit in New York City with Brookhaven National Laboratory and University College London and co-presented a panel on the Virtual Human at the Sixth Annual Northwell Health Innovation Summit.

Dr. Stahlberg joined FNLCR in 2011 to launch and direct the bioinformatics core for the NCI Center for Cancer Research. In 2014 he led a new NCI Center for Biomedical Informatics and Information Technology-supported initiative to accelerate cancer research through applications of high performance computing (HPC). Working collaboratively with NCI leadership, Dr. Stahlberg helped establish the Joint Design of Advanced Computing Solutions for Cancer partnership between NCI and the US Department of Energy. In addition, he served on the governance board and helped guide formation of Accelerating Therapeutics for Opportunities in Medicine (ATOM), an innovative public-private consortium to dramatically increase the discovery of new drugs and development of treatments for diseases. In partnership with colleagues from Argonne National Laboratory and NCI, Stahlberg helped establish Innovative Methodologies and New Data for Predictive Oncology Model Evaluation (IMPROVE).

Dr. Stahlberg has a significant and sustained track record of expanding frontiers and collaborative opportunities to advance discovery in life science using data-intensive computing. He has made significant contributions to the HPC field and is passionate about standards-enabled innovation. With over 50 publications, Stahlberg has developed state-of-the-art bioinformatics software and novel solutions for data management, AI model management, and information delivery to help transform clinical approaches and patient impact. Currently he serves on the technical steering committee for OpenFL, an Intel-supported cross-industry initiative to develop open standards for federated machine learning. Dr. Stahlberg was one of the original co-founders of the Computational Approaches for Cancer Workshop in 2015. In 2017, he was recognized as one of FCW’s Federal 100. He holds a PhD in computational chemistry from The Ohio State University and bachelor’s degrees in computer science, chemistry, and mathematics.

Todd Burus, MAS, Markey Cancer Center

Todd Burus, MAS, is the Data Visualization Specialist for the Community Outreach and Engagement office at Markey Cancer Center. He serves as the lead developer on Markey's Cancer InFocus project, working to improve cancer surveillance data gathering and dissemination processes for US cancer centers. Mr. Burus is also an Epidemiology and Biostatistics PhD candidate at the University of Kentucky with a focus on advanced statistical methods for cancer surveillance.

Sylvia Crivelli, PhD, Lawrence Berkeley National Laboratory

Dr. Crivelli is a Staff Scientist at Lawrence Berkeley National Laboratory and co-Principal Investigator of an interagency collaboration between the Department of Energy (DOE) and the Department of Veterans Affairs (VA) called MVP CHAMPION (Million Veterans Program: Computational Health Analytics for Medical Precision to Improve Outcomes Now). Her team focuses on developing personalized diagnostic strategies to enhance healthcare for Veterans, with specific attention to suicide prevention, obstructive sleep apnea, and lung cancer. Her research primarily revolves around two key areas: 1) utilizing large language models (LLMs) like GPT to extract information and detect patterns from electronic health records spanning 20 years and more than 23 million Veterans, and 2) integrating social and environmental factors that influence health into comprehensive disease models. These models incorporate various types of data, ranging from genomic information to societal-level factors, with the aim of advancing precision medicine and identifying geographic regions with higher vulnerabilities.

Lavanya (Lavi) Vishwanatha, M.S., PMP, University of North Texas Health Science Center

Lavi Vishwanatha is the co-investigator and Research Enterprise Solutions Director for the Artificial Intelligence and Machine Learning to Advance Health Equity and Researcher Diversity (AIM-AHEAD) program at the University of North Texas Health Science Center at Fort Worth. In her role as a co-investigator and Director, Lavi coordinates all the project activities of the program and the administrative core. She oversees the programs, integration, accomplishments, and reporting of the functional cores as well as the Leadership core of the AIM-AHEAD Coordinating Center.

Lavi Vishwanatha received her M.S. in Computer Science and Engineering from the University of Nebraska, Lincoln, Nebraska. She graduated with a B.S. in Electrical Engineering from Manipal Institute of Technology, Manipal, India. She is a certified Project Management Professional from PMI and has additional certification in ITIL and V3.

Lavi has extensive experience in project and program management, in the areas of program/project planning and scheduling, risk analysis, stakeholder analysis, and scope management. She has extensive hands-on experience with data and IT security and with diverse and geographically dispersed program/project professionals. She has managed programs and projects that perpetuated change in the environment and, more importantly, ensured that project teams understand the intent and purpose of the change policies applied, so as to help facilitate the enforcement/adoption of change processes across the Enterprise and nationwide. Lavi brings a plethora of diverse knowledge and experiences to strategic nationwide programs such as AIM-AHEAD.

Jinfeng Zhang, PhD, Florida State University

Dr. Zhang received his B.S. in chemistry from Peking University, Beijing, China in 1997. He then obtained his M.S. in chemistry, M.S. in computer science, and Ph.D. in bioinformatics from the University of Illinois at Chicago in 2001, 2002, and 2004, respectively. He had postdoc training at the Department of Statistics of Harvard University from 2004 to 2007. Since then, he has been a faculty member at the Department of Statistics of Florida State University. He took a leave of absence in 2022 to work full-time for his startup, Insilicom LLC. Dr. Zhang's research has been focused on computational structural biology, genomics and epigenomics data analysis, and biological natural language processing and text mining. At Insilicom, he and his team are working on applying cutting-edge AI technologies for information extraction, integration, and automated knowledge discovery using knowledge graphs.

Orly Alter, PhD, University of Utah

Orly Alter is a USTAR associate professor of bioengineering and human genetics at the Scientific Computing and Imaging Institute and the Huntsman Cancer Institute at the University of Utah, a 2023 co-chair of the National Cancer Institute Physical Sciences in Oncology Network steering committee, and the chief technology officer and a co-founder of Eigengene, Inc. Alter received her Ph.D. in applied physics at Stanford University and her B.Sc. magna cum laude in physics at Tel Aviv University. Her Ph.D. thesis, which was published by Wiley, is recognized as crucial to quantum computing and gravitational wave detection.

Inventor of the “eigengene,” Alter develops artificial intelligence and machine learning (AI/ML) to compare and integrate datasets of any number, dimensions, and sizes. She demonstrated that her data-agnostic algorithms, the multi-tensor comparative spectral decompositions, discover mechanistically interpretable and clinically actionable whole-genome predictors of survival and response to treatment, applicable to the general population from as few as 50–100 patients, and her platform- and reference genome-agnostic predictors outperform all others, where they exist. Her retrospective clinical trial experimentally validated a genome-wide pattern in brain cancer tumors as the most accurate and precise predictor of life expectancy and response to standard of care. All other attempts to link a patient’s outcome with the tumor’s DNA copy numbers failed. For 70 years, the best indicator has been age. She discovered the brain cancer predictor, and predictors in lung, nerve, ovarian, and uterine cancers, in open-source datasets, proving that her AI/ML is uniquely suited to personalized medicine.

Kevin H. Leung, PhD, Johns Hopkins University

Dr. Kevin H. Leung is a Research Associate and faculty member at the Johns Hopkins Medicine Department of Radiology and Radiological Science. Dr. Leung completed his Ph.D. in Biomedical Engineering at the Johns Hopkins University School of Medicine and is an expert in developing artificial intelligence and deep learning systems for improved clinical outcomes. Dr. Leung has academic and industry experience in developing artificial intelligence and machine learning algorithms for clinically relevant use cases and is the lead inventor of multiple issued and pending patents on artificial intelligence-based technologies for medical applications.

Chengyue Wu, PhD, The University of Texas at Austin, Oden Institute for Computational Engineering and Sciences

Chengyue Wu received her B.S. in Biosciences from the University of Science and Technology of China in 2016, and her Ph.D. in Biomedical Engineering from The University of Texas at Austin in 2020. Dr. Wu is currently a Postdoctoral Fellow at the Oden Institute for Computational Engineering and Sciences at The University of Texas at Austin. Her research interests and goals span the interdisciplinary field of computational oncology, precision medicine, and biomedical imaging. Her predoctoral research focused on clinical-computational approaches to improve breast cancer diagnosis. Currently, at the Oden Institute, she is working as a primary investigator in multiple projects on image-based modeling to promote cancer diagnosis, treatment response prediction, and personalized treatment optimization. She was recently awarded the prestigious H. D. Landahl Mathematical Biophysics Award from the Society for Mathematical Biology for her contributions to both diagnosis and treatment response prediction in breast cancers.

Oleksandr Narykov, PhD, Argonne National Laboratory

Oleksandr Narykov is a postdoctoral appointee in the Data Science and Learning Division at Argonne National Laboratory, USA. His research focus is artificial intelligence and its applications in biomedical sciences. His current research includes anti-cancer drug response prediction and generative learning for transcriptomics data. Another topic of interest is alternative splicing regulation. He is part of a collaborative project with the National Institutes of Health on studying drug response in patient-derived xenografts. Before joining Argonne, he worked on diverse bioinformatics applications for alternative splicing study, sequencing, structural biology, and molecular dynamics simulation. He received B.Sc. and M.Sc. degrees from the National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" (NTUU KPI) in 2013 and 2015, respectively, and a Ph.D. degree from Worcester Polytechnic Institute, MA, USA, in 2022.

Archit Vasan, PhD, Argonne National Laboratory

I am a postdoctoral appointee at Argonne National Laboratory with a background in computational biophysics. My research interests at ANL involve the discovery of cancer drugs using machine learning coupled with exascale computing.

I received a BA in Physics and Mathematics from Austin College in 2016 and my PhD in Biophysics from the University of Illinois at Urbana-Champaign in 2023 under the guidance of Dr. Emad Tajkhorshid.