April 9th, 2021
Topic: Introduction to cancer data analysis on the ISB Cancer Gateway in the Cloud
Dr. Fabian Seidl
- Bioinformatics Scientist, ISB-CGC
Abstract: The ISB Cancer Gateway in the Cloud (ISB-CGC) is one of the three Cloud Resources in the NCI’s Cancer Research Data Commons ecosystem and serves to democratize access to large cancer datasets and to high-performance compute resources on the Google Cloud Platform. ISB-CGC offers multiple avenues for accessing and analyzing large-scale cancer datasets, including TCGA, TARGET, and CPTAC, as well as other important references such as GENCODE and COSMIC. ISB-CGC users can analyze petabytes of data using complex workflows written in the language of their choice (including but not limited to CWL, WDL, Snakemake, and Nextflow). They can develop new analysis pipelines using SQL, Python, and R; mine data such as gene expression, protein abundance, and somatic mutations in easily accessible, queryable tables of large-scale cancer data; and use interactive web tools designed for cohort creation, data discovery, and exploration. Here we will demonstrate how the flexible computing infrastructure of ISB-CGC enables researchers to analyze cloud-hosted data with their own tools as well as with a collection of powerful Google Cloud Platform native tools and technologies (including Google BigQuery for big data analysis and Google Compute Engine for complex workflow execution).