April 10th, 2020
Topic: Cancer Data Analytics on the ISB-CGC platform
Presenters:
Dr. Kawther Abdilleh, Bioinformatics Scientist, GDIT, ISB-CGC
Dr. Fabian Seidl, Bioinformatics Scientist, GDIT, ISB-CGC
Abstract:
The ISB Cancer Genomics Cloud (ISB-CGC) is one of three Cancer Cloud Resources funded by the National Cancer Institute, serving to democratize access to large cancer datasets as well as high-performance compute resources on the Google Cloud Platform. With a focus on Data as a Service (DaaS), the ISB-CGC offers multiple avenues for accessing and analyzing large-scale cancer datasets, including TCGA and TARGET, as well as important reference resources such as GENCODE and COSMIC. ISB-CGC is intentionally designed as an open platform, allowing a wide range of users with diverse skill sets to choose the approaches best suited to the task at hand. Users can analyze petabytes of data using complex workflows written in the workflow language of their choice (including but not limited to CWL, WDL, Snakemake, and Nextflow). They can develop new analysis methods in common languages such as Python, R, and SQL, conduct multivariate data analysis on easily accessible and queryable tables of large-scale cancer data, and use interactive web tools designed for cohort creation, data discovery, and exploration. Here we will demonstrate how the flexible computing infrastructure of ISB-CGC enables researchers to analyze cloud-hosted data with their own tools as well as with a collection of powerful Google Cloud Platform native tools and technologies (including Google BigQuery for big data analysis and Google Compute Engine for complex workflow execution).