March 12th, 2021
Topic: Introduction to the NCI's Cancer Research Data Commons (CRDC) and Cloud Resources (CRs)
- Biomedical Informatics Specialist and Project Manager, CBIIT, NCI
Abstract: More than ever before, scientific progress in biological research hinges on our ability to analyze rich datasets and to draw meaningful insights and interpretations from those data to better understand how disease occurs and how best to treat it. The NCI Cancer Research Data Commons (CRDC) is a cloud-based data science infrastructure that connects datasets with analytics tools, allowing users to share, integrate, analyze, and visualize cancer research data to drive scientific discovery. The CRDC provides access to large-scale datasets such as The Cancer Genome Atlas (TCGA) that span multiple data types, including genomics, proteomics, imaging, and clinical data. The CRDC has a number of components, including domain-specific data repositories, a centralized authentication and authorization mechanism, and three cloud-based analytical platforms. These cloud platforms offer substantial computational capacity for big data analysis: users can upload, manage, and share their data and run a number of ready-to-use analytical tools and workflows in the cloud through the GUI, APIs, or toolkits, eliminating the need to download and store large-scale datasets. Users can also build custom tools and workflows using container technology and workflow management systems. The CRDC also comprises data harmonization and data aggregator components, which make it possible to connect different data types and perform integrative analysis, supported by data models, metadata, and terminology.
This webinar will provide an overview of the NCI CRDC and the NCI Cloud Resources platforms (Seven Bridges’ Cancer Genomics Clous, SB-CGC; Broad Institute’s FireCloud; and Institute for Systems Biology’s Cancer Genomics Cloud, ISB-CGC). https://datascience.cancer.gov/data-commons