October 9th, 2020
Topic: Tibanna: Flexible and scalable job execution on Amazon Web Services (AWS)
Presenter: Soo Lee, Senior Bioinformatics Scientist, Department of Biomedical Informatics, Harvard Medical School.
Bioinformatics analyses often require large-scale batch data processing. Efficient monitoring and resource optimization help with the design and execution of software pipelines. This is especially important when pipelines run in a cloud environment, where a lack of efficient oversight can easily lead to a many-fold increase in cost. Tibanna was developed to meet this need, with the following goals in mind: 1) bringing software to the cloud where the data is stored, rather than downloading the data (support for AWS); 2) supporting dockerized pipelines described in Common Workflow Language (CWL), Workflow Description Language (WDL), or a set of plain shell commands, to improve the reproducibility and portability of pipelines; 3) resource allocation, monitoring, and cost exploration for individual jobs, rather than dependence on a pre-configured cluster of machines; and 4) easy integration with other systems. Tibanna is used as the job execution utility by the NIH Common Fund 4D Nucleome (4DN) Data Portal, the Brigham & Women’s Hospital Genomic Medicine Platform, and the Harvard Medical School Clinical Genome Analysis Platform. It is available as a standalone program that interacts directly with the AWS cloud. Tibanna has also been incorporated into Snakemake, a widely used workflow management system, as the AWS backend. Tibanna is open source and can be found at https://github.com/4dn-dcic/tibanna. User documentation is available at https://tibanna.readthedocs.io/en/latest/.