October 11th, 2019
Topic: Reproducible data analysis with Snakemake
Presenter: Johannes Köster, Ph.D., Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen
Abstract:
Data analyses usually entail the application of many command-line tools or scripts to transform, filter, aggregate, or plot data and results. With ever-increasing amounts of data being collected in science, reproducible and scalable automatic workflow management becomes increasingly important. Snakemake is a workflow management system consisting of a clean, human-readable, text-based workflow specification language and a scalable execution environment that allows the parallelized execution of workflows on workstations, compute servers, clusters, and the cloud without modification of the workflow definition. Since its publication, Snakemake has been widely adopted and has been used to build analysis workflows for numerous high-impact publications. With thousands of homepage visits per month and over 100 new citations in 2019, it has a large and stable user community. This talk will show how Snakemake can be used to easily document, execute, and reproduce data analyses.