Data analyses usually entail applying many command-line tools or scripts to transform, filter, aggregate or plot data and results. With ever-increasing amounts of data being collected in science, reproducible and scalable automatic workflow management becomes increasingly important. Snakemake is a workflow management system, consisting of a text-based workflow specification language and a scalable execution environment, that allows the parallelized execution of workflows on workstations, compute servers and clusters without modification of the workflow definition. A scheduling algorithm based on a multidimensional knapsack problem allows Snakemake to maximize workflow execution speed without exceeding given constraints such as the number of available processor cores, cluster nodes or auxiliary hardware like graphics cards.
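To make the scheduling idea concrete, the following is a minimal sketch of job selection posed as a multidimensional knapsack problem. It is not Snakemake's actual scheduler: the function `select_jobs`, the job names, the `needs` and `value` fields, and the exhaustive search are all assumptions chosen for illustration, and the search is only practical for a handful of jobs.

```python
from itertools import combinations


def select_jobs(jobs, capacity):
    """Pick the subset of ready jobs with the highest total value
    whose summed resource needs fit within every capacity limit.

    Hypothetical helper for illustration; exhaustive search, so only
    suitable for small job sets.
    """
    best_value, best_subset = 0, ()
    names = sorted(jobs)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            # Sum the demand of the candidate subset per resource dimension.
            totals = {}
            for name in subset:
                for resource, amount in jobs[name]["needs"].items():
                    totals[resource] = totals.get(resource, 0) + amount
            # A resource missing from `capacity` is treated as unavailable.
            fits = all(
                amount <= capacity.get(resource, 0)
                for resource, amount in totals.items()
            )
            if not fits:
                continue
            value = sum(jobs[name]["value"] for name in subset)
            if value > best_value:
                best_value, best_subset = value, subset
    return set(best_subset)


# Illustrative jobs: each has resource needs and an assumed scheduling value.
ready_jobs = {
    "align_a": {"needs": {"cores": 4}, "value": 4},
    "align_b": {"needs": {"cores": 4}, "value": 3},
    "train": {"needs": {"cores": 2, "gpus": 1}, "value": 5},
    "plot": {"needs": {"cores": 1}, "value": 1},
}

selected = select_jobs(ready_jobs, {"cores": 8, "gpus": 1})
print(sorted(selected))
```

With 8 cores and 1 graphics card, the sketch prefers running `align_a`, `train` and `plot` together over the two alignment jobs alone, because the former combination has the higher total value while staying within both limits. Each resource type is one dimension of the knapsack.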
Since its publication, Snakemake has been widely adopted and has been used to build analysis workflows for a variety of high-impact publications. With about 5000 homepage visits per month, it has a large and stable user community.