Build systems like make are frequently used to create complicated workflows, e.g. in bioinformatics. This project aims to reduce the complexity of creating workflows by providing a clean and modern domain-specific language (DSL) in Python style, together with a fast and comfortable execution environment.

News

  • 15. May 2012: Release 1.1.2 of snakemake. Instead of using only plain strings, input files can now also be defined as functions or lambda expressions that return a string given the wildcards as an argument. Fixed hangs that occurred when executing many jobs in parallel.
  • 15. Apr 2012: Maintenance release 1.0.2 of snakemake. Improved temporary file handling and error handling when running snakemake on clusters.
  • 9. Apr 2012: The first stable release (1.0.1) of snakemake.
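
The input functions introduced in release 1.1.2 can be pictured with the following plain-Python sketch. It is an illustration under stated assumptions, not snakemake's implementation: resolve_input, reads_for and the raw/ directory are hypothetical names, and SimpleNamespace merely stands in for the wildcards object.

```python
from types import SimpleNamespace


def reads_for(wildcards):
    # Hypothetical input function: build the file name from the wildcards.
    return "raw/{}.fastq".format(wildcards.sample)


def resolve_input(spec, wildcards):
    """Return the concrete input file for a plain string or a callable spec."""
    if callable(spec):
        # Functions and lambda expressions receive the wildcards as argument.
        return spec(wildcards)
    # Plain strings are filled in with the wildcard values.
    return spec.format(**vars(wildcards))


wildcards = SimpleNamespace(sample="01")  # stand-in for the wildcards object
from_string = resolve_input("{sample}.fastq", wildcards)
from_function = resolve_input(reads_for, wildcards)
from_lambda = resolve_input(lambda w: w.sample + ".fastq", wildcards)
```

The point of the callable form is that the input file name no longer has to follow the output pattern mechanically; any Python logic may map wildcards to a file.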

Features

  • Define workflows in a textual way by writing rules that describe how to create output files from input files in a simple Python-based syntax. In contrast to GNU make (which is primarily a build system), snakemake allows a rule to create multiple output files.
  • Snakemake automatically determines which rules need to be executed to create the desired output.
  • Both shell-based rules and full Python syntax inside a rule are supported. Shell commands have direct access to all local and global Python variables.
  • Like GNU make, snakemake can schedule rules to run in parallel where possible. Further, inter-rule parallelization can be combined with intra-rule parallelization (e.g. threads), and snakemake ensures that the number of used cores does not exceed a given threshold.
  • Files can be marked as temporary (i.e. they can be deleted once they are no longer needed) or protected (i.e. they will be write-protected after creation).
  • Input and output files can contain multiple named wildcards.
  • Input and output files can be given names to ease addressing them inside the rule.
  • Map-reduce-like functionality is available through the easy-to-read Python list comprehension syntax.
  • As an experimental feature, snakemake can run on a cluster by specifying the submit command (e.g. qsub for Sun Grid Engine).
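
To make the wildcard matching and the automatic selection of rules more concrete, here is a minimal Python sketch. It is not snakemake's actual algorithm: the RULES table and the functions match and plan are hypothetical, each wildcard name is assumed to occur only once per pattern, and file modification times are ignored.

```python
import re

# Hypothetical rule table mirroring the idea of input/output patterns.
RULES = {
    "map_reads": {
        "input": ["{sample}.fastq", "hg19.fasta"],
        "output": ["{sample}.mapped.sai"],
    },
    "summarize": {
        # map-reduce style: one input per sample, built by a list comprehension
        "input": ["{}.mapped.sai".format(s) for s in ["01", "02"]],
        "output": ["diffexpr.tsv"],
    },
}


def match(pattern, filename):
    """Return the wildcard values if filename fits pattern, else None."""
    parts = re.split(r"\{(\w+)\}", pattern)
    # re.split with one capture group alternates literal text and wildcard names
    regex = "".join(
        re.escape(part) if i % 2 == 0 else "(?P<{}>.+)".format(part)
        for i, part in enumerate(parts)
    )
    found = re.match(regex + "$", filename)
    return found.groupdict() if found else None


def plan(target, existing, jobs=None):
    """List the (rule, wildcards) jobs needed for target, dependencies first."""
    jobs = [] if jobs is None else jobs
    if target in existing:
        return jobs
    for name, rule in RULES.items():
        for output in rule["output"]:
            wildcards = match(output, target)
            if wildcards is None:
                continue
            # Recurse into the inputs with the wildcard values filled in.
            for pattern in rule["input"]:
                plan(pattern.format(**wildcards), existing, jobs)
            job = (name, tuple(sorted(wildcards.items())))
            if job not in jobs:
                jobs.append(job)
            return jobs
    raise ValueError("no rule produces " + target)


jobs = plan("diffexpr.tsv", existing={"01.fastq", "02.fastq", "hg19.fasta"})
```

In this sketch, asking for diffexpr.tsv yields one map_reads job per sample followed by the summarize job, because each missing input is traced back to the rule whose output pattern matches it.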

Installation

  • On Ubuntu 12.04, you can install the Debian package python3-snakemake available in our Launchpad repository.
  • On other systems, you need a working installation of Python >= 3.2. Depending on your system, you can then install snakemake by issuing either easy_install snakemake or easy_install3 snakemake on the command line. If you don't have administrator privileges, have a look at the --user argument of easy_install.
  • Finally, snakemake can be installed manually by downloading the source code archive from PyPI.

Usage

Snakemake offers a simple DSL to describe workflows that create files in several subsequent steps:

samples = ["01", "02"]

# optionally define a directory where the work should be done.
workdir: "path/to/workdir"

# similar to make, define dummy rules that act as build targets.
rule all:
        input: "diffexpr.tsv", ...

rule summarize:
        input:  ["{sample}.mapped.bam".format(sample = s) for s in samples]
        output: "diffexpr.tsv"
        run:
                # ... provide some python code to produce the output from the input files,
                # e.g. access input files by index
                input[1]
                # access wildcard values
                wildcards.sample
                # easily run shell commands automatically using your default shell while
                # having direct access to all local and global variables via the format
                # minilanguage
                shell("somecommand {input} {output}")

rule map_reads:
        # assign names for input and output files
        input:  reads = "{sample}.fastq", hg19 = "hg19.fasta"
        # mark output files to be write-protected after creation
        output: mapped = protected("{sample}.mapped.sai")
        # Optionally define a message that is displayed instead of the generic rule
        # description on execution of the rule:
        message: "Mapping reads to {input.hg19}"
        # Let snakemake know how many threads you allow for this rule and propagate the 
        # value in the variable "threads" to the shell command below
        threads: 8
        shell:
                # Directly provide shell commands (in a single-line or multi-line string)
                # if python syntax is not needed. Again, global and local variables can be
                # accessed via the format minilanguage.
                # The number of threads used by the rule is set with "threads" above. The
                # snakemake scheduler ensures that the rule is run with the specified
                # number of threads if enough cores are made available via the -j command
                # line option.
                """
                bwa aln -t {threads} {input.hg19} {input.reads} > {output.mapped}
                some --other --command
                """
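
As a side note on the summarize rule, the following plain-Python snippet shows what the expression used for its input evaluates to, and how a command string could be filled in via the format minilanguage. That the input files are joined with spaces is an assumption made for this illustration; the variable names are hypothetical.

```python
samples = ["01", "02"]

# The "map" step: one input file per sample.
inputs = ["{sample}.mapped.bam".format(sample=s) for s in samples]

# The "reduce" step: a single command consumes all inputs at once.
# Assumption for illustration: the file names are joined with spaces.
command = "somecommand {input} {output}".format(
    input=" ".join(inputs), output="diffexpr.tsv"
)
```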

Given a "Snakefile" like the one above, the workflow can be executed (e.g. using up to 6 parallel processes) by issuing the following in the same directory:

$ snakemake -j6
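
How a core budget such as -j6 can limit concurrently running jobs is sketched below in plain Python. This is a simple greedy strategy written for illustration, not snakemake's scheduler: select_jobs and the job names are hypothetical, and capping a job's thread count at the total budget is an assumption of this sketch.

```python
def select_jobs(ready, free_cores, total_cores):
    """Greedily pick ready jobs whose thread demand fits into the free cores."""
    selected = []
    for name, threads in ready:
        # Assumption: a job never gets more threads than the total budget.
        needed = min(threads, total_cores)
        if needed <= free_cores:
            selected.append((name, needed))
            free_cores -= needed
    return selected


# Two jobs asking for 4 threads each and one asking for 2, with a budget of 6.
batch = select_jobs([("a", 4), ("b", 4), ("c", 2)], free_cores=6, total_cores=6)
```

Here the first and third job are started together, while the second has to wait until cores are released, so the number of used cores never exceeds the budget.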

For more details, please see the Tutorial.
