Project Information
Build systems like make are frequently used to create complicated workflows, e.g. in bioinformatics. This project aims to reduce the complexity of creating workflows by providing a clean and modern domain-specific language (DSL) in Python style, together with a fast and comfortable execution environment.

Snakemake offers a simple DSL to describe workflows that create files in several subsequent steps:

samples = ["01", "02"]

# optionally define a directory where the work should be done.
workdir: "path/to/workdir"

# similar to make, define dummy rules that act as build targets.
rule all:
    input: "diffexpr.tsv", ...

rule summarize:
    input: "{sample}.mapped.bam".format(sample = s) for s in samples
    output: "diffexpr.tsv"
    run:
        # ... provide some python code to produce the output from the input files
        # e.g. access input files by index
        input[1]
        # access wildcard values
        wildcards.sample
        # easily run shell commands automatically using your default shell while
        # having direct access to all local and global variables via the format
        # minilanguage
        shell("somecommand {input} {output}")

rule map_reads:
    # assign names for input and output files
    input: reads = "{sample}.fastq", hg19 = "hg19.fasta"
    # mark output files to be write-protected after creation
    output: mapped = protected("{sample}.mapped.sai")
    # optionally define messages that are displayed instead of the generic rule
    # description on execution of the rule:
    message: "Mapping reads to {input.hg19}"
    # let snakemake know how many threads you allow for this rule and propagate the
    # value in the variable "threads" to the shell command below
    threads: 8
    shell:
        # directly provide shell commands (in a multi or single line string) if
        # python syntax is not needed. again, global and local variables can be
        # accessed via the format minilanguage.
        # Further, the number of threads used by the rule can be specified. The
        # snakemake scheduler ensures that the rule is run with the specified
        # number of threads if enough cores are made available via the -j command
        # line option.
        """
        bwa aln -t {threads} {input.hg19} {input.reads} > {output.mapped}
        some --other --command
        """

Given a "Snakefile" with such a syntax, the workflow can be executed (e.g. using up to 6 parallel processes) by issuing in the same directory:

    $ snakemake -j6

For more details please see the Tutorial.