
Citation

Please cite the following article if you use BEDTools in your research:

  • Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 6, pp. 841–842.

Also note that pybedtools, the Python extension of BEDTools, has been published in Bioinformatics. It builds on much of the functionality in BEDTools and provides a powerful and flexible Python interface for manipulating and comparing genomic features in BED/VCF/GFF/GTF/SAM/BAM format.

  • Dale RK, Pedersen BS, and Quinlan AR. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics (2011). doi:10.1093/bioinformatics/btr539

Latest news (Version 2.16.2, 30-March-2012)

  1. The -split option in intersectBed now applies to both the A and the B file.
  2. New tools: bedtools map, expand, random, makewindows, bamtofastq.
  3. Added the -loj (left outer join) option to bedtools intersect. It behaves like a database left join, so every A interval is reported: when overlaps exist, they are reported; when they don't, a NULL B entry is reported for the A feature.
  4. Remote BAM files can now be read via HTTP or FTP. Support for remote BED/GFF/VCF files will come in a future release.
  5. Added options to "bedtools closest" that ignore upstream or downstream intervals.
  6. Added a count_distinct operation to groupby.
  7. Added an initial version of a regression testing suite. Invoke it with "make test".
  8. When using -bed with BAM input in intersectBed, full BED12 features are produced.
  9. Developed a standardized API for working with split BED and BAM intervals.
  10. Support for "empty" columns: input files with consecutive delimiters are now supported.
  11. The GitHub commit number is now included in the version number for easier tracking.
  12. Groupby now accepts "-" to read from stdin.
  13. New command-line interface, invoked as "bedtools command".
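To make the -loj behaviour in item 3 concrete, here is a minimal pure-Python sketch of a left outer join over intervals. It is not BEDTools code: the (chrom, start, end) tuples, the brute-force scan, and the use of None where bedtools writes its NULL B fields are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (chrom, start, end) in BED's 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def left_outer_join(
    a_feats: List[Interval], b_feats: List[Interval]
) -> List[Tuple[Interval, Optional[Interval]]]:
    """Pair each A feature with every overlapping B feature.

    An A feature with no overlap is still reported, paired with None
    (standing in for the NULL B entry).
    """
    rows: List[Tuple[Interval, Optional[Interval]]] = []
    for a in a_feats:
        hits = [b for b in b_feats if overlaps(a, b)]
        if hits:
            rows.extend((a, b) for b in hits)
        else:
            rows.append((a, None))
    return rows


a_feats = [("chr1", 100, 200), ("chr1", 500, 600)]
b_feats = [("chr1", 150, 250), ("chr1", 180, 190)]
for a_feat, b_feat in left_outer_join(a_feats, b_feats):
    print(a_feat, b_feat)
```

The first A interval appears once per overlapping B interval, while the second, which overlaps nothing, still appears once, paired with None.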

BEDTools Summary

The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM. Using BEDTools, one can develop sophisticated pipelines that answer complicated research questions by "streaming" several BEDTools together. The following are examples of common questions that one can address with BEDTools.

  1. Intersecting two BED files in search of overlapping features.
  2. Culling/refining/computing coverage for BAM alignments based on genome features.
  3. Merging overlapping features.
  4. Screening for paired-end (PE) overlaps between PE sequences and existing genomic features.
  5. Calculating the depth and breadth of sequence coverage across defined "windows" in a genome.
  6. Screening for overlaps between "split" alignments and genomic features.
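Task 3, merging overlapping features, reduces to a sort followed by a single linear sweep. The sketch below assumes features are (chrom, start, end) tuples in BED's 0-based, half-open coordinates. It merges only intervals that share at least one base and keeps book-ended intervals (one ending exactly where the next starts) separate; that is a choice made here, not a statement about mergeBed's defaults.

```python
from typing import List, Tuple

# (chrom, start, end) in BED's 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def merge_intervals(features: List[Interval]) -> List[Interval]:
    """Collapse intervals that share at least one base into single intervals."""
    merged: List[Interval] = []
    # Sorting by (chrom, start) places overlapping intervals next to each other.
    for chrom, start, end in sorted(features):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the block being built: extend its end if this one reaches further.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            # New chromosome or a gap: start a new block.
            merged.append((chrom, start, end))
    return merged


print(merge_intervals([("chr1", 150, 300), ("chr1", 100, 200), ("chr1", 400, 500), ("chr2", 0, 10)]))
```

Sorting first is what lets the sweep compare each interval only with the block currently being built.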

Because the BEDTools accept input from "standard input (stdin)", one can "stream" or "pipe" several commands together to facilitate more complex analyses. The tools also allow fine control over how output is reported. Most recently, I have added support for sequence alignments in BAM format (http://samtools.sourceforge.net/), for features in VCF and GFF, and for "blocked" BED format. The tools are quite fast and typically finish within a few seconds, even for large datasets.
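As a rough illustration of why stdin support makes tools composable, the following hypothetical Python sketch reads BED records lazily from any iterable of lines (such as sys.stdin) and passes them through a filter stage without loading the whole file. The function names and the header prefixes it skips are our own choices. Splitting on a single tab, rather than on any whitespace, keeps the empty columns that consecutive delimiters produce.

```python
from typing import Iterable, Iterator, List, Union

Field = Union[str, int]

# Lines starting with these prefixes are treated as headers (an assumption of this sketch).
HEADER_PREFIXES = ("#", "track", "browser")


def read_bed(lines: Iterable[str]) -> Iterator[List[Field]]:
    """Lazily yield one parsed BED record per input line.

    Splitting on a single tab keeps empty columns created by consecutive
    delimiters; start and end are converted to integers.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith(HEADER_PREFIXES):
            continue  # skip blank and header lines
        fields: List[Field] = list(line.split("\t"))
        fields[1] = int(fields[1])  # start
        fields[2] = int(fields[2])  # end
        yield fields


def on_chrom(records: Iterable[List[Field]], chrom: str) -> Iterator[List[Field]]:
    """A streaming filter stage: pass through only records on `chrom`."""
    return (rec for rec in records if rec[0] == chrom)
```

In a script, `for rec in on_chrom(read_bed(sys.stdin), "chr1"): print(*rec, sep="\t")` would act as one more stage in a UNIX pipeline, holding only one record in memory at a time.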

Brief example

As stated, much of the power in BEDTools comes from the ability to pipe multiple BEDTools together with UNIX commands. The following example will hopefully illustrate this strength.

Example: Imagine you have a BED file of SNP calls generated by some fancy new variant detection method. You are now doing an initial screen of the results. The SNP calls are genome-wide and of varied support and biological interest. The BED file of SNP calls might look like this, where the name field holds the observed alleles and the score field holds the depth:

$ head snps.bed
chr1	100	101	A/G	100
chr1	200	201	C/G	1000
...
chrX	300	301	C/T	500

Let's say you want to quickly find all transitions that are in exons. Using BEDTools and egrep, the command would be:

$ egrep "A/G|C/T" snps.bed | intersectBed -a stdin -b exons.bed > snpsInExons.bed
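For readers who want to see what this pipeline computes, here is a small Python sketch of the same logic. It assumes the name field holds the allele string and, unlike egrep, matches it exactly rather than searching the whole line. The intersect function reports each A record trimmed to the interval it shares with B, once per overlapping B feature, which is our understanding of intersectBed's default output; the exon coordinates are made up.

```python
from typing import List, Tuple

Snp = Tuple[str, int, int, str, int]  # chrom, start, end, alleles (name), depth (score)
Exon = Tuple[str, int, int]           # chrom, start, end

TRANSITIONS = {"A/G", "C/T"}  # the two allele strings the egrep pattern looks for


def intersect(a_feats: List[Snp], b_feats: List[Exon]) -> List[Snp]:
    """Report each A record trimmed to the interval it shares with each overlapping B record."""
    hits: List[Snp] = []
    for chrom, start, end, name, score in a_feats:
        for b_chrom, b_start, b_end in b_feats:
            lo, hi = max(start, b_start), min(end, b_end)
            if chrom == b_chrom and lo < hi:  # half-open: at least one shared base
                hits.append((chrom, lo, hi, name, score))
    return hits


snps = [("chr1", 100, 101, "A/G", 100), ("chr1", 200, 201, "C/G", 1000), ("chrX", 300, 301, "C/T", 500)]
exons = [("chr1", 50, 150), ("chrX", 250, 400)]  # made-up stand-in for exons.bed
transitions = [snp for snp in snps if snp[3] in TRANSITIONS]
print(intersect(transitions, exons))
```

For single-base SNPs the trimmed interval is the SNP itself, so the trimming only becomes visible with longer A features.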

Great, but now you want to get to the interesting bits for your big paper, so you screen for novel variants by excluding SNP calls that are already in dbSNP. In this case, the "-v" option reports only those SNPs passed to intersectBed that do NOT overlap an entry in dbSNP.

$ egrep "A/G|C/T" snps.bed | \
  intersectBed -a stdin -b exons.bed | \
  intersectBed -v -a stdin -b dbSnp130.bed > novelSnpsInExons.bed
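The -v option inverts the question: an A record is kept only if it overlaps nothing in B, and it is reported once, unmodified. A minimal sketch, with made-up coordinates standing in for dbSnp130.bed:

```python
from typing import Any, List, Tuple

Record = Tuple[Any, ...]  # (chrom, start, end, ...) with any trailing BED columns


def overlaps(a: Record, b: Record) -> bool:
    """True if the two records are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def exclude_overlapping(a_feats: List[Record], b_feats: List[Record]) -> List[Record]:
    """Keep A records that overlap no B record, each reported once and unmodified."""
    return [a for a in a_feats if not any(overlaps(a, b) for b in b_feats)]


exonic_transitions = [("chr1", 100, 101, "A/G", 100), ("chrX", 300, 301, "C/T", 500)]
known_sites = [("chr1", 100, 101, "rs_example")]  # made-up stand-in for dbSnp130.bed
print(exclude_overlapping(exonic_transitions, known_sites))
```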

But suppose you then detect an artifact: false positives are enriched among SNPs with coverage > 100. You refine your original query accordingly.

$ awk '$5 <= 100' snps.bed | \
  egrep "A/G|C/T" | \
  intersectBed -a stdin -b exons.bed | \
  intersectBed -v -a stdin -b dbSnp130.bed \
  > bonafideNovelSnpsInExons.bed

Table of supported utilities

(BAM) denotes tools that support BAM alignment files.

Utility Description
intersectBed (BAM) Returns overlaps between two BED/GFF/VCF files.
pairToBed (BAM) Returns overlaps between a paired-end BED file and a regular BED/VCF/GFF file.
bamToBed (BAM) Converts BAM alignments to BED6, BED12, or BEDPE format.
bedToBam (BAM) Converts BED/GFF/VCF features to BAM format.
bed12ToBed6 Converts "blocked" BED12 features to discrete BED6 features.
bedToIgv Creates IGV batch scripts for taking multiple snapshots from BED/GFF/VCF features.
coverageBed (BAM) Summarizes the depth and breadth of coverage of features in one BED versus features (e.g., "windows", exons, etc.) defined in another BED/GFF/VCF file.
multiBamCov (BAM) Counts sequence coverage for multiple position-sorted BAM files at specific loci defined in a BED/GFF/VCF file.
tagBam (BAM) Annotates a BAM file with custom tag fields based on overlaps with BED/GFF/VCF files.
nuclBed Profiles the nucleotide content of intervals in a FASTA file.
genomeCoverageBed (BAM) Creates either a histogram, BEDGRAPH, or a "per base" report of genome coverage.
unionBedGraphs Combines multiple BedGraph files into a single file, allowing coverage/other comparisons between them.
annotateBed Annotates one BED/VCF/GFF file with overlaps from many others.
groupBy Deprecated. Now in the filo package.
overlap Returns the number of base pairs of overlap between two features on the same line.
pairToPair Returns overlaps between two paired-end BED files.
closestBed Returns the closest feature to each entry in a BED/GFF/VCF file.
subtractBed Removes the portion of an interval that is overlapped by another feature.
windowBed (BAM) Returns overlaps between two BED/VCF/GFF files based on a user-defined window.
mergeBed Merges overlapping features into a single feature.
complementBed Returns all intervals not spanned by the features in a BED/GFF/VCF file.
fastaFromBed Creates FASTA sequences based on intervals in a BED/GFF/VCF file.
maskFastaFromBed Masks a FASTA file based on BED coordinates.
shuffleBed Randomly permutes the locations of the features in a BED file within a genome.
slopBed Adjusts each BED entry by a requested number of base pairs.
flankBed Creates flanking intervals for each feature in a BED/GFF/VCF file.
sortBed Sorts a BED file by chromosome, then by start position; other sort orders are also available.
linksBed Creates an HTML file of links to the UCSC or a custom browser.
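Several utilities in the table are simple interval arithmetic. As one example, the sketch below computes what complementBed describes: the regions of each chromosome not spanned by any feature. Chromosome lengths are passed as a plain dictionary here (an assumption for illustration), and features on chromosomes missing from that dictionary are ignored.

```python
from typing import Dict, List, Tuple

# (chrom, start, end) in BED's 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def complement(features: List[Interval], chrom_sizes: Dict[str, int]) -> List[Interval]:
    """Return the gaps on each chromosome that no feature covers."""
    gaps: List[Interval] = []
    for chrom in sorted(chrom_sizes):
        size = chrom_sizes[chrom]
        cursor = 0  # first base not yet known to be covered
        for _, start, end in sorted(f for f in features if f[0] == chrom):
            if start > cursor:
                gaps.append((chrom, cursor, start))  # uncovered stretch before this feature
            cursor = max(cursor, end)  # overlapping features only push the cursor forward
        if cursor < size:
            gaps.append((chrom, cursor, size))  # tail of the chromosome
    return gaps


print(complement([("chr1", 10, 20), ("chr1", 15, 30), ("chr1", 50, 60)], {"chr1": 100, "chr2": 5}))
```

A chromosome with no features at all comes back as a single gap spanning its full length.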


Documentation

Please read the BEDTools manual as well as the Usage and Advanced Usage pages. If you still have questions or issues, please use the BEDTools discussion list.

Notes regarding usage

  1. All BEDTools load the "B" file into memory and process the "A" file line by line against the features in "B". Therefore, when possible, set the smaller of the two files as the "B" file. For example, finding overlaps between a list of 30,000 genes and 100 million aligned sequences will be much more efficient with the genes file set as BED file "B".
  2. Most of the BEDTools have optional parameters that give fine control over reporting and the subtleties of each tool. We suggest you look through them; if something you need is missing, please let us know.
  3. Most of the BEDTools allow the "A" file to be passed via standard input for use in UNIX "streams" or "pipelines". To do this, use "-a stdin". For example:
     $ cat reads.bed | intersectBed -a stdin -b genes.bed > readsToGenes.bed
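Note 1 can be illustrated with a sketch of the load-B, stream-A pattern: memory use grows with the size of B, while A is consumed one record at a time. The index below (intervals sorted by start, plus the longest interval length per chromosome to bound the search) is our own simplification, not the indexing scheme BEDTools uses internally.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# (chrom, start, end) in BED's 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


class IntervalIndex:
    """In-memory index over the "B" features, grouped by chromosome."""

    def __init__(self, b_feats: Iterable[Interval]) -> None:
        by_chrom: Dict[str, List[Interval]] = defaultdict(list)
        for feat in b_feats:
            by_chrom[feat[0]].append(feat)
        self._feats: Dict[str, List[Interval]] = {}
        self._starts: Dict[str, List[int]] = {}
        self._max_len: Dict[str, int] = {}
        for chrom, feats in by_chrom.items():
            feats.sort(key=lambda f: f[1])  # sort by start for binary search
            self._feats[chrom] = feats
            self._starts[chrom] = [f[1] for f in feats]
            self._max_len[chrom] = max(f[2] - f[1] for f in feats)

    def query(self, chrom: str, start: int, end: int) -> List[Interval]:
        """Return B intervals overlapping the half-open range [start, end)."""
        if chrom not in self._feats:
            return []
        starts = self._starts[chrom]
        # A B interval starting at or before start - max_len must end at or before
        # start, so it cannot overlap; skip everything up to that point.
        lo = bisect_right(starts, start - self._max_len[chrom])
        hi = bisect_left(starts, end)  # B must also start before the query ends
        return [f for f in self._feats[chrom][lo:hi] if f[2] > start]


def stream_overlaps(
    a_feats: Iterable[Interval], b_feats: Iterable[Interval]
) -> Iterator[Tuple[Interval, Interval]]:
    """Load B once, then stream A one record at a time against it."""
    index = IntervalIndex(b_feats)
    for a in a_feats:
        for b in index.query(*a):
            yield a, b
```

Because only B is held in memory, passing a generator over a large reads file as `a_feats` keeps memory bounded by the (smaller) genes file, which is the point of note 1.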

Installation

GCC version 4.1 or greater is recommended; GCC 3.x versions typically will not compile BEDTools.

tar -zxvf BEDTools.<version>.tar.gz 
cd BEDTools<version>
make clean
make all
ls bin

# copy the binaries to a directory in your PATH. e.g., 
sudo cp bin/* /usr/local/bin
# or
cp bin/* ~/bin

Source Repository

The BEDTools source code repository is now hosted on GitHub. The Google Code site will host all formal releases, documentation, and announcements, but the working code base lives on GitHub.

Package Managers

  • Fedora/CentOS. Adam Huffman has created a Red Hat package for BEDTools so that one can easily install the latest release using "yum", the Fedora package manager. It should work with Fedora 13, 14 and EPEL5/6 (for CentOS, Scientific Linux, etc.).
      yum install BEDTools
  • Debian/Ubuntu. Charles Plessy maintains a Debian package for BEDTools that is likely to be available in Debian derivatives such as Ubuntu. Many thanks to Charles for doing this.
      apt-get install bedtools
  • Homebrew. Carlos Borroto has made BEDTools available via the Homebrew package manager for OS X.
      brew install bedtools

Contact

BEDTools was developed and is maintained by Aaron Quinlan at the University of Virginia (http://cphg.virginia.edu/quinlan). Questions should be posted to the BEDTools discussion list. Alternatively, contact Aaron via email (firstlast at gmail.com).
