Project Information
Code license: GNU GPL v2
Labels: bioinformatics, genomics, bed, sam, bam, overlap, features, sequencing, intersect, coverage, gff, vcf, bedgraph, intervals, genomearithmetic
Citation
Please cite the following article if you use BEDTools in your research:
- Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 6, pp. 841–842.
Also note that pybedtools, the Python extension of BEDTools, has been published in Bioinformatics. It builds on much of the functionality in BEDTools and provides a powerful and flexible Python interface for manipulating and comparing genomic features in BED/VCF/GFF/GTF/SAM/BAM format.
- Dale RK, Pedersen BS, and Quinlan AR. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics (2011). doi:10.1093/bioinformatics/btr539
Latest news (Version 2.15.0, 2-Jan-2012)
- New command-line interface, invoked as "bedtools <command>".
- New "bedtools cluster" tool for clustering overlapping/nearby intervals.
- Improved support for file "headers"; especially important for VCF format.
- New multiIntersectBed tool.
- New -D option in closestBed for reporting signed distances.
- intersectBed uses the "chromsweep" algorithm for position-sorted BED/GFF/VCF files. Invoked with the -sorted option.
- genomeCoverageBed no longer needs a genome file for BAM input.
- tagBam can use the BED score field to populate tags.
- New multiBamCov tool counts sequence coverage for multiple position-sorted BAMs at specific loci defined in a BED/GFF/VCF file.
- New tagBam tool for annotating a BAM file with custom tag fields based on overlaps with BED/GFF/VCF files.
- New nucBed tool profiles the nucleotide content of intervals in a FASTA file.
- New -counts option for coverageBed.
- New -S option for detecting overlaps on the opposite strand.
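The -sorted "chromsweep" mode above depends on both inputs being position-sorted. Sorted input lets overlaps be found in one coordinated forward pass over both files. The sketch below illustrates that general sweep idea in Python, the language of pybedtools. It is not BEDTools' actual C++ implementation. It assumes BED-style half-open (chrom, start, end) tuples and both lists sorted by (chrom, start), with chromosome names compared as plain strings. The function name sweep_intersect is hypothetical.

```python
from typing import Iterator, List, Tuple

# Hypothetical representation: (chrom, start, end) with BED half-open coordinates.
Interval = Tuple[str, int, int]


def sweep_intersect(a: List[Interval], b: List[Interval]) -> Iterator[Tuple[Interval, Interval]]:
    """Yield every overlapping (A, B) pair in one forward pass over two sorted lists.

    Assumes both lists are sorted by (chrom, start), with chromosome names
    compared as plain strings (the order Python's sorted() produces).
    """
    j = 0  # first B interval not yet discarded
    for chrom, start, end in a:
        # Discard B intervals on earlier chromosomes, or that end at or before this
        # A starts. Later A intervals start no earlier, so they cannot need them.
        while j < len(b) and (b[j][0] < chrom or (b[j][0] == chrom and b[j][2] <= start)):
            j += 1
        # Scan forward until B intervals start at or after this A ends.
        k = j
        while k < len(b) and b[k][0] == chrom and b[k][1] < end:
            if b[k][2] > start:  # half-open overlap test
                yield (chrom, start, end), b[k]
            k += 1
```

The pointer j only moves forward, so B intervals that can no longer overlap anything are dropped permanently. A streaming implementation can exploit this to avoid holding all of B in memory. This sketch takes in-memory lists for simplicity.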
BEDTools Summary
The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely used file formats: BED, GFF/GTF, VCF, and SAM/BAM. Using BEDTools, one can develop sophisticated pipelines that answer complicated research questions by "streaming" several BEDTools together. The following are examples of common tasks that one can address with BEDTools:
- Intersecting two BED files in search of overlapping features.
- Culling/refining/computing coverage for BAM alignments based on genome features.
- Merging overlapping features.
- Screening for paired-end (PE) overlaps between PE sequences and existing genomic features.
- Calculating the depth and breadth of sequence coverage across defined "windows" in a genome.
- Screening for overlaps between "split" alignments and genomic features.
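Merging overlapping features, one of the tasks listed above, comes down to sorting the intervals and extending a running interval while the next one overlaps it. Below is a minimal Python sketch, assuming half-open (chrom, start, end) tuples. The function name merge_intervals and its max_gap parameter are hypothetical. Merging book-ended intervals (where one ends exactly where the next begins) when max_gap is 0 is a choice made for this sketch. Consult the mergeBed documentation for the tool's own behavior.

```python
from typing import List, Tuple

# Hypothetical representation: (chrom, start, end) with BED half-open coordinates.
Interval = Tuple[str, int, int]


def merge_intervals(intervals: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge intervals on the same chromosome whose gap is at most max_gap.

    With max_gap=0, overlapping and book-ended intervals are merged
    (a choice made for this sketch).
    """
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):  # sort by chrom, then start
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            # Extend the running interval; max() handles fully contained intervals.
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Sorting first is what makes a single pass sufficient. Once the intervals are in start order, any interval that can join the current run must follow it immediately.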
Because most of the BEDTools accept input from "standard input" (stdin), several commands can be "streamed" (piped) together to facilitate more complicated analyses. The tools also allow fine control over how output is reported. Most recently, I have added support for sequence alignments in BAM format (http://samtools.sourceforge.net/), as well as for features in VCF, GFF, and "blocked" BED format. The tools are quite fast and typically finish in a matter of a few seconds, even for large datasets.
Brief example
As stated, much of the power of BEDTools comes from the ability to pipe multiple BEDTools together with UNIX commands. The following example illustrates this strength.
Example: Imagine you have a BED file of SNP calls that were generated by some fancy new variant detection method. You are now doing an initial screen of the results. The SNP calls are genome-wide and of varied support and biological interest. The BED file of SNP calls might look like this, where the name field holds the observed alleles and the score is the depth:
$ head snps.bed
chr1 100 101 A/G 100
chr1 200 201 C/G 1000
...
chrX 300 301 C/T 500
Let's say you want to quickly find all transitions that are in exons. Using BEDTools and egrep, the command would be:
$ egrep "A/G|C/T" snps.bed | intersectBed -a stdin -b exons.bed > snpsInExons.bed
Great, but now you want to get to the interesting bits for your big paper, so you want to screen for novel variants by excluding SNP calls that are already in dbSNP. In this case, the "-v" option reports only those SNPs passed to intersectBed that are NOT in dbSNP:
$ egrep "A/G|C/T" snps.bed | \
intersectBed -a stdin -b exons.bed | \
intersectBed -v -a stdin -b dbSnp130.bed > novelSnpsInExons.bed
But now you detect an artifact: false positives are enriched among SNP calls with coverage > 100. You refine your original query accordingly:
$ awk '$5 <= 100' snps.bed | \
egrep "A/G|C/T" | \
intersectBed -a stdin -b exons.bed | \
intersectBed -v -a stdin -b dbSnp130.bed \
> bonafideNovelSnpsInExons.bed
Table of supported utilities
(BAM) denotes tools that support BAM alignment files.
| Utility | Description |
| intersectBed (BAM) | Returns overlaps between two BED/GFF/VCF files. |
| pairToBed (BAM) | Returns overlaps between a paired-end BED file and a regular BED/VCF/GFF file. |
| bamToBed (BAM) | Converts BAM alignments to BED6, BED12, or BEDPE format. |
| bedToBam (BAM) | Converts BED/GFF/VCF features to BAM format. |
| bed12ToBed6 | Converts "blocked" BED12 features to discrete BED6 features. |
| bedToIgv | Creates IGV batch scripts for taking multiple snapshots from BED/GFF/VCF features. |
| coverageBed (BAM) | Summarizes the depth and breadth of coverage of features in one BED file versus features (e.g., "windows", exons, etc.) defined in another BED/GFF/VCF file. |
| multiBamCov (BAM) | Counts sequence coverage for multiple position-sorted BAMs at specific loci defined in a BED/GFF/VCF file. |
| tagBam (BAM) | Annotates a BAM file with custom tag fields based on overlaps with BED/GFF/VCF files. |
| nucBed | Profiles the nucleotide content of intervals in a FASTA file. |
| genomeCoverageBed (BAM) | Creates either a histogram, BEDGRAPH, or a "per base" report of genome coverage. |
| unionBedGraphs | Combines multiple BedGraph files into a single file, allowing coverage/other comparisons between them. |
| annotateBed | Annotates one BED/VCF/GFF file with overlaps from many others. |
| groupBy | Deprecated. Now in the filo package. |
| overlap | Returns the number of base pairs of overlap between two features on the same line. |
| pairToPair | Returns overlaps between two paired-end BED files. |
| closestBed | Returns the closest feature to each entry in a BED/GFF/VCF file. |
| subtractBed | Removes the portion of an interval that is overlapped by another feature. |
| windowBed (BAM) | Returns overlaps between two BED/VCF/GFF files based on a user-defined window. |
| mergeBed | Merges overlapping features into a single feature. |
| complementBed | Returns all intervals not spanned by the features in a BED/GFF/VCF file. |
| fastaFromBed | Creates FASTA sequences based on intervals in a BED/GFF/VCF file. |
| maskFastaFromBed | Masks a FASTA file based on BED coordinates. |
| shuffleBed | Randomly permutes the locations of a BED file's features within a genome. |
| slopBed | Adjusts each BED entry by a requested number of base pairs. |
| flankBed | Creates flanking intervals for each feature in a BED/GFF/VCF file. |
| sortBed | Sorts a BED file by chromosome, then by start position; other sort orders are also available. |
| linksBed | Creates an HTML file of links to the UCSC or a custom browser. |
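complementBed in the table above reports the parts of a genome not spanned by any feature. Below is a minimal Python sketch of that computation. It is not the tool's own code. It assumes half-open (chrom, start, end) tuples and a dict of chromosome lengths standing in for a genome file. The function name complement is hypothetical. Features on chromosomes missing from that dict are simply ignored here.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # hypothetical (chrom, start, end), half-open


def complement(features: List[Interval], genome: Dict[str, int]) -> List[Interval]:
    """Return the stretches of each chromosome not covered by any feature.

    `genome` maps chromosome name -> length, in the spirit of a BEDTools
    genome file; features on chromosomes missing from it are ignored.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in features:
        by_chrom.setdefault(chrom, []).append((start, end))

    gaps: List[Interval] = []
    for chrom, length in genome.items():
        covered_to = 0  # everything before this position is covered or already reported
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > covered_to:
                gaps.append((chrom, covered_to, start))
            covered_to = max(covered_to, end)
        if covered_to < length:
            gaps.append((chrom, covered_to, length))
    return gaps
```

Tracking a single covered_to position means overlapping input features need no separate merge step. A chromosome with no features at all is reported as one gap covering its full length.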
Documentation
Please read the BEDTools manual as well as the Usage and Advanced Usage pages. If you still have questions or issues, please use the BEDTools discussion list.
Notes regarding usage
- By default, the BEDTools load the "B" file into memory and process the "A" file one record at a time against the features in "B". Therefore, when possible, set the smaller of the two files as the "B" file. For example, finding overlaps between a list of 30,000 genes and 100 million aligned sequences will be much more efficient with the genes file set as BED file "B".
- Most of the BEDTools have optional parameters that give fine control over reporting and the subtleties of each tool. We suggest you look through them; if something you need is missing, please let us know.
- Most of the BEDTools allow the "A" file to be passed via standard input for use in UNIX "streams" or "pipelines". In order to do this, use "-a stdin". For example:
$ cat reads.bed | intersectBed -a stdin -b genes.bed > readsToGenes.bed
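The notes above describe a division of labor: "B" is held in memory, while "A" is read one record at a time, possibly from stdin. The Python sketch below illustrates that pattern. Memory use grows with B, not A, which is why the smaller file belongs in B. The sketch reports each A record overlapping at least one B feature, similar in spirit to intersectBed's -u option. The function names are hypothetical, and the overlap lookup is a deliberately naive linear scan rather than BEDTools' own indexing.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

HEADER_PREFIXES = ("#", "track", "browser")  # lines skipped in this sketch


def parse_bed(line: str) -> Tuple[str, int, int]:
    """Return (chrom, start, end) from a tab-separated BED line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2])


def load_b(lines: Iterable[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Hold every "B" feature in memory, grouped by chromosome."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if line.strip() and not line.startswith(HEADER_PREFIXES):
            chrom, start, end = parse_bed(line)
            index[chrom].append((start, end))
    return index


def stream_a(a_lines: Iterable[str], b_index: Dict[str, List[Tuple[int, int]]]) -> Iterator[str]:
    """Read "A" one record at a time; yield each record overlapping any B feature."""
    for line in a_lines:
        if not line.strip() or line.startswith(HEADER_PREFIXES):
            continue
        chrom, start, end = parse_bed(line)
        # Linear scan for clarity; a real tool would index B to avoid this.
        if any(start < b_end and b_start < end for b_start, b_end in b_index.get(chrom, [])):
            yield line.rstrip("\n")
```

To mirror "-a stdin", one could pass sys.stdin as a_lines and build the index with load_b(open("genes.bed")). Each yielded line would then be printed, analogous to the reads.bed/genes.bed example above.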
Installation
GCC version 4.1 or greater is recommended; GCC 3.x versions will typically not compile BEDTools.
tar -zxvf BEDTools.<version>.tar.gz
cd BEDTools<version>
make clean
make all
ls bin
# copy the binaries to a directory in your PATH. e.g.,
sudo cp bin/* /usr/local/bin
# or
cp bin/* ~/bin
Source Repository
The BEDTools source code repository is now hosted on GitHub. The Google Code site will host all formal releases, documentation, and announcements, but the working code base will be hosted on GitHub.
Package Managers
- Fedora/CentOS. Adam Huffman has created a Red Hat package for BEDTools so that one can easily install the latest release using "yum", the Fedora package manager. It should work with Fedora 13, 14 and EPEL5/6 (for CentOS, Scientific Linux, etc.).
yum install BEDTools
- Debian/Ubuntu. Charles Plessy also maintains a Debian package for BEDTools that is likely to be available in Debian derivatives such as Ubuntu. Many thanks to Charles for doing this.
apt-get install bedtools
- Homebrew. Carlos Borroto has made BEDTools available through the Homebrew package manager for OS X.
brew install bedtools
Contact
BEDTools was developed and is maintained by Aaron Quinlan at The University of Virginia (http://cphg.virginia.edu/quinlan). Questions should be posted to the BEDTools discussion list. Alternatively, contact Aaron via email (firstlast at gmail.com).