My favorites | Sign in
Project Home Downloads Wiki Issues Source
Search
for
Usage  
Examples of common usage.
Featured
Updated Apr 22, 2011 by aaronqui...@gmail.com




intersectBed

Note: When intersecting SNPs, make sure the coordinate conform to the UCSC format. That is, the start position for each SNP should be SNP position - 1 and the end position should be SNP position. E.g. chr7 10000001 10000002 rs123464

  • Report the base-pair overlap between sequence alignments and genes.
$ intersectBed -a reads.bed -b genes.bed
  • Report whether each alignment overlaps one or more genes. If not, the alignment is not reported.
$ intersectBed -a reads.bed -b genes.bed -u
  • Report those alignments that overlap NO genes. Like "grep -v"
$ intersectBed -a reads.bed -b genes.bed -v
  • Report the number of genes that each alignment overlaps.
$ intersectBed -a reads.bed -b genes.bed -c
  • Report the entire, original alignment entry for each overlap with a gene.
$ intersectBed -a reads.bed -b genes.bed -wa
  • Report the entire, original gene entry for each overlap with a gene.
$ intersectBed -a reads.bed -b genes.bed -wb
  • Report the entire, original alignment and gene entries for each overlap.
$ intersectBed -a reads.bed -b genes.bed –wa -wb
  • Only report an overlap with a repeat if it spans at least 50% of the exon.
$ intersectBed -a exons.bed -b repeatMasker.bed –f 0.50
  • Only report an overlap if comprises 50% of the structural variant and 50% of the segmental duplication. Thus, it is reciprocally at least a 50% overlap.
$ intersectBed -a SV.bed -b segmentalDups.bed –f 0.50 -r
  • Read BED A from stdin. For example, find genes that overlap LINEs but not SINEs.
$ intersectBed -a genes.bed -b LINES.bed | intersectBed -a stdin -b SINEs.bed -v
  • Retain only single-end BAM alignments that overlap exons.
$ intersectBed -abam reads.bam -b exons.bed > reads.touchingExons.bam
  • Retain only single-end BAM alignments that do not overlap simple sequence repeats.
$ intersectBed -abam reads.bam -b SSRs.bed -v > reads.noSSRs.bam
  • Retain only paired-end BAM alignments where neither end overlaps simple sequence repeats.



pairToBed

  • Return all structural variants (in BEDPE format) that overlap with genes on either end.
$ pairToBed -a sv.bedpe -b genes > sv.genes
  • Return all structural variants (in BEDPE format) that overlap with genes on both end.
$ pairToBed -a sv.bedpe -b genes -type both > sv.genes
  • Retain only paired-end BAM alignments where neither end overlaps simple sequence repeats.
$ pairToBed -abam reads.bam -b SSRs.bed -type neither > reads.noSSRs.bam
  • Retain only paired-end BAM alignments where both ends overlap segmental duplications.
$ pairToBed -abam reads.bam -b segdups.bed -type both > reads.SSRs.bam
  • Retain only paired-end BAM alignments where neither or one and only one end overlaps segmental duplications.
$ pairToBed -abam reads.bam -b segdups.bed -type notboth > reads.notbothSSRs.bam



pairToPair

  • Find all SVs (in BEDPE format) in sample 1 that are also in sample 2.
$ pairToPair -a 1.sv.bedpe -b 2.sv.bedpe | cut -f 1-10 > 1.sv.in2.bedpe
  • Find all SVs (in BEDPE format) in sample 1 that are not in sample 2.
pairToPair -a 1.sv.bedpe -b 2.sv.bedpe -type neither | cut -f 1-10 > 1.sv.notin2.bedpe



bamToBed

  • Convert BAM alignments to BED format.
$ bamToBed -i reads.bam > reads.bed
  • Convert BAM alignments to BED format using edit distance (NM) as the BED “score”.
$ bamToBed -i reads.bam -ed > reads.bed
  • Convert BAM alignments to BEDPE format.
$ bamToBed -i reads.bam -bedpe > reads.bedpe



windowBed

  • Report all genes that are within 10000 bp upstream or downstream of CNVs.
$ windowBed -a CNVs.bed -b genes.bed -w 10000
  • Report all genes that are within 10000 bp upstream or 5000 bp downstream of CNVs.
$ windowBed -a CNVs.bed -b genes.bed –l 10000 –r 5000
  • Report all SNPs that are within 5000 bp upstream or 1000 bp downstream of genes. Define upstream and downstream based on strand.
$ windowBed -a genes.bed –b snps.bed –l 5000 –r 1000 -sw



closestBed

Note: By default, if there is a tie for closest, all ties will be reported. closestBed allows overlapping features to be the closest.

  • Find the closest ALU to each gene.
$ closestBed -a genes.bed -b ALUs.bed

  • Find the closest ALU to each gene, choosing the first ALU in the file if there is a tie.
$ closestBed -a genes.bed -b ALUs.bed –t first
  • Find the closest ALU to each gene, choosing the last ALU in the file if there is a tie.
$ closestBed -a genes.bed -b ALUs.bed –t last



subtractBed

Note: If a feature in A is entirely "spanned" by any feature in B, it will not be reported.

  • Remove introns from gene features. Exons will (should) be reported.
$ subtractBed -a genes.bed -b introns.bed



mergeBed

  • Merge overlapping repetitive elements into a single entry.
$ mergeBed -i repeatMasker.bed
  • Merge overlapping repetitive elements into a single entry, returning the number of entries merged.
$ mergeBed -i repeatMasker.bed -n

  • Merge nearby (within 1000 bp) repetitive elements into a single entry.
$ mergeBed -i repeatMasker.bed -d 1000



coverageBed

  • Compute the coverage of aligned sequences on 10 kilobase “windows” spanning the genome.
$ coverageBed -a reads.bed -b windows10kb.bed | head

  • Compute the coverage of aligned sequences on 10 kilobase “windows” spanning the genome and created a BEDGRAPH of the number of aligned reads in each window for display on the UCSC browser.
$ coverageBed -a reads.bed -b windows10kb.bed | cut –f 1-4 > windows10kb.cov.bedg
  • Compute the coverage of aligned sequences on 10 kilobase “windows” spanning the genome and created a BEDGRAPH of the fraction of each window covered by at least one aligned read for display on the UCSC browser.
$ coverageBed -a reads.bed -b windows10kb.bed | awk ‘{OFS=”\t”; print $1,$2,$3,$6}’ > windows10kb.pctcov.bedg


complementBed

  • Report all intervals in the human genome that are not covered by repetitive elements.
$ complementBed -i repeatMasker.bed -g hg18.genome



shuffleBed

  • Randomly place all discovered variants in the genome. However, prevent them from being placed in know genome gaps.
$ shuffleBed -i variants.bed -g hg18.genome -excl genome_gaps.bed
  • Randomly place all discovered variants in the genome. However, prevent them from being placed in know genome gaps and require that the variants be randomly placed on the same chromosome.
$ shuffleBed -i variants.bed -g hg18.genome -excl genome_gaps.bed -chrom

groupBy

Note groupBy assumes the input data are sorted by the grouping columns you will use. This is typically true for output from all BEDTools.

Let's imagine we have three incredibly intriguing genetic variants that we are studying.

$ cat variants.bed
chr21	9719758	9729320	variant1
chr21	9729310	9757478	variant2
chr21   9795588 9796685 variant3

We are interested in what repeats these variants overlap with in our genome, so we use intersectedBed.

$ intersectBed -a variants.bed -b repeats.bed -wa -wb > variantsToRepeats.bed
$ cat variantsToRepeats.bed
chr21	9719758	9729320	variant1	chr21	9719768	9721892	ALR/Alpha	1004	+
chr21	9719758	9729320	variant1	chr21	9721905	9725582	ALR/Alpha	1010	+
chr21	9719758	9729320	variant1	chr21	9725582	9725977	L1PA3	3288	+
chr21	9719758	9729320	variant1	chr21	9726021	9729309	ALR/Alpha	1051	+
chr21	9729310	9757478	variant2	chr21	9729320	9729809	L1PA3	3897	-
chr21	9729310	9757478	variant2	chr21	9729809	9730866	L1P1	8367	+
chr21	9729310	9757478	variant2	chr21	9730866	9734026	ALR/Alpha	1036	-
chr21	9729310	9757478	variant2	chr21	9734037	9757471	ALR/Alpha	1182	-
chr21	9795588	9796685	variant3	chr21	9795589	9795713	(GAATG)n	308	+
chr21	9795588	9796685	variant3	chr21	9795736	9795894	(GAATG)n	683	+
chr21	9795588	9796685	variant3	chr21	9795911	9796007	(GAATG)n	345	+
chr21	9795588	9796685	variant3	chr21	9796028	9796187	(GAATG)n	756	+
chr21	9795588	9796685	variant3	chr21	9796202	9796615	(GAATG)n	891	+
chr21	9795588	9796685	variant3	chr21	9796637	9796824	(GAATG)n	621	+

We can see that variant1 overlaps with 3 repeats, variant2 with 4 and variant3 with 6. We can use groupBy to summarize the hits for each variant in several useful ways.

First, let's find the min and max repeat score for each variant. We do this by "grouping" on the variant coordinate columns (i.e. cols. 1,2 and 3) and ask for the min and max of the repeat score column (i.e. col. 9).

$ groupBy -i variantsToRepeats.bed -grp 1,2,3 -opCol 9 -op min
chr21	9719758	9729320	1004
chr21	9729310	9757478	1036
chr21	9795588	9796685	308

We can also group on just the name column with similar effect.

$ groupBy -i variantsToRepeats.bed -grp 4 -opCol 9 -op min
variant1	1004
variant2	1036
variant3	308

What about the max score? Let's keep the coordinates and the name of the variants so that we stay in BED format.

$ groupBy -i variantsToRepeats.bed -grp 1,2,3,4 -opCol 9 -op max
chr21	9719758	9729320	variant1	3288
chr21	9729310	9757478	variant2	8367
chr21	9795588	9796685	variant3	891

The mean score?

$ groupBy -i variantsToRepeats.bed -grp 1,2,3,4 -opCol 9 -op mean
chr21	9719758	9729320	variant1	1588.25
chr21	9729310	9757478	variant2	3620.5
chr21	9795588	9796685	variant3	600.6667

The median score? You get the point...

$ groupBy -i variantsToRepeats.bed -grp 1,2,3,4 -opCol 9 -op median
chr21	9719758	9729320	variant1	1030.5
chr21	9729310	9757478	variant2	2539.5
chr21	9795588	9796685	variant3	652

What is the most common repeat name overlapped?

$ groupBy -i variantsToRepeats.bed -grp 1,2,3,4 -opCol 8 -op mode
chr21	9719758	9729320	variant1	ALR/Alpha
chr21	9729310	9757478	variant2	ALR/Alpha
chr21	9795588	9796685	variant3	(GAATG)n

Least common?

$ groupBy -i variantsToRepeats.bed -grp 1,2,3,4 -opCol 8 -op antimode
chr21	9719758	9729320	variant1	L1PA3
chr21	9729310	9757478	variant2	L1P1
chr21	9795588	9796685	variant3	(GAATG)n

Now for something different. What if we wanted all of the names of the repeats listed on the same line as the variants? Use the collapse option. This "denormalizes" things.

$ groupBy -i variantsToRepeats.bed -grp 1,2,3,4 -opCol 8 -op collapse
chr21	9719758	9729320	variant1	ALR/Alpha,ALR/Alpha,L1PA3,ALR/Alpha,
chr21	9729310	9757478	variant2	L1PA3,L1P1,ALR/Alpha,ALR/Alpha,
chr21	9795588	9796685	variant3	(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,
Comment by madela...@gmail.com, Feb 3, 2010

Thanks for the software!

Instead of BEDPE format for paired end data, I'd suggest you use BED12 format and make the two ends "exons". That way if you upload the file to UCSC, you'll see the reads joined with a line between them.

Comment by project member aaronqui...@gmail.com, Feb 19, 2010

Hi Madelaine,

We considered this. Unfortunately, BED12 does not support trans-chromosomal features. This was the main reason we defined BEDPE. Not supporting trans-chromosomal events prevents looking at fusion genes or inter-chromosomal structural variants (e.g. translocations or novel mobile element insertions).
In a future release, I plan to provide a script to convert BEDPE to BED12.

I appreciate your comments. Aaron

Comment by maximili...@gmail.com, Jun 25, 2010

Does shuffleBed permit overlaps in the randomized file? thanks Max

Comment by project member aaronqui...@gmail.com, Jul 5, 2010

Yep, shuffleBed permits overlaps in the randomized file.

Comment by lry198...@gmail.com, Sep 3, 2010

Is anyone found that "bamToBed -i <infile.bam> -ed" cann't produce any result. So I check the source code of bamToBed.cpp and find in "PrintBed?" subroutine, line 325 is 'else if (useEditDistance == true && bamTag != "") {', I think there are some mistake in this line, and the right one look like this ' else if (useEditDistance == true && bamTag == ""

Comment by project member aaronqui...@gmail.com, Sep 6, 2010

Hi lry198010,

You are correct; I made a typo in the latest release. Instead of "-ed", one can use "-tag NM" in the interim. I will fix the Hydra workflow as well. Thanks for bringing this up.
Aaron

Comment by ernamagn...@gmail.com, Oct 13, 2010

Hi Aaron, I tried to install BedTools? in order to convert my BAM files to BED files. I followed the instructions for installation from the manual. The only thing I didn't do was the sudo since I'm using Cygwin (I'm a complete beginner in UNIX btw. so I'm not even sure what it means, sorry).

When I try to rund BamToBed?, I get the following message:

bash: BamToBed: command not found

Also: $ ls bin gives me the following:

{{{ closestBed.exe groupBy.exe mergeBed.exe shuffleBed.exe subtractBed.exe bed12ToBed6.exe complementBed.exe linksBed.exe overlap.exe slopBed.exe unionBedGraphs.exe bedToIgv.exe fastaFromBed.exe maskFastaFromBed.exe pairToPair.exe sortBed.exe}}}

Should there be a BamToBed?.exe there? Best Erna

Comment by kristi.t...@gmail.com, Jan 26, 2011

It would be nice to add a bedToBam example


Sign in to add a comment
Powered by Google Project Hosting