Usage
Examples of common usage.
Featured
intersectBed
Note: When intersecting SNPs, make sure the coordinates conform to the UCSC format. That is, the start position for each SNP should be SNP position - 1 and the end position should be SNP position. E.g. chr7 10000001 10000002 rs123464
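The coordinate rule in the note can be sketched with awk. This is only an illustration: the helper name `snp_to_bed` and the three-column input layout (chromosome, 1-based SNP position, SNP name, tab-separated) are assumptions, not part of BEDTools.

```shell
# Hypothetical helper (not a BEDTools command): reads tab-separated lines of
# "chrom, 1-based SNP position, name" on stdin and writes BED lines where
# start = position - 1 and end = position.
snp_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 - 1, $2, $3 }'
}

# The SNP from the note: position 10000002 on chr7.
printf 'chr7\t10000002\trs123464\n' | snp_to_bed
```

The resulting four-column file can then be passed to intersectBed with -a or -b like any other BED file.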
$ intersectBed -a reads.bed -b genes.bed
$ intersectBed -a reads.bed -b genes.bed -u
$ intersectBed -a reads.bed -b genes.bed -v
$ intersectBed -a reads.bed -b genes.bed -c
$ intersectBed -a reads.bed -b genes.bed -wa
$ intersectBed -a reads.bed -b genes.bed -wb
$ intersectBed -a reads.bed -b genes.bed -wa -wb
$ intersectBed -a exons.bed -b repeatMasker.bed -f 0.50
$ intersectBed -a SV.bed -b segmentalDups.bed -f 0.50 -r
$ intersectBed -a genes.bed -b LINES.bed | intersectBed -a stdin -b SINEs.bed -v
$ intersectBed -abam reads.bam -b exons.bed > reads.touchingExons.bam
$ intersectBed -abam reads.bam -b SSRs.bed -v > reads.noSSRs.bam
pairToBed
$ pairToBed -a sv.bedpe -b genes > sv.genes
$ pairToBed -a sv.bedpe -b genes -type both > sv.genes
$ pairToBed -abam reads.bam -b SSRs.bed -type neither > reads.noSSRs.bam
$ pairToBed -abam reads.bam -b segdups.bed -type both > reads.segdups.bam
$ pairToBed -abam reads.bam -b segdups.bed -type notboth > reads.notbothSegdups.bam
pairToPair
$ pairToPair -a 1.sv.bedpe -b 2.sv.bedpe | cut -f 1-10 > 1.sv.in2.bedpe
$ pairToPair -a 1.sv.bedpe -b 2.sv.bedpe -type neither | cut -f 1-10 > 1.sv.notin2.bedpe
bamToBed
$ bamToBed -i reads.bam > reads.bed
$ bamToBed -i reads.bam -ed > reads.bed
$ bamToBed -i reads.bam -bedpe > reads.bedpe
windowBed
$ windowBed -a CNVs.bed -b genes.bed -w 10000
$ windowBed -a CNVs.bed -b genes.bed -l 10000 -r 5000
$ windowBed -a genes.bed -b snps.bed -l 5000 -r 1000 -sw
closestBed
Note: By default, if there is a tie for closest, all ties will be reported. closestBed allows overlapping features to be the closest.
$ closestBed -a genes.bed -b ALUs.bed
$ closestBed -a genes.bed -b ALUs.bed -t first
$ closestBed -a genes.bed -b ALUs.bed -t last
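The tie behaviour can be pictured with a small awk sketch. It is not closestBed itself: the function `report_closest` and its input (one candidate per line, as name and distance separated by a tab) are made up for illustration, and it assumes that "first" and "last" refer to the order in which tied candidates appear in the input. An overlapping feature would simply have distance 0.

```shell
# Hypothetical illustration of tie handling (not part of BEDTools).
# stdin: tab-separated "name, distance" lines, in input order.
# $1:    "all" (report every tie), "first" or "last".
report_closest() {
    awk -v mode="$1" 'BEGIN { FS = "\t" }
    {
        d = $2 + 0
        if (NR == 1 || d < min) { min = d; n = 0 }   # new best distance: reset ties
        if (d == min) { tie[++n] = $1 }              # remember every tied candidate
    }
    END {
        if (n == 0) exit
        if (mode == "first") print tie[1]
        else if (mode == "last") print tie[n]
        else for (i = 1; i <= n; i++) print tie[i]
    }'
}

# AluB and AluC are equally close, so "all" reports both of them.
printf 'AluA\t500\nAluB\t120\nAluC\t120\n' | report_closest all
```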
subtractBed
Note: If a feature in A is entirely "spanned" by any feature in B, it will not be reported.
$ subtractBed -a genes.bed -b introns.bed
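To make the note concrete, here is a rough awk sketch of subtracting a single B interval from a single A interval. It is an illustration only, under the assumption of half-open BED coordinates; the function name `subtract_one` and the five-column input (chrom, A start, A end, B start, B end) are invented for the example and say nothing about how subtractBed is implemented.

```shell
# Hypothetical illustration (not part of BEDTools).
# stdin: tab-separated "chrom, A start, A end, B start, B end".
# stdout: the pieces of A that are left after removing B.
subtract_one() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        chrom = $1
        astart = $2 + 0; aend = $3 + 0
        bstart = $4 + 0; bend = $5 + 0
        if (bend <= astart || bstart >= aend) {   # no overlap: A is unchanged
            print chrom, astart, aend
            next
        }
        if (bstart > astart) print chrom, astart, bstart   # piece left of B
        if (bend < aend)     print chrom, bend, aend       # piece right of B
        # If B spans A entirely, neither test is true and nothing is printed.
    }'
}

# An intron in the middle of a gene leaves two flanking pieces.
printf 'chr1\t100\t200\t120\t150\n' | subtract_one
```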
mergeBed
$ mergeBed -i repeatMasker.bed
$ mergeBed -i repeatMasker.bed -n
$ mergeBed -i repeatMasker.bed -d 1000
coverageBed
$ coverageBed -a reads.bed -b windows10kb.bed | head
$ coverageBed -a reads.bed -b windows10kb.bed | cut -f 1-4 > windows10kb.cov.bedg
$ coverageBed -a reads.bed -b windows10kb.bed | awk '{OFS="\t"; print $1,$2,$3,$6}' > windows10kb.pctcov.bedg
complementBed
$ complementBed -i repeatMasker.bed -g hg18.genome
shuffleBed
$ shuffleBed -i variants.bed -g hg18.genome -excl genome_gaps.bed
$ shuffleBed -i variants.bed -g hg18.genome -excl genome_gaps.bed -chrom
groupBy
Note: groupBy assumes the input data are sorted by the grouping columns you will use. This is typically true for output from all BEDTools.
Let's imagine we have three incredibly intriguing genetic variants that we are studying.
$ cat variants.bed
chr21 9719758 9729320 variant1
chr21 9729310 9757478 variant2
chr21 9795588 9796685 variant3
We are interested in what repeats these variants overlap with in our genome, so we use intersectBed.
$ intersectBed -a variants.bed -b repeats.bed -wa -wb > variantsToRepeats.bed
$ cat variantsToRepeats.bed
chr21 9719758 9729320 variant1 chr21 9719768 9721892 ALR/Alpha 1004 +
chr21 9719758 9729320 variant1 chr21 9721905 9725582 ALR/Alpha 1010 +
chr21 9719758 9729320 variant1 chr21 9725582 9725977 L1PA3 3288 +
chr21 9719758 9729320 variant1 chr21 9726021 9729309 ALR/Alpha 1051 +
chr21 9729310 9757478 variant2 chr21 9729320 9729809 L1PA3 3897 -
chr21 9729310 9757478 variant2 chr21 9729809 9730866 L1P1 8367 +
chr21 9729310 9757478 variant2 chr21 9730866 9734026 ALR/Alpha 1036 -
chr21 9729310 9757478 variant2 chr21 9734037 9757471 ALR/Alpha 1182 -
chr21 9795588 9796685 variant3 chr21 9795589 9795713 (GAATG)n 308 +
chr21 9795588 9796685 variant3 chr21 9795736 9795894 (GAATG)n 683 +
chr21 9795588 9796685 variant3 chr21 9795911 9796007 (GAATG)n 345 +
chr21 9795588 9796685 variant3 chr21 9796028 9796187 (GAATG)n 756 +
chr21 9795588 9796685 variant3 chr21 9796202 9796615 (GAATG)n 891 +
chr21 9795588 9796685 variant3 chr21 9796637 9796824 (GAATG)n 621 +
We can see that variant1 overlaps with 4 repeats, variant2 with 4 and variant3 with 6. We can use groupBy to summarize the hits for each variant in several useful ways. First, let's find the min repeat score for each variant. We do this by "grouping" on the variant coordinate columns (i.e. cols. 1, 2 and 3) and asking for the min of the repeat score column (i.e. col. 9).
$ groupBy -i variantsToRepeats.bed -grp 1,2,3 -opCol 9 -op min
chr21 9719758 9729320 1004
chr21 9729310 9757478 1036
chr21 9795588 9796685 308
We can also group on just the name column with similar effect.
$ groupBy -i variantsToRepeats.bed -grp 4 -opCol 9 -op min
variant1 1004
variant2 1036
variant3 308
What about the max score? Let's keep the coordinates and the name of the variants so that we stay in BED format.
$ groupBy -i variantsToRepeats.bed -grp 1,2,3,4 -opCol 9 -op max
chr21 9719758 9729320 variant1 3288
chr21 9729310 9757478 variant2 8367
chr21 9795588 9796685 variant3 891
The mean score?
$ groupBy -i variantsToRepeats.bed -grp 1,2,3,4 -opCol 9 -op mean
chr21 9719758 9729320 variant1 1588.25
chr21 9729310 9757478 variant2 3620.5
chr21 9795588 9796685 variant3 600.6667
The median score? You get the point...
$ groupBy -i variantsToRepeats.bed -grp 1,2,3,4 -opCol 9 -op median
chr21 9719758 9729320 variant1 1030.5
chr21 9729310 9757478 variant2 2539.5
chr21 9795588 9796685 variant3 652
What is the most common repeat name overlapped?
$ groupBy -i variantsToRepeats.bed -grp 1,2,3,4 -opCol 8 -op mode
chr21 9719758 9729320 variant1 ALR/Alpha
chr21 9729310 9757478 variant2 ALR/Alpha
chr21 9795588 9796685 variant3 (GAATG)n
Least common?
$ groupBy -i variantsToRepeats.bed -grp 1,2,3,4 -opCol 8 -op antimode
chr21 9719758 9729320 variant1 L1PA3
chr21 9729310 9757478 variant2 L1P1
chr21 9795588 9796685 variant3 (GAATG)n
Now for something different. What if we wanted all of the names of the repeats listed on the same line as the variants? Use the collapse option. This "denormalizes" things.
$ groupBy -i variantsToRepeats.bed -grp 1,2,3,4 -opCol 8 -op collapse
chr21 9719758 9729320 variant1 ALR/Alpha,ALR/Alpha,L1PA3,ALR/Alpha,
chr21 9729310 9757478 variant2 L1PA3,L1P1,ALR/Alpha,ALR/Alpha,
chr21 9795588 9796685 variant3 (GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,
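For readers wondering why groupBy needs sorted input, the following awk sketch mimics the "-grp 4 -opCol 9 -op min" case from the walkthrough. It is a simplified stand-in rather than groupBy's actual code; the function name `group_min` is hypothetical and the input is assumed to be tab-separated. Because it only compares each line with the previous one, rows of the same group must be adjacent, which is what sorting on the grouping column guarantees.

```shell
# Hypothetical stand-in for "groupBy -grp 4 -opCol 9 -op min" (not BEDTools code).
# stdin: tab-separated lines, already sorted on column 4.
group_min() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        if (NR > 1 && $4 != key) { print key, min }           # group ended: report it
        if (NR == 1 || $4 != key) { key = $4; min = $9 + 0 }  # start a new group
        else if ($9 + 0 < min) { min = $9 + 0 }               # smaller score in same group
    }
    END { if (NR > 0) print key, min }'
}

# Two of the variant1 rows and one variant2 row from variantsToRepeats.bed.
{
    printf 'chr21\t9719758\t9729320\tvariant1\tchr21\t9725582\t9725977\tL1PA3\t3288\t+\n'
    printf 'chr21\t9719758\t9729320\tvariant1\tchr21\t9719768\t9721892\tALR/Alpha\t1004\t+\n'
    printf 'chr21\t9729310\t9757478\tvariant2\tchr21\t9730866\t9734026\tALR/Alpha\t1036\t-\n'
} | group_min
```

If the variant1 rows were interleaved with variant2 rows, this sketch would report variant1 more than once, which is the kind of surprise the sorting note warns about.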
Thanks for the software!
Instead of BEDPE format for paired end data, I'd suggest you use BED12 format and make the two ends "exons". That way if you upload the file to UCSC, you'll see the reads joined with a line between them.
Hi Madelaine,
In a future release, I plan to provide a script to convert BEDPE to BED12. I appreciate your comments.
Aaron
Does shuffleBed permit overlaps in the randomized file? thanks Max
Yep, shuffleBed permits overlaps in the randomized file.
Has anyone else found that "bamToBed -i <infile.bam> -ed" doesn't produce any result? I checked the source code of bamToBed.cpp and found that in the "PrintBed" subroutine, line 325 is 'else if (useEditDistance == true && bamTag != "") {'. I think there is a mistake in this line, and the right one should look like this: ' else if (useEditDistance == true && bamTag == ""
Hi lry198010,
Aaron
Hi Aaron, I tried to install BEDTools in order to convert my BAM files to BED files. I followed the installation instructions from the manual. The only thing I didn't do was the sudo, since I'm using Cygwin (I'm a complete beginner in UNIX, by the way, so I'm not even sure what it means, sorry).
When I try to run bamToBed, I get the following message:
Also: $ ls bin gives me the following:
{{{ closestBed.exe groupBy.exe mergeBed.exe shuffleBed.exe subtractBed.exe bed12ToBed6.exe complementBed.exe linksBed.exe overlap.exe slopBed.exe unionBedGraphs.exe bedToIgv.exe fastaFromBed.exe maskFastaFromBed.exe pairToPair.exe sortBed.exe}}}
Should there be a bamToBed.exe there? Best, Erna
It would be nice to add a bedToBam example