Introduction
Below are common questions that are commonly asked by new BEDTools users. We have attempted to organize the content by subject.
Problems compiling BEDTools
What are all of these zlib errors?
BEDTools supports BAM and gzipped BED/GFF/VCF files through use of the zlib libraries. Some versions of Ubuntu and other UNIX flavors come without the zlib libraries installed. If you run into this, please try first installing the zlib libraries. For example, on Ubuntu, this would be:
sudo apt-get install zlib
sudo apt-get install zlib-devel
Problems with running BEDTools commands
How do I see the help for a given command?
All BEDTools commands have a "-h" option. Additionally, some tools will provide the help menu if you just type the command line followed by enter.
$ coverageBed -h
Program: coverageBed (v2.11.2)
Author: Aaron Quinlan (aaronquinlan@gmail.com)
Summary: Returns the depth and breadth of coverage of features from A
on the intervals in B.
Usage: coverageBed [OPTIONS] -a <bed/gff/vcf> -b <bed/gff/vcf>
Options:
-abam The A input file is in BAM format.
-s Force strandedness. That is, only include hits in A that
overlap B on the same strand.
- By default, hits are included without respect to strand.
...
Problems with file formats
What does zero-based, half-open mean?
BEDTools users are sometimes confused by the way BED start and end coordinates are represented. BEDTools uses the UCSC Genome Browser’s internal database convention of making the start position 0-based and the end position 1-based: (http://genome.ucsc.edu/FAQ/FAQtracks#tracks1). In other words, BEDTools interprets the “start” column as being 1 basepair higher than what is represented in the file. For example, the following BED feature represents a single base on chromosome 1; namely, the 1st base.
$ cat first_base.bed
chr1 0 1 i-am-the-first-position-on-chrom1
Why, you might ask? The advantage of storing features this way is that it prevents lots of superfluous "+1" operations in the code when computing lengths, overlap, etc. These would be necessary were the coordinate system 1-based, inclusive. Thus, storing BED features this way reduces the computational burden.
Why must my input files be tab-delimited?
While it is certainly possible to allow BED and other formats to be delimited by any form of whitespace, we have decided to require tab delimiters for three reasons:
- By requiring one delimiter type, the processing time is minimized.
- Tab-delimited files are more amenable to other UNIX utilities.
- GFF files can contain spaces within attribute columns. This complicates the use of space-delimited files as spaces must therefore be treated specially depending on the context.
Why do I get the "It looks as though you have less than 3 columns at line: 1." message?
This is typically caused by files that use a delimiter other than TAB.
Problems with output
Converting among formats
Miscellaneous
What happened to the "groupBy" tool?
In the interest of keeping the software we develop in logical collections, groupBy has been moved to the filo project.