hydra-sv


Hydra detects structural variation breakpoints in both unique and duplicated genomic regions.

Latest news (Version 0.5.3, 20-Aug-2010)

  1. Vastly improved version of our multiple-mapping deduplication script (dedupDiscordants.py). This greatly improves breakpoint calling specificity.
  2. New version of pairDiscordants.py that resolves several minor bugs.
  3. Fixed a minor bug in Hydra that occasionally caused false-positive calls from read-pairs that did not actually support a common breakpoint.
  4. Updated the example workflow page to reflect changes in our pipeline.

Hydra Summary

Hydra detects structural variation (SV) breakpoints by clustering discordant paired-end alignments whose "signatures" corroborate the same putative breakpoint. Hydra can detect breakpoints caused by all classes of structural variation. Moreover, it was designed to detect variation in both unique and duplicated genomic regions; therefore, it will examine paired-end reads having multiple discordant alignments.
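To make the idea of a shared "signature" concrete, here is a minimal Python sketch of clustering discordant read-pairs. It is not Hydra's algorithm: Hydra is written in C++ and also resolves read-pairs with multiple discordant alignments, which this sketch ignores. The `DiscordantPair` record, the `slop` window and the `min_support` threshold are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiscordantPair:
    """One discordant alignment of a read-pair (hypothetical minimal record)."""
    name: str
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def same_signature(a: DiscordantPair, b: DiscordantPair, slop: int) -> bool:
    """Two pairs corroborate one breakpoint if their chromosomes and strands
    match and each end lies within `slop` bp of the other pair's end."""
    if (a.chrom1, a.strand1, a.chrom2, a.strand2) != (b.chrom1, b.strand1, b.chrom2, b.strand2):
        return False
    return abs(a.pos1 - b.pos1) <= slop and abs(a.pos2 - b.pos2) <= slop


def cluster_pairs(pairs: List[DiscordantPair], slop: int = 500,
                  min_support: int = 2) -> List[List[DiscordantPair]]:
    """Greedily add each pair to the first cluster whose members ALL share its
    signature; report only clusters with at least `min_support` pairs."""
    clusters: List[List[DiscordantPair]] = []
    for p in pairs:
        for c in clusters:
            if all(same_signature(p, q, slop) for q in c):
                c.append(p)
                break
        else:
            clusters.append([p])
    return [c for c in clusters if len(c) >= min_support]
```

Requiring a new pair to agree with every member of a cluster (rather than with just one) keeps a cluster from drifting along the genome as pairs are added; the price is that the result depends on input order, which a real caller would need to handle more carefully.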

It is important to note that Hydra does not attempt to classify SV breakpoints based on the mapping distances and orientations of each breakpoint cluster. In other words, it detects and reports breakpoints but does not decide what type of SV (e.g., deletion, inversion) the apparent breakpoint indicates. This is an intentional decision: we have observed that in loci affected by complex rearrangements, the type of variant suggested by the breakpoint signature is not always correct. Hydra does report the orientations, distances, number of supporting read-pairs, etc., for each breakpoint. We suggest that downstream methods be used to classify variants based on the genomic features they overlap and the co-occurrence of other breakpoints. We developed BEDTools for exactly this purpose, and Hydra reports breakpoints in the BEDPE format used by BEDTools. Future releases of Hydra will include scripts that assist in the classification process.
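For downstream classification you will usually start by reading breakpoints back in. The Python sketch below assumes only the first ten BEDTools BEDPE columns (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2, tab-delimited) and keeps any further columns, such as whatever Hydra-specific statistics your version writes, as uninterpreted strings. `BedpeRecord` and `parse_bedpe_line` are hypothetical names, not part of Hydra or BEDTools.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BedpeRecord:
    """The ten standard BEDPE columns plus any trailing columns."""
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    name: str
    score: str   # kept as text; BEDPE files may use "." here
    strand1: str
    strand2: str
    extra: List[str] = field(default_factory=list)  # tool-specific columns


def parse_bedpe_line(line: str) -> BedpeRecord:
    """Parse one tab-delimited BEDPE line (no header or comment handling)."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 10:
        raise ValueError(f"expected at least 10 BEDPE columns, got {len(f)}")
    return BedpeRecord(f[0], int(f[1]), int(f[2]),
                       f[3], int(f[4]), int(f[5]),
                       f[6], f[7], f[8], f[9], f[10:])
```

Keeping the extra columns as raw strings avoids guessing at their meaning; interpret them only after checking the file-format documentation for your Hydra version.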

Hydra was developed by Aaron Quinlan and Ira Hall at the University of Virginia. It is written in C++ and is under continued development. Therefore, please check back frequently, as we will continue to update the source code and documentation as Hydra and the methods leading up to it (e.g. aligners, alignment formats, sequence technologies and protocols) evolve.


Software in the Hydra suite

  • hydra: Calls SV breakpoints.
  • bamToFastq: Creates a FASTQ file from the alignments in a BAM file.
  • bamToBed: Converts BAM alignments to BED (or BEDPE) format. Available via the BEDTools package.
  • pairToBed: Compares paired-end alignments in BEDPE format to genome features in BED format. For example, it can remove mappings where both ends fall in SSRs/VNTRs. Available via the BEDTools package.
  • pairDiscordants.py: Creates all pairing possibilities for the discordant mappings of each end of a read-pair.
  • dedupDiscordants.py: Collapses duplicate read-pair mappings into a single, best mapping.
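The two Python steps can be illustrated in miniature. The sketch below is hypothetical and is not the code of pairDiscordants.py or dedupDiscordants.py: it pairs every mapping of end 1 with every mapping of end 2 (the "all pairing possibilities" step), then keeps one best-scoring read-pair per identical coordinate-and-strand signature. The `EndMapping` layout, the deduplication key and the summed-score criterion are assumptions; the real scripts may use different criteria.

```python
from itertools import product
from typing import Dict, List, NamedTuple, Tuple


class EndMapping(NamedTuple):
    """One alignment of one end of a read-pair (hypothetical layout)."""
    chrom: str
    start: int
    strand: str
    score: int  # assumed: higher is better


PairedMapping = Tuple[str, EndMapping, EndMapping]  # (read name, end 1, end 2)


def pair_discordants(name: str, end1: List[EndMapping],
                     end2: List[EndMapping]) -> List[PairedMapping]:
    """Combine every end-1 mapping with every end-2 mapping."""
    return [(name, m1, m2) for m1, m2 in product(end1, end2)]


def dedup_pairs(pairs: List[PairedMapping]) -> List[PairedMapping]:
    """Collapse pairs with identical coordinates and strands (assumed to be
    duplicates) into the single one with the highest summed score."""
    best: Dict[tuple, PairedMapping] = {}
    for name, m1, m2 in pairs:
        key = (m1.chrom, m1.start, m1.strand, m2.chrom, m2.start, m2.strand)
        total = m1.score + m2.score
        if key not in best or total > best[key][1].score + best[key][2].score:
            best[key] = (name, m1, m2)
    return list(best.values())
```

A read-pair with two mappings for one end and three for the other yields six candidate pairings, which is why deduplication matters before clustering.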


Installation

tar -zxvf Hydra.<version>.tar.gz 

cd Hydra
make clean # in case I forgot to clear the binaries
make all
ls bin

# copy the binaries to a directory in your PATH. e.g.,
sudo cp bin/* /usr/local/bin
sudo cp scripts/* /usr/local/bin


Documentation

  • File formats
  • Suggested workflow
  • Glossary

We will continue to update and improve the documentation over the coming weeks and months. In the interim, please refer to the Methods and Supplementary Methods sections of the Genome Research article describing Hydra.


Caveats

As with all SV detection methods based on current paired-end DNA sequencing technologies, the accuracy of the detected breakpoints depends heavily on the quality of the sequence alignments used to call them. In many ways, it is a classic "garbage in, garbage out" problem: a concordant read-pair that the aligner fails to place correctly can appear discordant and contribute to false-positive breakpoint calls. Therefore, we strongly suggest that users of Hydra carefully scrutinize the discordant alignments used as input to Hydra. Make sure that your aligner and its settings can detect essentially all concordant read-pairs, even those with SNPs, sequencing errors or INDELs that would prevent a naive aligner with "fast" settings from finding them.
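The concordant/discordant distinction can be sketched as follows. This assumes a forward-reverse (FR) paired-end library, a known insert-size mean and standard deviation, and a hypothetical cutoff of `n_sd` standard deviations; none of these names or defaults come from Hydra. A genuinely concordant pair that an aligner places at the wrong locus or on the wrong strand would fail this test and enter Hydra's input as a false discordant pair.

```python
from dataclasses import dataclass


@dataclass
class PairAlignment:
    """Best alignment of each end of a read-pair (hypothetical layout)."""
    chrom1: str
    pos1: int      # leftmost reference coordinate of end 1
    strand1: str
    chrom2: str
    pos2: int      # leftmost reference coordinate of end 2
    strand2: str
    read_len: int  # both ends assumed to have this aligned length


def is_concordant(a: PairAlignment, mean_insert: float, sd_insert: float,
                  n_sd: float = 4.0) -> bool:
    """True if the pair looks like a normal fragment from an FR library."""
    if a.chrom1 != a.chrom2:
        return False  # ends on different chromosomes
    # Order the ends by position so the test does not depend on read order.
    if a.pos1 <= a.pos2:
        left, right = (a.pos1, a.strand1), (a.pos2, a.strand2)
    else:
        left, right = (a.pos2, a.strand2), (a.pos1, a.strand1)
    if (left[1], right[1]) != ("+", "-"):
        return False  # not inward-facing, e.g. "+"/"+" or "-"/"+"
    insert = right[0] + a.read_len - left[0]  # outer fragment length
    return abs(insert - mean_insert) <= n_sd * sd_insert
```

Pairs for which this returns False are the discordant candidates; the point of the caveat above is that the test is only as trustworthy as the positions and strands the aligner reports.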


Ongoing work

  1. Direct support for BAM input files.
  2. Split-read SV detection.
  3. de novo breakpoint assembly.
  4. Breakpoint sequence annotation; mechanistic inference.


Contact

  • Aaron Quinlan, University of Virginia. firstlast (at) gmail dot com.
  • Ira Hall, University of Virginia.