bsolana


An approach for the analysis of two-base encoding bisulfite sequencing data

B-SOLANA

Function

B-SOLANA provides a fast and accurate all-in-one approach for the alignment and methylation calling of two-base encoding (“colorspace”) bisulfite sequencing data.

The publication of B-SOLANA in Bioinformatics can be found here

Our review about primary data analysis in BS-Seq can be found here

The development of B-SOLANA was partly funded by: The German Ministry of Education and Research (BMBF); the National Genome Research Network (NGFN); the Deutsche Forschungsgemeinschaft (DFG) Cluster of Excellence ‘Inflammation at Interfaces’; the EU Seventh Framework Programme (FP7/2007-2013, grant number 262055, ESGI).

Coding language

Python (http://www.python.org/)

Requirements

Code released

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. See http://www.gnu.org/licenses/

Initial Contact

Benjamin Kreck (send mail to: b dot kreck at ikmb dot uni-kiel dot de)

Introduction

B-SOLANA performs sequence alignment and methylation calling for two-base encoding (colorspace) bisulfite sequencing data. It builds on the established short-read aligner Bowtie (Langmead, 2009) and on SAMtools, a set of utilities for manipulating alignments (Li, 2009). B-SOLANA is divided into four individual steps:

  1. Indexing
  2. Mapping
  3. Determination of best alignment
  4. Methylation calling
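To illustrate why colorspace bisulfite reads need dedicated handling, the following minimal sketch encodes a short sequence in the standard SOLiD two-base code before and after an in silico C-to-T conversion. It is not taken from the B-SOLANA source code; the function names `bisulfite_convert` and `to_colorspace` and the primer base `T` are assumptions chosen for illustration.

```python
# Illustrative sketch only; not part of B-SOLANA.
# Standard SOLiD two-base code: the color is the XOR of the 2-bit base codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def bisulfite_convert(seq):
    """In silico bisulfite conversion, treating every C as unmethylated (C -> T)."""
    return seq.upper().replace("C", "T")


def to_colorspace(seq, primer_base="T"):
    """Encode a nucleotide sequence as colors, prefixed by the primer base."""
    bases = primer_base + seq.upper()
    colors = [
        str(BASE_TO_BITS[a] ^ BASE_TO_BITS[b])
        for a, b in zip(bases, bases[1:])
    ]
    return primer_base + "".join(colors)


read = "ACGT"
original = to_colorspace(read)                      # "T3131"
converted = to_colorspace(bisulfite_convert(read))  # "T3311"
print(original, converted)
```

In this example the single C-to-T change alters two adjacent colors, so a converted read cannot be compared with the reference one position at a time as in base space.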

The script bsolana run combines all four steps mentioned above into a single command.

Installation

See the manual of B-SOLANA at Downloads for further information.

Test data

We generated a test data set of 1 000 000 sequences (read length: 50 bp, SOLiD version 4).
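As a quick sanity check on such a data set, a csfasta file can be scanned with a few lines of Python. This is a minimal sketch, assuming the usual csfasta layout (optional `#` comment lines, a `>` header line, then one line holding the primer base followed by the colors); the helper `read_csfasta` and the example read names are hypothetical and not part of B-SOLANA.

```python
import io


def read_csfasta(handle):
    """Yield (name, colors) pairs from a csfasta stream, skipping '#' comments."""
    name = None
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


# Hypothetical two-read example; a real file would be opened with open(path).
example = io.StringIO(
    "# hypothetical header comment\n"
    ">1_1_1_F3\nT3131\n"
    ">1_1_2_F3\nT3311\n"
)
records = list(read_csfasta(example))
print(len(records), records[0])
```

Under these assumptions, counting the records in the test file should give 1 000 000, and each color string should consist of one primer base plus 50 colors.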

Download the compressed B-SOLANA software archive from Downloads and uncompress it in a directory of your choice. Download the FASTA file of hg19 (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/). Change to the directory containing the B-SOLANA executable files and start the analysis:

  • [bash]$ bsolana run -ref <path of FASTA reference file> -bowtie <path of executable Bowtie version> -samtools <path of executable SAMtools version> -work ./test/results/ -csfasta ./test/sequences/test_F3.csfasta -qual ./test/sequences/test_F3.qual -thread 4 -name test

Results will be located in the subfolder .../test/results of the B-SOLANA home directory.
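When the test run is launched from a script rather than typed by hand, the command line can be assembled programmatically. The sketch below is a hedged example: it uses only the options shown in the command above, while the helper `build_bsolana_command` and the three `/path/to/...` values are hypothetical placeholders that must be replaced with real locations.

```python
import shlex


def build_bsolana_command(ref, bowtie, samtools, work, csfasta, qual, threads, name):
    """Return the argument list for `bsolana run`, using only the options shown above."""
    return [
        "bsolana", "run",
        "-ref", ref,
        "-bowtie", bowtie,
        "-samtools", samtools,
        "-work", work,
        "-csfasta", csfasta,
        "-qual", qual,
        "-thread", str(threads),
        "-name", name,
    ]


cmd = build_bsolana_command(
    ref="/path/to/hg19.fa",        # hypothetical placeholder
    bowtie="/path/to/bowtie",      # hypothetical placeholder
    samtools="/path/to/samtools",  # hypothetical placeholder
    work="./test/results/",
    csfasta="./test/sequences/test_F3.csfasta",
    qual="./test/sequences/test_F3.qual",
    threads=4,
    name="test",
)

# Print the command for inspection; quoting keeps paths with spaces intact.
print(" ".join(shlex.quote(arg) for arg in cmd))
# To actually start the analysis: subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems if any of the paths contain spaces.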

References

  • Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9

  • Langmead B., Trapnell C., Pop M. and Salzberg S.L. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol., 10, R25

Changelog

  • 14-11-2011: Version 1.0 released

    • Fixed a bug with the indexing of the reference genome
    • Reduced the runtime of bsolana bemap
  • 26-08-2011: Version 0.1.1 released

    • Added bsolana run as a composition of all sub-programs
    • Fixed a bug in the extraction of methylation levels
  • 20-07-2011: Version 0.1 released

    • Initial release

Project Information

The project was created on Jul 8, 2011.

Labels:
Bioinformatics Epigenetics Next-generation-sequencing Bisulfite-sequencing Methylation Colorspace Two-base-encoding Methylome