bamseek


Browse large BAM and SAM alignment files.

New Update (July 24, 2011): Added creation of an external index, which allows large files that have already been indexed to load quickly.

Update (July 10, 2011): Added support for Standard Flowgram Format (SFF) files used by 454 and Ion Torrent. Shows read name, sequence, and quality (Phred+33). Available for testing from the Downloads page.

Update (June 14, 2011): Added support for FASTQ files. Also, it detects if the FASTQ file was generated by CASAVA 1.8 and expands the header line into machine name, lane, xy coords, etc.
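As a rough illustration of what "expanding the header line" can look like, here is a minimal Java sketch that splits a CASAVA 1.8 style FASTQ header into named fields. It is not BAMseek's actual code: the class name `CasavaHeaderSketch`, the field labels, and the sample header are all made up for the example. It assumes the usual CASAVA 1.8 layout of `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of BAMseek).
class CasavaHeaderSketch {
    // Assumed layout: @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
    private static final Pattern CASAVA_18 = Pattern.compile(
        "^@([^:\\s]+):(\\d+):([^:\\s]+):(\\d+):(\\d+):(\\d+):(\\d+)"
        + "\\s+([12]):([YN]):(\\d+):(\\S*)$");

    // Labels chosen for this example, in the same order as the regex groups.
    private static final String[] FIELDS = {
        "instrument", "run", "flowcell", "lane", "tile", "x", "y",
        "read", "filtered", "control", "index"
    };

    /** Returns the named fields, or an empty map if the line does not look like CASAVA 1.8. */
    static Map<String, String> parse(String headerLine) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = CASAVA_18.matcher(headerLine.trim());
        if (m.matches()) {
            for (int i = 0; i < FIELDS.length; i++) {
                out.put(FIELDS[i], m.group(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample header in CASAVA 1.8 style.
        String header = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG";
        parse(header).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

An empty result doubles as the "was this generated by CASAVA 1.8?" check: headers in other styles simply do not match the pattern.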

Update (June 06, 2011): Added support for VCF files. The VCF file can be an uncompressed text file or a BGZF-compressed binary file, which is how the data is stored in the 1000 Genomes Project.

Update (May 31, 2011): Good news for those who have been waiting for a Windows version of BAMseek. I have been working on a cross platform BAM/SAM file viewer that should be available sometime this week. BAMseek has been rewritten in Java to make it easier to port to Windows, Mac or Linux.

BAMseek

A Large File Viewer for BAM and SAM alignment files.

http://bamseek.googlecode.com/files/BAMseekCommercial.png

What is it?

BAM and SAM alignment files are usually large and cannot be easily opened in common text viewers. BAM files are not human-readable, so they need a specialized viewer to interpret the information they contain. BAMseek allows you to open and browse SAM and BAM alignment files, no matter how large the files may be. BAMseek does not require command line knowledge. Instead, it provides a comfortable point-and-click interface for browsing BAM and SAM files.

How do I install it?

Download the BAMseek jar file (with extension ".jar") here. Open BAMseek by double-clicking on the jar file, and seek away! If you have problems opening BAMseek, you may have to update to the latest Java Runtime Environment (JRE).

How do I use it?

After opening BAMseek, go to File > Open File ..., browse to the location of a SAM (.sam) or BAM (.bam) file, and select it. If the file is large, it may take a little while for the file to be processed and opened. For the impatient, you can cancel a file load that is in progress, and you will still be able to view the file up to the point where you canceled. Don't worry, the BAM and SAM files are never modified within BAMseek.

The advantage of BAMseek is that you can browse the entire file and only use a small amount of memory, even for large files. The file is divided into pages, with each page having a length of 1000 lines. You can browse within a page by using the scroll bars or mouse wheel. You can jump between pages using the slider and scroll box at the bottom of BAMseek. The text box at top displays the header information (if present), which usually consists of sequence names and lengths.

What other cool things can I do with BAMseek?

Here are some of the features currently in BAMseek:

  • Are you confused by the Cigar, Quality Score, or Tag columns? Hover over cells to get more information about the item in the cell. Currently, you can:
      1. hover over Flag (Column 2) to translate the flag value into a description of its meaning.
      2. hover over Cigar (Column 6) to understand how the read was aligned to the sequence.
      3. hover over Sequence (Column 10) to see which bases are high quality and which are low quality. A base is grayed out if the Phred quality is below Q20 (more than a 1 in 100 chance of being an incorrect base call).
      4. hover over Tags (Columns 12+) to display the tag description given in the SAM/BAM format specs.
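To give a feel for what two of these hover hints compute, here is a small Java sketch that decodes a SAM flag into its bit meanings and marks bases whose Phred+33 quality falls below Q20. It is only an illustration, not BAMseek's implementation: the class name `SamCellHints` and its methods are hypothetical, the flag wording is paraphrased from the SAM specification, and lowercasing stands in for graying out a base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not part of BAMseek).
class SamCellHints {
    // Meaning of each flag bit, lowest bit first (paraphrased from the SAM spec).
    private static final String[] FLAG_NAMES = {
        "read paired",                          // 0x1
        "read mapped in proper pair",           // 0x2
        "read unmapped",                        // 0x4
        "mate unmapped",                        // 0x8
        "read on reverse strand",               // 0x10
        "mate on reverse strand",               // 0x20
        "first in pair",                        // 0x40
        "second in pair",                       // 0x80
        "secondary alignment",                  // 0x100
        "fails quality checks",                 // 0x200
        "PCR or optical duplicate",             // 0x400
        "supplementary alignment"               // 0x800
    };

    /** Lists the description of every bit that is set in the flag. */
    static List<String> describeFlag(int flag) {
        List<String> out = new ArrayList<>();
        for (int bit = 0; bit < FLAG_NAMES.length; bit++) {
            if ((flag & (1 << bit)) != 0) {
                out.add(FLAG_NAMES[bit]);
            }
        }
        return out;
    }

    /** Lowercases bases whose Phred+33 quality is below minQ (a text stand-in for graying out). */
    static String maskLowQuality(String seq, String qual, int minQ) {
        if (seq.length() != qual.length()) {
            throw new IllegalArgumentException("sequence and quality lengths differ");
        }
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = 0; i < seq.length(); i++) {
            int q = qual.charAt(i) - 33;        // Phred+33: ASCII code minus 33
            char base = seq.charAt(i);
            sb.append(q < minQ ? Character.toLowerCase(base) : base);
        }
        return sb.toString();
    }

    /** Error probability implied by a Phred score: P = 10^(-Q/10). */
    static double errorProbability(int q) {
        return Math.pow(10.0, -q / 10.0);
    }

    public static void main(String[] args) {
        System.out.println(describeFlag(99));
        System.out.println(maskLowQuality("ACGT", "I5+#", 20));
        System.out.println(errorProbability(20));
    }
}
```

The Q20 cutoff follows from the Phred definition: Q = 20 corresponds to an error probability of 10^(-2), so anything below Q20 has more than a 1 in 100 chance of being wrong.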

Really? BAMseek can view files larger than the memory capacity? What kind of black magic is this!?

BAMseek groups the BAM or SAM file into "pages" (1000 SAM/BAM records per page), and only loads the currently viewed page into memory - similar to how you view a large book one page at a time. The disk address of each page start is computed when you open the file, allowing you to quickly jump to any page you want.
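The paging idea can be sketched in a few lines of Java. The following is a simplified illustration under stated assumptions, not BAMseek's actual code: the class `PageIndexSketch` and its methods are hypothetical, it handles only plain-text SAM (a BAM file is BGZF-compressed, so a real index would have to record positions within the compressed blocks instead), and it counts header lines as ordinary lines rather than showing them separately.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, for illustration only (not part of BAMseek).
class PageIndexSketch {
    static final int LINES_PER_PAGE = 1000;

    /** Scans the file once and records the byte offset at which each page begins. */
    static List<Long> buildIndex(String path, int linesPerPage) throws IOException {
        List<Long> pageStarts = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long offset = 0;            // bytes consumed so far
            long lineNo = 0;            // complete lines seen so far
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    if (lineNo % linesPerPage == 0) {
                        pageStarts.add(offset);   // first byte of a new page
                    }
                    atLineStart = false;
                }
                offset++;
                if (b == '\n') {
                    lineNo++;
                    atLineStart = true;
                }
            }
        }
        return pageStarts;
    }

    /** Seeks straight to the requested page and reads only that page's lines. */
    static List<String> readPage(String path, List<Long> pageStarts, int page, int linesPerPage)
            throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(pageStarts.get(page));
            String line;
            // readLine() is unbuffered and treats bytes as Latin-1; fine for a sketch on ASCII SAM text.
            while (lines.size() < linesPerPage && (line = raf.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java PageIndexSketch <file.sam>");
            return;
        }
        List<Long> index = buildIndex(args[0], LINES_PER_PAGE);
        System.out.println("pages: " + index.size());
        if (!index.isEmpty()) {
            // Jump directly to the last page without reading the ones before it.
            List<String> last = readPage(args[0], index, index.size() - 1, LINES_PER_PAGE);
            System.out.println("lines on last page: " + last.size());
        }
    }
}
```

Only the list of page offsets (one `long` per 1000 lines) and the page currently on screen need to stay in memory, which is why the memory footprint stays small no matter how large the file is.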

So who are you?

My name is Justin, and I am a software developer who has spent the past 3 years analyzing Next-Generation Sequencing data. I wanted to do something to give back to the community, so I spent some nights and weekends putting this together in hopes that others will find it useful. My goal is to let people view the files that are taking up the lion's share of their disk space, even if they have no experience compiling programs or using the command line. I also had myself in mind: I got tired of looking at a black-and-white console window with poorly aligned columns. Please let me know if you have feedback (both positive and negative), questions, or suggestions. I will continue to update the software, so check back often. Thanks!

Project Information

The project was created on Apr 14, 2011.

  • License: GNU GPL v3
  • 15 stars
  • svn-based source control

Labels:
SAM BAM viewer alignments format largefiles VCF SFF FastQ sequencing