SamStats  
sam-stats
Updated Sep 30, 2011 by earone...@gmail.com

Introduction

Tool for computing statistics from (possibly compressed) SAM or BAM files.

Details

sam-stats [options] <sam or bam file>

Options:
  -h                : show this help
  -b                : base sample size (default: 5000)
  -c                : read sample count (default: 0)

OUTPUT:

Complete Stats:

  <STATS>           : mean, max, stdev, median, Q1 (25th percentile), Q3 (75th percentile)
  reads             : # of entries in the file
  phred             : phred quality offset used (e.g. 33)
  mapped reads      : number of aligned reads
  mapped bases      : sum of the lengths of the aligned reads
  forward           : number of forward-aligned reads
  reverse           : number of reverse-aligned reads
  snp rate          : mismatched bases / total bases
  ins rate          : inserted bases / total bases
  del rate          : deleted bases / total bases
  pct mismatch      : percent of reads that have mismatches
  len <STATS>       : read length stats, omitted if all reads have the same length
  mapq <STATS>      : stats for mapping qualities
  insert <STATS>    : stats for insert sizes
  %<CHR>            : percentage of mapped bases per chromosome (use to compute coverage)
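To make the legend concrete, here is a minimal Python sketch that derives several of the complete stats from uncompressed SAM text. It is not sam-stats' own code, and several details are assumptions: unmapped reads are recognized by FLAG bit 0x4 and reverse-strand reads by bit 0x10, mismatches are taken as the NM tag minus inserted and deleted bases (reads without an NM tag count as mismatch-free), "total bases" means mapped bases, pct mismatch is relative to mapped reads, and quartiles use Python's default `statistics.quantiles` method. The names `summarize`, `complete_stats`, and `EXAMPLE_SAM` are hypothetical.

```python
import re
import statistics

# CIGAR operations as defined by the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize(values):
    """Return the <STATS> fields for a list of numbers."""
    if len(values) >= 2:
        # 'exclusive' is Python's default quantile method; the tool's may differ.
        q1, _, q3 = statistics.quantiles(values, n=4)
    else:
        q1 = q3 = values[0]
    return {
        "mean": statistics.fmean(values),
        "max": max(values),
        # Population stdev is an assumption; sam-stats may use the sample form.
        "stdev": statistics.pstdev(values),
        "median": statistics.median(values),
        "Q1": q1,
        "Q3": q3,
    }


def complete_stats(sam_lines):
    """Accumulate a subset of the 'Complete Stats' from uncompressed SAM text lines."""
    reads = mapped = mapped_bases = forward = reverse = 0
    mismatches = inserted = deleted = reads_with_mismatch = 0
    lengths, mapqs = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq, cigar, seq = int(fields[1]), int(fields[4]), fields[5], fields[9]
        reads += 1
        if flag & 0x4:
            continue  # FLAG 0x4: segment unmapped
        mapped += 1
        mapped_bases += len(seq)
        lengths.append(len(seq))
        mapqs.append(mapq)
        if flag & 0x10:  # FLAG 0x10: sequence is reverse-complemented
            reverse += 1
        else:
            forward += 1
        ops = CIGAR_OP.findall(cigar)
        ins = sum(int(n) for n, op in ops if op == "I")
        dels = sum(int(n) for n, op in ops if op == "D")
        inserted += ins
        deleted += dels
        # NM is the edit distance, so mismatches = NM - inserted - deleted bases.
        nm = next((int(t[5:]) for t in fields[11:] if t.startswith("NM:i:")), 0)
        mm = max(nm - ins - dels, 0)
        mismatches += mm
        if mm:
            reads_with_mismatch += 1
    if not mapped:
        return {"reads": reads, "mapped reads": 0}
    return {
        "reads": reads,
        "mapped reads": mapped,
        "mapped bases": mapped_bases,
        "forward": forward,
        "reverse": reverse,
        "snp rate": mismatches / mapped_bases,
        "ins rate": inserted / mapped_bases,
        "del rate": deleted / mapped_bases,
        "pct mismatch": 100 * reads_with_mismatch / mapped,
        # Mirrors "omitted if fixed-length": skip length stats when all lengths match.
        "len": summarize(lengths) if len(set(lengths)) > 1 else None,
        "mapq": summarize(mapqs),
    }


# A tiny hand-made SAM fragment: one mismatch read, one read with an insertion, one unmapped.
EXAMPLE_SAM = "\n".join([
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1",
    "r2\t16\tchr1\t200\t30\t2M1I2M\t*\t0\t0\tACGTA\tIIIII\tNM:i:1",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
])

if __name__ == "__main__":
    print(complete_stats(EXAMPLE_SAM.splitlines()))
```

Reading BAM or compressed input would additionally need a decoder, which this sketch leaves out.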

Subsampled stats:
  base qual <STATS> : stats for base qualities
  %A,%T,%C,%G,%N    : base percentages
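The subsampled stats can be sketched the same way. This Python example assumes reads arrive as (sequence, quality string) pairs and simply takes the first `max_bases` bases; 5000 is used as the default only to match the documented `-b` default, since how sam-stats actually selects its sample is not described on this page. The phred offset is guessed with a common heuristic (quality characters below `;`, ASCII 59, only occur in phred+33 data), which is an assumption and can misjudge a phred+33 file whose qualities are all high. `guess_phred_offset`, `subsampled_stats`, and `max_bases` are hypothetical names.

```python
import statistics
from collections import Counter
from itertools import islice


def guess_phred_offset(qual_strings):
    """Heuristic: characters below ';' (ASCII 59) only occur in phred+33 data.

    A phred+33 file whose qualities are all high would be misread as +64,
    so this is an illustrative assumption, not the tool's documented method.
    """
    lowest = min(min(q) for q in qual_strings if q and q != "*")
    return 33 if ord(lowest) < 59 else 64


def subsampled_stats(records, max_bases=5000, offset=None):
    """Base composition and base-quality stats over the first max_bases bases.

    records is a list of (sequence, quality_string) pairs; records whose
    quality is '*' (missing) are skipped.
    """
    records = [(seq, qual) for seq, qual in records if qual != "*"]
    if offset is None:
        offset = guess_phred_offset([q for _, q in records])
    # Flatten reads into (base, quality char) pairs and stop after max_bases.
    pairs = ((b, q) for seq, qual in records for b, q in zip(seq.upper(), qual))
    counts, quals = Counter(), []
    for base, q in islice(pairs, max_bases):
        counts[base] += 1
        quals.append(ord(q) - offset)
    total = len(quals)
    stats = {f"%{b}": 100 * counts[b] / total for b in "ACGTN"}
    stats["base qual mean"] = statistics.fmean(quals)
    # Population stdev is an assumption here as well.
    stats["base qual stdev"] = statistics.pstdev(quals)
    return stats


print(subsampled_stats([("ACGT", "IIII"), ("NNAA", "####")]))
```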

Example (single-end RNA-seq aligned with bowtie, hence the odd distribution and the absence of indel and mate information):

reads   110836000
phred   33
mapped reads    110836000
mapped bases    8756752622
forward 54636749
reverse 56199251
len max 80.0000
len mean        79.0064
len stdev       5.4687
mapq mean       170.5078
mapq stdev      119.6054
mapq Q1 3.0000
mapq median     255.0000
mapq Q3 255.0000
snp rate        0.003512
pct mismatch    21.4714
base qual mean  36.9820
base qual stdev 4.6959
%A      25.8659
%C      22.7026
%G      25.8926
%T      25.4710
%N      0.0679
%chr1   6.42
%chr10  2.41
%chr11  3.58
%chr12  7.57
%chr13  1.93
%chr14  2.12
%chr15  2.48
%chr16  1.81
%chr17  2.68
%chr18  1.92
%chr19  2.78
%chr2   3.75
%chr3   3.99
%chr4   26.81
%chr5   8.29
%chr6   3.70
%chr7   3.85
%chr8   2.67
%chr9   3.77
%chrM   6.25
%chrX   1.21
%chrY   0.01
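Because `%<CHR>` is a share of mapped bases, mean depth on a chromosome is mapped bases times (%chr / 100) divided by the chromosome length. The Python sketch below parses an excerpt of the example output, checks two internal consistencies (mapped bases / mapped reads reproduces `len mean`, and the base percentages sum to 100), and applies that formula. The 100,000,000 bp chromosome length is a hypothetical placeholder, not the real chr4 length; `parse_stats` and `mean_depth` are illustrative names, and splitting each line on its last whitespace run assumes values never contain spaces.

```python
# An excerpt of the example output (whitespace-separated "key value" lines).
EXAMPLE_OUTPUT = """\
reads   110836000
mapped reads    110836000
mapped bases    8756752622
len mean        79.0064
%A      25.8659
%C      22.7026
%G      25.8926
%T      25.4710
%N      0.0679
%chr4   26.81
"""


def parse_stats(text):
    """Turn 'key value' lines into a dict; the key is everything before the last field."""
    stats = {}
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)
        if len(parts) == 2:
            stats[parts[0]] = float(parts[1])
    return stats


def mean_depth(stats, chrom, chrom_length):
    """Approximate mean depth on chrom: mapped bases * (%chrom / 100) / chrom length."""
    return stats["mapped bases"] * stats["%" + chrom] / 100 / chrom_length


stats = parse_stats(EXAMPLE_OUTPUT)
# Sanity check: mapped bases / mapped reads should match the reported mean length.
print(round(stats["mapped bases"] / stats["mapped reads"], 4))  # 79.0064
# Base percentages should sum to about 100.
print(round(sum(stats[k] for k in ("%A", "%C", "%G", "%T", "%N")), 4))  # 100.0
# HYPOTHETICAL chromosome length; take the real one from your reference index.
print(round(mean_depth(stats, "chr4", 100_000_000), 2))  # 23.48
```

For RNA-seq data like this example, coverage is very uneven, so a chromosome-wide average says little about depth at individual genes.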
Comment by yvetteya...@gmail.com, Jan 5, 2012

A C++ version has been written. It's faster, and it keeps track of duplicate/ambiguous alignments output by bowtie, bwasw, or similar tools that can emit multiple rows per read.

Comment by project member earone...@gmail.com, Jan 27, 2012

The C++ version 1.1 finally seems to track ambiguous alignments well. It is now enabled with the "-D" option, which is useful for bowtie, TopHat, and RNA-seq data. Also, the signatures can be useful.
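The page does not describe how `-D` tracks ambiguous alignments. As a rough illustration of the idea only, this Python sketch counts reads that occupy more than one mapped row, keying each read on QNAME plus the first/last-segment FLAG bits (0x40, 0x80) so that the two mates of a pair are not mistaken for one read aligned twice. `count_multi_mapped` is a hypothetical name, and a real tool may need to treat secondary (0x100) and supplementary (0x800) rows differently.

```python
from collections import Counter


def count_multi_mapped(sam_lines):
    """Return (distinct mapped reads, reads with more than one alignment row).

    Reads are keyed on QNAME plus the first/last-segment FLAG bits (0x40, 0x80)
    so that the two mates of a pair are not mistaken for one read aligned twice.
    """
    rows = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # unmapped rows are not alignments
        rows[(fields[0], flag & 0xC0)] += 1
    multi = sum(1 for n in rows.values() if n > 1)
    return len(rows), multi


example = [
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t500\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t300\t255\t4M\t*\t0\t0\tTTGA\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(count_multi_mapped(example))  # (2, 1)
```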


Powered by Google Project Hosting