Introduction
Tool for computing statistics from (possibly compressed) SAM or BAM files.
Details
sam-stats [options] <sam or bam file>
Options:
-h : show this help
-b : base sample size (default: 5000)
-c : read sample count (default: 0)
OUTPUT:
Complete Stats:
<STATS> : mean, max, stdev, median, Q1 (25th percentile), Q3 (75th percentile)
reads : # of entries in the file
phred : Phred quality offset used (e.g. 33)
mapped reads : number of aligned reads
mapped bases : sum of the lengths of the aligned reads
forward : number of forward-aligned reads
reverse : number of reverse-aligned reads
snp rate : mismatched bases / total bases
ins rate : inserted bases / total bases
del rate : deleted bases / total bases
pct mismatch : percent of reads that have mismatches
len <STATS> : read length stats, omitted if all reads have the same length
mapq <STATS> : stats for mapping qualities
insert <STATS> : stats for insert sizes
%<CHR> : percentage of mapped bases per chromosome (use to compute coverage)
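To make the <STATS> fields and the indel rates concrete, here is a minimal Python sketch. It is not the sam-stats code. The helper names summary_stats and indel_rates are hypothetical. Three choices are assumptions: population stdev, "inclusive" quartile interpolation, and "total bases" meaning read bases (CIGAR operations M, I, S, = and X). Because sam-stats may use other conventions, expect small differences. The snp rate is left out because it needs the MD tag or the reference sequence.

```python
import re
import statistics


def summary_stats(values):
    """Return the <STATS> fields (mean, max, stdev, median, Q1, Q3).

    Assumptions: stdev is the population standard deviation and the
    quartiles use statistics.quantiles(method="inclusive").
    """
    xs = sorted(values)
    if len(xs) == 1:
        q1 = median = q3 = float(xs[0])
    else:
        q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(xs),
        "max": xs[-1],
        "stdev": statistics.pstdev(xs),
        "median": median,
        "Q1": q1,
        "Q3": q3,
    }


# One CIGAR operation: a length followed by a SAM op code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_rates(cigars):
    """Return (ins rate, del rate) computed from CIGAR strings.

    Assumption: "total bases" counts read bases, i.e. the query-consuming
    operations M, I, S, = and X (hard clips are excluded).
    """
    inserted = deleted = total = 0
    for cigar in cigars:
        # "*" (no alignment) yields no operations and is ignored.
        for length, op in CIGAR_OP.findall(cigar):
            n = int(length)
            if op == "I":
                inserted += n
            elif op == "D":
                deleted += n
            if op in ("M", "I", "S", "=", "X"):
                total += n
    if total == 0:
        return 0.0, 0.0
    return inserted / total, deleted / total
```

For example, indel_rates(["10M2I8M", "5M3D15M"]) returns (0.05, 0.075): 2 inserted and 3 deleted bases over 40 read bases.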
Subsampled stats:
base qual <STATS> : stats for base qualities
%A, %C, %G, %T, %N : base percentages
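The subsampled stats come from the SEQ and QUAL columns (SAM columns 10 and 11). A Phred score is the ASCII code of a quality character minus the offset reported on the "phred" line. The following Python sketch illustrates the idea. base_stats and guess_phred_offset are hypothetical names. The offset rule is also an assumption: it treats any quality character below ';' (ASCII 59) as implying Phred+33. This is a common rule of thumb, and sam-stats may decide differently.

```python
from collections import Counter
import statistics


def guess_phred_offset(qual_strings):
    """Guess the Phred offset (33 or 64) from quality strings.

    Heuristic assumption: Phred+64 data does not use characters below ';'
    (ASCII 59), so any lower character implies Phred+33.
    """
    lowest = min(ord(c) for q in qual_strings for c in q)
    return 33 if lowest < 59 else 64


def base_stats(reads, offset=None):
    """Base percentages plus base-quality mean and (population) stdev.

    reads: (SEQ, QUAL) pairs from SAM columns 10 and 11; "*" means absent.
    """
    reads = [(s, q) for s, q in reads if s != "*"]
    quals = [q for _, q in reads if q != "*"]
    if offset is None:
        offset = guess_phred_offset(quals)
    counts = Counter()
    for seq, _ in reads:
        counts.update(seq.upper())
    # Phred score = ASCII code minus the offset.
    scores = [ord(c) - offset for q in quals for c in q]
    total = sum(counts.values())
    pct = {f"%{b}": 100.0 * counts[b] / total for b in "ACGTN"}
    return pct, statistics.fmean(scores), statistics.pstdev(scores)
```

With the two reads ("ACGTN", "IIIII") and ("AAC", "###"), the offset is guessed as 33, %A is 37.5, and the mean base quality is 25.75.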
Example (single-end RNA-seq aligned with bowtie, hence the odd distribution and the absence of indel and mate information):
reads 110836000
phred 33
mapped reads 110836000
mapped bases 8756752622
forward 54636749
reverse 56199251
len max 80.0000
len mean 79.0064
len stdev 5.4687
mapq mean 170.5078
mapq stdev 119.6054
mapq Q1 3.0000
mapq median 255.0000
mapq Q3 255.0000
snp rate 0.003512
pct mismatch 21.4714
base qual mean 36.9820
base qual stdev 4.6959
%A 25.8659
%C 22.7026
%G 25.8926
%T 25.4710
%N 0.0679
%chr1 6.42
%chr10 2.41
%chr11 3.58
%chr12 7.57
%chr13 1.93
%chr14 2.12
%chr15 2.48
%chr16 1.81
%chr17 2.68
%chr18 1.92
%chr19 2.78
%chr2 3.75
%chr3 3.99
%chr4 26.81
%chr5 8.29
%chr6 3.70
%chr7 3.85
%chr8 2.67
%chr9 3.77
%chrM 6.25
%chrX 1.21
%chrY 0.01
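The example numbers are internally consistent. forward + reverse equals mapped reads, and mapped bases / mapped reads reproduces "len mean". The following Python sketch checks both and applies the "%<CHR> ... use to compute coverage" hint. mean_coverage is a hypothetical helper. The 100 Mb chromosome length is made up because the example does not name its genome. In practice, take the length from your reference, for instance from the LN field of the matching @SQ header line.

```python
# Numbers copied from the example output above.
MAPPED_READS = 110_836_000
MAPPED_BASES = 8_756_752_622
FORWARD, REVERSE = 54_636_749, 56_199_251
LEN_MEAN = 79.0064

# Each mapped read is counted as either forward- or reverse-aligned.
assert FORWARD + REVERSE == MAPPED_READS

# "mapped bases" is the sum of aligned read lengths, so dividing by the
# number of mapped reads reproduces "len mean" to four decimals.
assert round(MAPPED_BASES / MAPPED_READS, 4) == LEN_MEAN


def mean_coverage(pct_chr, mapped_bases, chrom_length):
    """Mean depth on one chromosome from its %<CHR> value.

    chrom_length is not part of sam-stats output; it must come from
    the reference (e.g. the @SQ LN field in the SAM header).
    """
    return (pct_chr / 100.0) * mapped_bases / chrom_length


# Hypothetical 100 Mb chromosome carrying the 6.42% reported for chr1.
print(f"{mean_coverage(6.42, MAPPED_BASES, 100_000_000):.2f}x")
```

For the hypothetical 100 Mb chromosome, this prints 5.62x.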
A C++ version has been written. It is faster, and it keeps track of duplicate/ambiguous alignment output from bowtie, bwasw, or similar tools that can output multiple rows per read.
Version 1.1 of the C++ tool finally seems to track ambiguous alignments well. This is now enabled with the "-D" option, which is useful for bowtie, TopHat, and other RNA-seq workflows. The signatures can also be useful.
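The following Python sketch shows what "multiple rows per read" means. It is not how -D works internally. It groups aligned SAM rows by read and counts reads that appear on more than one row. count_multimapped is a hypothetical name. Keying on QNAME plus the mate bits (FLAG 0x40/0x80) is an assumption chosen so that the two mates of a pair are not mistaken for one multi-mapped read.

```python
from collections import Counter


def count_multimapped(sam_lines):
    """Return (aligned reads, reads with more than one alignment row).

    Mates are counted separately. Header lines ("@...") and unmapped rows
    (FLAG 0x4) are skipped; every other extra row, including secondary
    (0x100) and supplementary (0x800) records, counts toward "multi".
    """
    rows_per_read = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue
        # 0xC0 keeps only the first-mate/last-mate bits of FLAG.
        rows_per_read[(qname, flag & 0xC0)] += 1
    multi = sum(1 for n in rows_per_read.values() if n > 1)
    return len(rows_per_read), multi
```

A single-end read with a primary and a secondary row (FLAG 0 and 256) counts as one multi-mapped read, while a properly paired read (FLAG 99 and 147) does not.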