
Version 0.3.9 can map 128x128 paired-end Illumina reads. The updated version with OpenMP support was released on Nov 28. The source archive first posted on Nov 28 was incorrect; the correct source has since been released, so please download it again. Version 0.3.6 fixes a defect that caused errors when reading the nucleotide 'n' in the reference. Version 0.3.5 adds three enhancements (Issues 42, 43 and 44): filtering reads that contain a given number of 'N' bases, outputting mapped reads in fastq format, and a larger maximum number of alignments per read.

PerM

PerM is a software package designed to perform highly efficient genome-scale alignments for hundreds of millions of short reads produced by the ABI SOLiD and Illumina sequencing platforms. PerM currently provides full sensitivity for alignments within 4 mismatches for 50bp SOLiD reads and within 9 mismatches for 100bp Illumina reads.

Usage

The reference sequence(s) can be whole genomes with multiple chromosomes, a transcriptome, or even millions of reads, in fasta format with each sequence introduced by a '>' header line. Reads can be in fasta or fastq format, or in csfasta + QUAL or fastq format for SOLiD reads. PerM can output alignments in its own mapping format or in SAM format, and that output can be further processed by ComB, SAMtools, the RseqFlow pipeline and Galaxy's *test* server. Check the manual for more detail.
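As an illustration of the multi-sequence fasta layout described above, here is a minimal Python sketch that splits a reference into named records at each '>' header. The function name `read_fasta` and the sample sequences are made up for this example; this is not PerM's own parser, and it ignores SOLiD color-space files.

```python
import io

def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA stream; each record starts with '>'."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # emit the previous record
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if name is not None:
        yield name, "".join(chunks)  # emit the last record

# Hypothetical two-sequence reference, held in memory for the example.
example = io.StringIO(">chr1\nACGT\nNNAC\n>chr2\nTTGA\n")
records = list(read_fasta(example))
print(records)
```

The same function works on an open file handle, so a multi-chromosome reference can be inspected before it is passed to the mapper.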

Algorithm and Performance

With its special periodic spaced seeds, PerM can be fully sensitive to four mismatches and highly sensitive to higher numbers of mismatches. This seed matching method has speed advantages for longer reads (although currently limited to 64bp), for non-mappable reads (because the number of shifts and checks is fixed), and for genome-scale mapping, owing to the high seed weight. For 50bp SOLiD reads, PerM maps about 37 million reads per CPU hour when it is fully sensitive to 3 mismatches and highly sensitive to more than 3 mismatches. PerM can build the reference index in parallel; it takes half an hour to build the human genome index with 16 CPUs and 14 GB of memory.
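To make the idea of a periodic spaced seed more concrete, the following Python sketch builds a seed by repeating a short 0/1 unit and compares sequences only at the '1' positions. The unit "110", the toy sequences and all function names are assumptions chosen for illustration; they are not PerM's actual seed patterns, and the linear scan stands in for a real reference index.

```python
def periodic_seed(unit, length):
    """Repeat a short 0/1 unit to cover `length` positions ('1' = must match)."""
    repeats = -(-length // len(unit))  # ceiling division
    return (unit * repeats)[:length]

def seed_key(sequence, seed):
    """Keep only the bases that sit under the seed's '1' positions."""
    return "".join(base for base, flag in zip(sequence, seed) if flag == "1")

def seed_hits(read, reference, seed):
    """Return reference offsets whose window shares the read's seed key."""
    span = len(seed)
    target = seed_key(read[:span], seed)
    return [
        pos
        for pos in range(len(reference) - span + 1)
        if seed_key(reference[pos:pos + span], seed) == target
    ]

seed = periodic_seed("110", 9)   # hypothetical pattern: 110110110
reference = "TTACGTACGTAGG"
read = "ACGTACGTA"               # exact copy of reference[2:11]
mutated = "ACCTACGTA"            # one mismatch, under a '0' position of the seed

print(seed_hits(read, reference, seed))
print(seed_hits(mutated, reference, seed))
```

Both calls report the same offset, because the mismatch falls on a position the seed ignores. A mismatch under a '1' position would be missed by this single placement of the seed, which is why a mapper of this kind checks a fixed number of shifts of the pattern.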

SNP Calling

Use our SNP calling tool ComB, which is much more accurate than SAMtools for SOLiD reads.

Splice Junctions Detection

You can also use PerM to detect known splice junctions. Check here for details. To detect novel alternative splice junctions, try our new tool clippers, which targets long, novel and non-cardinal splice junctions or deletions, with 100bp or longer Illumina reads.

System Requirements

PerM uses 4.5 bytes of memory per base to index the reference genome. Memory usage does not depend on the number of reads. Thus PerM requires about 2 GB to map reads to the human transcriptome (400 M bases) and about 14 GB of memory to map reads to the human genome (3 G bases). Multiple read sets can be mapped simultaneously on multiple CPUs (cores), using OpenMP to look up the shared in-memory index. Users can use iPerM, our wrapper, on a computer with less memory, or use qPerM to map one read set in parallel.
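The memory figures above follow directly from the 4.5 bytes per base rule. A small Python sketch of that arithmetic is shown here; the function name is made up for the example, and a gigabyte is taken as 10^9 bytes, which is an assumption.

```python
BYTES_PER_BASE = 4.5  # index cost per reference base, as stated above

def index_memory_gb(reference_bases):
    """Estimated index size in GB (10**9 bytes) for a reference of the given length."""
    return reference_bases * BYTES_PER_BASE / 1e9

print(index_memory_gb(400_000_000))    # human transcriptome, about 400 M bases
print(index_memory_gb(3_000_000_000))  # human genome, about 3 G bases
```

The two estimates come to 1.8 GB and 13.5 GB, which round up to the 2 GB and 14 GB quoted above. The same calculation gives a quick check of whether a given reference will fit in a machine's memory.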

Citation

Please cite our publication in the journal Bioinformatics: Chen Y, Souaiaia T, Chen T. PerM: Efficient mapping of short sequencing reads with periodic full sensitive spaced seeds. Bioinformatics, 2009, 25 (19): 2514-2521.

Development Team

This tool was developed by Ting Chen's group at the Center of Excellence in Genomic Sciences, University of Southern California. To be added to the PerM mailing list for new updates, please email Yangho Chen (yanghoch at usc.edu). All suggestions are welcome.
