Updated Jul 09, 2008 by cjlee112
SolexaTools  
Pygr tools for working with high throughput sequencing data like Solexa.

Goals

This page is for assembling information about needs and possible solutions for working with Solexa and other high-throughput sequencing data. Please use this page to list datasets, analyses, problems we must solve, and tools we should consider.

Datasets

Namshin Kim: Here is the situation: I have 1G (one billion) reads of 36 bp.

Analyses

Namshin Kim: I want to scan the alignments by genomic coordinates so that I can see genomic variations in detail. From that I can build a variation-calling module or a basic module for a genome browser; I am already working on it. For the Solexa data processing, I am trying to save the alignments in axtNet format in pairwiseMode. Of course I could save them as an annotation database, but I thought axtNet would be much more useful. Correct me if there is another way to do this, maybe a combination of an annotation database and seqdb?
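A minimal sketch of scanning aligned reads by genomic coordinate, in plain Python rather than Pygr. The `ReadHit` and `HitIndex` names are hypothetical; the sketch assumes ungapped alignments exactly `read_len` long and keeps hits sorted by start position per chromosome, so an interval query is two binary searches plus a short filter. It does not use or reproduce Pygr's NLMSA or annotation-database API.

```python
import bisect
from typing import Dict, List, NamedTuple


class ReadHit(NamedTuple):
    """One aligned read (hypothetical record, not a Pygr type)."""
    read_index: int  # integer read number instead of a long string ID
    chrom: str
    start: int       # 0-based, inclusive
    end: int         # 0-based, exclusive


class HitIndex:
    """Hypothetical per-chromosome index of read hits sorted by start.

    Assumes every hit is ungapped and spans at most read_len bases.
    """

    def __init__(self, hits: List[ReadHit], read_len: int):
        self.read_len = read_len
        self.by_chrom: Dict[str, List[ReadHit]] = {}
        for hit in hits:
            self.by_chrom.setdefault(hit.chrom, []).append(hit)
        self.starts: Dict[str, List[int]] = {}
        for chrom, lst in self.by_chrom.items():
            lst.sort(key=lambda h: h.start)
            self.starts[chrom] = [h.start for h in lst]

    def scan(self, chrom: str, start: int, end: int) -> List[ReadHit]:
        """Return hits overlapping [start, end), in genomic order."""
        lst = self.by_chrom.get(chrom, [])
        starts = self.starts.get(chrom, [])
        # A hit no longer than read_len can only overlap if it starts
        # after start - read_len, and must start before end.
        lo = bisect.bisect_right(starts, start - self.read_len)
        hi = bisect.bisect_left(starts, end)
        return [h for h in lst[lo:hi] if h.end > start]
```

For example, querying `chr1` over `[120, 201)` with 36 bp hits starting at 90, 100 and 200 returns all three, in start order; a variation caller could walk a chromosome window by window this way.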

Problems We Must Solve

Namshin Kim: Here are the problems. Assume that I decide to save the alignments in pygr-aware axtNet format. Sequence IDs are usually long, 20 characters on average. For a billion reads, building the prefixUnionDict means I need about 40GB of memory (10^9 reads × 20 characters × 2 bytes per character, if Python stores them as Unicode). We have hundreds of 8-core machines with 16GB of memory, but none with 40GB. My conclusion is to split the reads into smaller pieces, maybe 100M reads or fewer.
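The 40GB figure is simple arithmetic, and splitting the reads is a plain chunked iteration. A sketch of both follows, assuming 2 bytes per character (UCS-2 style Unicode) and ignoring Python's per-object overhead, so the estimate is a lower bound. `id_memory_bytes` and `chunk_reads` are hypothetical helpers, not Pygr functions.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def id_memory_bytes(n_reads: int, avg_id_chars: int, bytes_per_char: int = 2) -> int:
    """Raw bytes for the ID characters alone (no per-object overhead)."""
    return n_reads * avg_id_chars * bytes_per_char


def n_chunks(n_reads: int, chunk_size: int) -> int:
    """Number of pieces needed when splitting n_reads (ceiling division)."""
    return -(-n_reads // chunk_size)


def chunk_reads(reads: Iterable[Tuple[str, str]],
                chunk_size: int) -> Iterator[List[Tuple[str, str]]]:
    """Yield successive lists of (read_id, sequence) pairs, at most chunk_size each."""
    it = iter(reads)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk
```

`id_memory_bytes(10**9, 20)` gives 40,000,000,000 bytes, and splitting a billion reads into 100M-read pieces gives 10 pieces, each needing about 4GB for the ID characters before overhead. Each piece could then be indexed on its own 16GB machine.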

Tools we should consider

Mapping Methods

Shawn Cokus in Matteo Pellegrini's lab has developed a probabilistic mapping algorithm that is fast, scalable, and accurate. We've used this for an alternative splicing Solexa analysis. I've mentioned to Shawn that it would be interesting to incorporate this into Pygr with an NLMSA-like interface.

Database Classes

Chris Lee: I think we should consider having a sequence database class optimized for huge numbers of fixed-length reads (like Solexa).
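One possible shape for such a class, sketched under stated assumptions: reads are addressed by integer index instead of string ID, every read has the same length, and sequences are stored one byte per base in a single packed buffer, so read i is found at offset i × read_len. `FixedLengthReadDB` is a hypothetical name; this is not an existing Pygr class.

```python
from typing import Iterable, Iterator


class FixedLengthReadDB:
    """Hypothetical read store addressed by integer index, not string ID.

    All reads are packed back to back in one bytes buffer, so read i
    lives at offset i * read_len and no per-read string or ID is kept.
    """

    def __init__(self, reads: Iterable[str], read_len: int):
        self.read_len = read_len
        buf = bytearray()
        for seq in reads:
            if len(seq) != read_len:
                raise ValueError(f"expected {read_len} bp, got {len(seq)}")
            buf += seq.encode("ascii")
        self._buf = bytes(buf)

    def __len__(self) -> int:
        return len(self._buf) // self.read_len

    def __getitem__(self, i: int) -> str:
        if i < 0:  # support negative indexing like a list
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        off = i * self.read_len
        return self._buf[off:off + self.read_len].decode("ascii")

    def __iter__(self) -> Iterator[str]:
        for i in range(len(self)):
            yield self[i]
```

Dropping the string IDs removes the cost Namshin describes, but one byte per base still means about 36GB for a billion 36 bp reads, so in practice this would be combined with splitting the data, an on-disk buffer, or a denser encoding such as 2 bits per base.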


Comment by the.good.doctor.is.in, Jul 20, 2008

Namshin says:

""" Using current version of Pygr, I can build prefixUnionDict for 60M reads with 16GB memory machine. That is the limitation due to string ID. """

