est2assembly

Assembly and annotation of transcriptomes for any species

The est2assembly platform is the only platform for standardising transcriptome projects: go from raw trace files to an annotated GBrowse interface driven by the Seqfeature database. It accepts both Sanger and 454 sequence data for de novo assembly, annotation and data mining of ESTs.

A demonstration of its capabilities can be seen at Google Code and InsectaCentral.org.

News

8th Jan 2010

Our IT departments decided to change our IP addresses on the 8th of January without considering the implications for the DNS :-( As a result, service to the websites is disrupted. Sorry about that.

11th Jan 2010

Service to http://insectacentral.org is restored; all others are still waiting for an IT elf to press the button...
The publication is now accessible at http://www.biomedcentral.com/1471-2105/10/447

17th Jan 2010

Thanks to Till for pointing out that the rfc.ex.ac.uk webpage is still down but is needed to distribute the archives. I'll try to upload them to the Google Code repository. I will have to split them into 100 Mb chunks (Google restrictions), so it may take a while. Really sorry about the IPs, but we hope that they fix it soon... It seems, though, that Google is more reliable than a University's IT department...

29th Jan 2010

There is a tiny bug in the install.pl script prior to release 1.02. It caused errors in the last step, when copying the links:

Instances of "find -L ./$installed " should become "find -L $installed ". There is no need to download an updated tar archive for this small change: you can edit the script manually or update from CVS. Sorry and thanks.
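
For anyone patching by hand, the substitution described in this note could be applied with a single sed call. This is only a sketch under assumptions: the function name fix_find_prefix is hypothetical (it is not part of est2assembly), and it assumes the affected text appears in install.pl exactly as quoted above, including the trailing space. A .bak copy of the original file is kept.

```shell
# Hypothetical helper, not shipped with est2assembly.
# Rewrites every 'find -L ./$installed ' to 'find -L $installed ' in the file
# given as $1 and leaves the untouched original next to it as "$1.bak".
fix_find_prefix() {
    # '|' is used as the sed delimiter so the '/' in './' needs no escaping;
    # '\.' and '\$' match a literal dot and a literal dollar sign.
    sed -i.bak 's|find -L \./\$installed |find -L $installed |g' "$1"
}
```

It would be run from the directory holding the script, for example as: fix_find_prefix install.pl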

10 Mar 2010

Version 1.03 released to coincide with the MIRA3.01 hotfix. Please update est2assembly as it features a number of improvements.

Bug fixes on install.pl
trim_assembly.pl now uses cdhit-est (not cd-hit-est; needs to be installed) because it is much faster than a selfblast (old cd-hit-est was not fast enough). TODO: document it
Added io-lib 1.12 distribution
Added support for MIRA3. Please see parameterize_assembly/bin/README. We recommend you use it instead of MIRA2
Added -cdna option in Newbler (requires Newbler 2+). This may result in Newbler 1 complaining and crashing.
Fixed bug in prot4EST (introduced in a previous est2assembly version) which prevented ESTScan from running

10 Mar 2010

Oops, file 1.03 is now fixed with the following:

* Fixed a small bug I introduced to ic_create_naming.pl (added use Bio::SeqIO at the top)

23 April 2010

Bug fixes and improvements on automation and the Chado side of things

18 Jun 2010

Bug fixes

16 Aug 2010

From Bastien Chevreux:

MIRA V3.2.0 has been released at SourceForge.

http://sourceforge.net/projects/mira-assembler/

Changes to the previous 3.0.5 version can be described briefly in three major points:

1 Support for data from Pacific Biosciences.

Though the 3.0.x (and even earlier 2.9.x) versions could handle PacBio data when configured accordingly, the 3.2.0 line of MIRA now officially supports PacBio.

Beside the usual support for non-paired and "paired-end" data, MIRA also has a new automatic editor for PacBio reads which should be useful when dealing with "elastic dark inserts" (longer stretches of unread bases whose length is only approximately known) in PacBio strobed sequencing mode.

With this, MIRA should be able to deal with strobes of unread bases up to ~400 bases without having to split strobed sequences in multiple read-pairs. Simulations with bacterial data show that PacBio strobe reads in 200/200 mode (sequence 200 bases, skip 200 bases, repeat) and having 3000 sequenced bases can reconstruct a bacterial genome quite well (1 contig, correct genome organisation), leaving only "clean up" work of getting some bases right via, e.g., hybrid approach by complementing with e.g. Solexa (Illumina) data.

Though I admit that PacBio support is a long shot (I don't have real PacBio data at the moment), I expect MIRA to be "good enough" for first test with real data (for those people out there actually having access to some). On the other hand, I haven't seen any other assembler yet being able to support strobed reads without splitting them.

2 Fully revamped manual

In an exercise of self-defence (too many mails in my inbox), I've updated the manuals to DocBook format and considerably expanded them. The result: nicer manuals in HTML and PDF format, with extensive walkthroughs. There are also screenshots in colour. And variegated.

3 The "usual" improvements and bug-fixes

If PacBio and the revamped manuals had not been on the schedule, the "usual" improvements would have been described more prominently in this announcement. Things being what they are, I'll just mention

* hybrid Sanger/Solexa or 454/Solexa of bacteria now finish within hours instead of days.
* longer contigs
* less memory utilisation (thanks to Google TCMalloc library)
* better support for SSDs
* warnings when using NFS

Have fun with MIRA,

Bastien

22 Sep 2010

Updated packaging method for est2assembly_dataA and est2assembly_dataB

30 April 201

Finally committed various enhancements. Illumina support nearly ready (via Broad's Inchworm & Trinity RNA-Seq)

Project Information

License: GNU GPL v3
svn-based source control

Labels:
EST transcriptomes bioinformatics NextGenerationSequencing 454 ExpressedSequenceTags