
est2assembly
The est2assembly platform is a platform for standardising transcriptome projects: it takes you from raw trace files to an annotated GBrowse interface driven by the Seqfeature database. It accepts both Sanger and 454 sequencing data for de novo assembly, annotation and data mining of ESTs.
A demonstration of its capabilities can be seen at Google Code and InsectaCentral.org.
News
8th Jan 2010
- Our IT departments decided to change our IP addresses on the 8th of January without considering the implications for the DNS :-( As a result, service to the website is disrupted. Sorry about that.
11th Jan 2010
- Service to http://insectacentral.org is restored, all others are still waiting for an IT elf to press the button...
- Publication is now accessible at http://www.biomedcentral.com/1471-2105/10/447
17th Jan 2010
- Thanks to Till for pointing out that the rfc.ex.ac.uk webpage is still down but is needed to distribute the archives. I'll try to upload them to the Google Code repository. I will have to split them into 100 Mb chunks (Google restrictions), so it may take a while. Really sorry about the IPs, but we hope that they fix it soon... It seems, though, that Google is more reliable than a University's IT department...
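For anyone handling archives split this way, the idea can be sketched in Python as below. This is only an illustration under assumptions: the `.partNNN` suffix, the function names and the reading of "100 Mb" as 100 * 1024 * 1024 bytes are all hypothetical and do not describe how the est2assembly archives were actually packaged.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed interpretation of the "100 Mb" limit mentioned above.
CHUNK_SIZE = 100 * 1024 * 1024


def split_file(path, chunk_size: int = CHUNK_SIZE) -> List[Path]:
    """Write consecutive chunks of `path` next to it and return their paths.

    The ".partNNN" naming is a hypothetical convention for this sketch.
    """
    src = Path(path)
    parts: List[Path] = []
    with src.open("rb") as handle:
        index = 0
        while True:
            data = handle.read(chunk_size)
            if not data:
                break
            part = src.with_name(f"{src.name}.part{index:03d}")
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts


def join_files(parts: Iterable, dest) -> Path:
    """Concatenate the chunks, in name order, back into a single file."""
    out = Path(dest)
    with out.open("wb") as handle:
        for part in sorted(Path(p) for p in parts):
            handle.write(part.read_bytes())
    return out
```

The zero-padded index keeps the chunks in the right order when sorted by name, which is what `join_files` relies on.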
29th Jan 2010
- There is a tiny bug in the install.pl script prior to release 1.02. It caused errors in the last step, when copying the links:
Instances of "find -L ./$installed " should become "find -L $installed ". There is no need to download an updated tar archive for this small change: you can edit the script manually or update from CVS. Sorry and thanks.
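If you prefer not to edit the script by hand, the substitution described above could be applied with a few lines of Python. This is a minimal sketch rather than part of est2assembly: the function names and the `.bak` backup file are assumptions made for illustration, and only the two search and replace strings come from the note above.

```python
from pathlib import Path
from typing import Tuple

# The two strings quoted in the news item above (trailing space included).
OLD = 'find -L ./$installed '
NEW = 'find -L $installed '


def patch_text(text: str) -> Tuple[str, int]:
    """Return the patched text and the number of occurrences replaced."""
    count = text.count(OLD)
    return text.replace(OLD, NEW), count


def patch_file(path) -> int:
    """Patch the file in place, keeping a copy of the original as <name>.bak.

    The backup name is a hypothetical choice for this sketch.
    """
    target = Path(path)
    original = target.read_text()
    patched, count = patch_text(original)
    if count:
        target.with_name(target.name + ".bak").write_text(original)
        target.write_text(patched)
    return count
```

Calling `patch_file("install.pl")` from the directory that holds the script returns the number of replaced instances; a return value of 0 means the file was left untouched.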
10 Mar 2010
Version 1.03 released to coincide with the MIRA3.01 hotfix. Please update est2assembly as it features a number of improvements.
- Bug fixes on install.pl
- trim_assembly.pl now uses cdhit-est (not cd-hit-est; needs to be installed) because it is much faster than a selfblast (old cd-hit-est was not fast enough). TODO: document it
- Added io-lib 1.12 distribution
- Added support for MIRA3. Please see parameterize_assembly/bin/README. We recommend you use it instead of MIRA2
- Added -cdna option in Newbler (requires Newbler 2+). This may result in Newbler 1 complaining and crashing.
- Fixed bug in prot4EST (introduced in a previous est2assembly version) which prevented ESTScan from running
Oops, the 1.03 file is now fixed with the following:
- Fixed a small bug I introduced to ic_create_naming.pl (added use Bio::SeqIO at the top)
23 April 2010
- Bug fixes and improvements on automation and the Chado side of things
18 Jun 2010
- Bug fixes
16 Aug 2010
From Bastien Chevreux:
MIRA V3.2.0 has been released at SourceForge.
Changes to the previous 3.0.5 version can be described briefly in three major points:
1 Support for data from Pacific Biosciences.
Though the 3.0.x (and even earlier 2.9.x) versions could handle PacBio data when configured accordingly, the 3.2.0 line of MIRA now officially supports PacBio.
Beside the usual support for non-paired and "paired-end" data, MIRA also has a new automatic editor for PacBio reads which should be useful when dealing with "elastic dark inserts" (longer stretches of unread bases whose length is only approximately known) in PacBio strobed sequencing mode.
With this, MIRA should be able to deal with strobes of unread bases up to ~400 bases without having to split strobed sequences into multiple read-pairs. Simulations with bacterial data show that PacBio strobe reads in 200/200 mode (sequence 200 bases, skip 200 bases, repeat) and having 3000 sequenced bases can reconstruct a bacterial genome quite well (1 contig, correct genome organisation), leaving only "clean up" work of getting some bases right via, e.g., a hybrid approach that complements them with Solexa (Illumina) data.
Though I admit that PacBio support is a long shot (I don't have real PacBio data at the moment), I expect MIRA to be "good enough" for first test with real data (for those people out there actually having access to some). On the other hand, I haven't seen any other assembler yet being able to support strobed reads without splitting them.
2 Fully revamped manual
In an exercise of self-defence (too many mails in my inbox), I've updated the manuals to DocBook format and considerably expanded them. The result: nicer manuals in HTML and PDF format, with extensive walkthroughs. There are also screenshots in colour. And variegated.
3 The "usual" improvements and bug-fixes
If PacBio and the revamped manuals had not been on the schedule, the "usual" improvements would have been described more prominently in this announcement. Things being what they are, I'll just mention:
- hybrid Sanger/Solexa or 454/Solexa assemblies of bacteria now finish within hours instead of days
- longer contigs
- less memory utilisation (thanks to the Google TCMalloc library)
- better support for SSDs
- warnings when using NFS
Have fun with MIRA,
Bastien
22 Sep 2010
Updated packaging method for est2assembly_dataA and est2assembly_dataB
30 April 201
Finally committed various enhancements. Illumina support nearly ready (via Broad's Inchworm & Trinity RNA-Seq)
Project Information
- License: GNU GPL v3
- svn-based source control
Labels:
EST
transcriptomes
bioinformatics
NextGenerationSequencing
454
ExpressedSequenceTags