
What is MoSDi?

MoSDi provides a collection of sequence analysis algorithms, including methods for

  1. motif statistics, e.g. computing the exact occurrence count distribution of a motif,
  2. exact motif discovery: extracting motifs with provably optimal p-value,
  3. analysis of pattern matching algorithms: computing, for a given algorithm and pattern, the exact distribution of the number of character accesses caused by searching a random text.
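
To illustrate what an exact occurrence count distribution (item 1) means, here is a minimal Python sketch. It is not MoSDi's implementation or API: the function names `build_automaton` and `occurrence_count_distribution` are made up for this example, and it assumes a single plain word as motif and an i.i.d. text model. It tracks the joint distribution of automaton state and number of matches seen so far, which is the general idea behind automaton-based motif statistics.

```python
from collections import defaultdict


def build_automaton(pattern, alphabet):
    """Build a DFA whose state is the length of the longest pattern
    prefix that is a suffix of the text read so far."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for c in alphabet:
            s = pattern[:state] + c
            k = min(m, len(s))
            # Shrink k until pattern[:k] is a suffix of s (k = 0 always works).
            while k > 0 and s[-k:] != pattern[:k]:
                k -= 1
            delta[state][c] = k
    return delta


def occurrence_count_distribution(pattern, char_probs, text_length):
    """Return {count: probability} for the number of (possibly overlapping)
    occurrences of pattern in an i.i.d. random text of the given length."""
    alphabet = sorted(char_probs)
    delta = build_automaton(pattern, alphabet)
    m = len(pattern)
    # dist maps (automaton state, matches so far) to its probability.
    dist = {(0, 0): 1.0}
    for _ in range(text_length):
        nxt = defaultdict(float)
        for (state, count), p in dist.items():
            for c in alphabet:
                target = delta[state][c]
                hit = 1 if target == m else 0  # reaching state m is a match
                nxt[(target, count + hit)] += p * char_probs[c]
        dist = nxt
    # Marginalize over the automaton state.
    result = defaultdict(float)
    for (_, count), p in dist.items():
        result[count] += p
    return dict(result)
```

For example, `occurrence_count_distribution("AA", {"A": 0.5, "B": 0.5}, 3)` assigns probabilities 0.625, 0.25 and 0.125 to 0, 1 and 2 occurrences, which can be checked by enumerating all eight texts of length 3.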

MoSDi also includes a number of small utilities; for example, you can

  • count all q-grams in a text,
  • generate a random text distributed according to i.i.d. or Markovian text models,
  • enumerate all IUPAC motifs (subject to user-specified constraints),
  • cut out all occurrences of an IUPAC motif from a given sequence,
  • output a position frequency matrix of all occurrences of an IUPAC motif,
  • ...
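
To make some of these tasks concrete, the following Python sketch shows what they compute. It is independent of MoSDi and does not reflect its tools or interfaces: all function names are hypothetical, only the i.i.d. text model is covered, and the IUPAC table is the standard nucleotide code.

```python
import random
import re
from collections import Counter

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_qgrams(text, q):
    """Count every substring of length q (overlapping windows)."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def random_iid_text(length, char_probs, rng=None):
    """Draw each character independently according to char_probs."""
    rng = rng or random.Random()
    chars = list(char_probs)
    weights = [char_probs[c] for c in chars]
    return "".join(rng.choices(chars, weights=weights, k=length))


def find_occurrences(motif, sequence):
    """Return (position, matched substring) for every occurrence of an
    IUPAC motif, including overlapping ones."""
    classes = "".join("[" + IUPAC[c] + "]" for c in motif)
    # A lookahead keeps matches zero-width so overlaps are not skipped.
    pattern = re.compile("(?=(" + classes + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


def position_frequency_matrix(occurrences, alphabet="ACGT"):
    """Count, per motif position, how often each base occurs."""
    if not occurrences:
        return {}
    length = len(occurrences[0])
    return {
        base: [sum(occ[i] == base for occ in occurrences) for i in range(length)]
        for base in alphabet
    }
```

As a small usage example, `find_occurrences("AYA", "ACATAGA")` finds `ACA` at position 0 and `ATA` at position 2, and passing the two matched strings to `position_frequency_matrix` gives the per-position base counts for those occurrences.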

Documentation

Right now, there are three sources of documentation on MoSDi you can use:

  1. the documentation wiki page, which contains examples for common use cases,
  2. the usage information provided by all tools when called without parameters,
  3. of course, the source code itself.

References

Many scientific articles in computer science present experimental results without publishing the software that produced them. This makes the results very difficult, and sometimes impossible, to reproduce. MoSDi was written in the course of our research and contains, among other things, implementations of the algorithms we have published. I try my best to keep MoSDi in a state that allows other scientists to reproduce my results. Conversely, our articles provide the theoretical background and are worth a look if you want to understand what is going on in this software.

Motif Statistics

Tobias Marschall and Sven Rahmann. Probabilistic arithmetic automata and their application to pattern matching statistics. In Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 95–106, 2008. DOI: 10.1007/978-3-540-69068-9_11.

Motif Discovery

Tobias Marschall and Sven Rahmann. Efficient exact motif discovery. Bioinformatics (Proceedings of ISMB), 25(12):i356–364, 2009. DOI: 10.1093/bioinformatics/btp188.

Tobias Marschall and Sven Rahmann. Speeding up Exact Motif Discovery by Bounding the Expected Clump Size. In Proceedings of the 10th Workshop on Algorithms in Bioinformatics (WABI), pages 337–349, 2010. DOI: 10.1007/978-3-642-15294-8_28.

Pattern Matching Analysis

Tobias Marschall and Sven Rahmann. Exact Analysis of Pattern Matching Algorithms with Probabilistic Arithmetic Automata. arXiv:1009.6114.

Contact

Visit my website and feel free to contact me by email: T.Marschall at cwi dot nl.
