Updated Dec 25, 2008 by battlehorse
Labels: Phase-Design, Phase-Deploy
Architecture  
Describes the architecture for the bayes-swarm engine

NOTE: this document describes the current state of the engine, codenamed Pulsar.

Phases

Bayes-Swarm divides the process of analyzing web sources into different phases. Each phase is handled by a different component:

Components

The bayes-swarm engine is structured in terms of modular components:

Technical Details

We are currently using the following technologies to keep bayes-swarm up and running.

Database

The database of choice is MySQL.

Production runs version 4.1.22, while local development uses 5.x. This works because our design does not rely on any feature specific to version 5.
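Since production and local installations run different MySQL versions, it can be handy to confirm that a given server is at least as recent as production. The following is only a minimal sketch: the constant and method names are hypothetical and not part of the bayes-swarm code base, and it assumes you have already obtained the version string yourself (for example from the output of `SELECT VERSION()`).

```ruby
require "rubygems"

# Production version quoted in this document.
PRODUCTION_MYSQL_VERSION = Gem::Version.new("4.1.22")

# Extracts the leading numeric part of a MySQL version string,
# ignoring any suffix such as "-community". Returns nil if the
# string does not start with a version number.
# (Hypothetical helper, for illustration only.)
def parse_mysql_version(version_string)
  numeric = version_string[/\A\d+(?:\.\d+)*/]
  numeric && Gem::Version.new(numeric)
end

# True when the given server version is at least the production version.
def at_least_production?(version_string)
  version = parse_mysql_version(version_string)
  !version.nil? && version >= PRODUCTION_MYSQL_VERSION
end
```

Comparing parsed versions rather than raw strings avoids lexical surprises, such as "4.10" sorting before "4.9".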

You may want a database management tool to work with it. We suggest DbVisualizer or MySQL GUI Tools.

Backend engines and web site

The engine that performs data extraction, initial analysis and database storage is written in Ruby. Because it depends on some additional libraries, you should also install the RubyGems packaging system.

The website is currently powered by Ruby on Rails.

The project relies on the following gems, in addition to the standard library:

  • Hpricot
  • Mechanize
  • Ferret
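To verify that a machine has the gems listed above before running the engine, a small check along these lines may help. It is a sketch rather than part of the project: the method names are hypothetical, and it assumes the gems are published under the lowercase names `hpricot`, `mechanize` and `ferret`.

```ruby
require "rubygems"

# Gem names taken from the list above; the lowercase spellings are an
# assumption about how the gems are named in the gem repository.
REQUIRED_GEMS = %w[hpricot mechanize ferret].freeze

# Returns a hash mapping each gem name to true (installed) or false (missing).
def gem_status(names = REQUIRED_GEMS)
  names.each_with_object({}) do |name, status|
    status[name] = !Gem::Specification.find_all_by_name(name).empty?
  end
end

# Returns only the names of the gems that are not installed.
def missing_gems(names = REQUIRED_GEMS)
  gem_status(names).reject { |_, installed| installed }.keys
end

if __FILE__ == $PROGRAM_NAME
  missing = missing_gems
  if missing.empty?
    puts "All required gems are installed."
  else
    puts "Missing gems: #{missing.join(', ')}"
    puts "Install them with: gem install #{missing.join(' ')}"
  end
end
```

The check only inspects the locally installed gem specifications; it does not load the libraries or contact any remote server.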

A good introduction to Ruby is "Rolling with Ruby on Rails (revisited)": http://www.onlamp.com/pub/a/onlamp/2005/01/20/rails.html

Analysis Engine

In addition to the code written in Ruby, the language of choice for prototypes and additional analysis is R. Among other nice things, such as time series and cluster analysis, R provides RMySQL, an interface to MySQL databases.

MeanMachine

MeanMachine is developed in Python and relies on additional libraries, such as the Xapian indexing library and the PyGTK GUI library.

