augustus


PMML model producer and consumer. Scoring engine.

Augustus

Augustus is an open source system for building and scoring statistical models. It is designed to work with data sets that are too large to fit into memory.

Augustus is now available under the Apache Software License. Older versions will remain under the GNU GPL v2 open source license.

Quick Links

To get started, we offer an Installation Guide and a Modeling Primer, which walks through the Augustus examples.

Installation Guide: Online Documentation, PDF Version
Modeling Primer: Online Documentation, PDF Version
Augustus API (Epydoc): Online Documentation

0.6 Beta

Augustus 0.6 is being released as a beta. This is a substantial change from previous versions, and it will be updated frequently as necessary features are added. An online, interactive tutorial is available at http://augustusdocs.appspot.com/docs/welcome/index.html. We recommend starting there if you are new to Augustus or to modeling.

The trunk, as of version 753, has been updated to be the 0.6 beta release. If you want to use a tagged beta release, the current one is at branches/augustus-0.6.beta2

0.5.3.0 Release

Augustus 0.5.3.0 is now available. The source can be checked out at tags/augustus-0.5.3.0.

Augustus 0.5.0.0 represents a substantial change for Augustus. Please refer to the Augustus 0.5 Overview for more information about the release.

There are two examples, gaslog and email, under augustus-examples, which use the updated configuration and demonstrate some of the new features.

PMML

Predictive Model Markup Language (PMML) is an XML-based markup language for describing statistical and data mining models. PMML describes the inputs to data mining models, the transformations used to prepare data for data mining, and the parameters that define the models themselves. It is used for a wide variety of applications, including applications in finance, e-business, direct marketing, manufacturing, and defense. PMML is often used so that systems that create statistical and data mining models ("PMML Producers") can easily interoperate with systems that deploy PMML models for scoring or other operational purposes ("PMML Consumers").
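To make the producer/consumer idea concrete, here is a minimal sketch of a PMML consumer written with only the Python standard library. It does not use the Augustus API; the embedded PMML document, the field names x and y, and the helper functions load_regression and score are all hypothetical and exist only for this illustration. The document declares its inputs in a DataDictionary and its model parameters in a RegressionTable, and the consumer reads both and scores one record.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML document written for this illustration only.
# A real producer would emit a file like this after fitting a model.
PMML_TEXT = """<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative linear regression"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# PMML elements live in a versioned XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_regression(pmml_text):
    """Read the declared fields and the regression parameters from PMML text."""
    root = ET.fromstring(pmml_text)
    fields = [
        f.get("name")
        for f in root.findall("pmml:DataDictionary/pmml:DataField", NS)
    ]
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return fields, intercept, coefficients


def score(record, intercept, coefficients):
    """Apply the linear model: intercept + sum(coefficient * input value)."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


fields, intercept, coefficients = load_regression(PMML_TEXT)
print(fields)
print(score({"x": 3.0}, intercept, coefficients))
```

Because the model is plain XML, the system that fits the model and the system that scores with it only need to agree on the PMML schema, not on a shared code base.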

Open Data

Open Data Group specializes in building predictive models over big data and is one of the pioneers in using technologies such as Hadoop and NoSQL databases so that companies can build predictive models efficiently over all of their data.

ODG provides management consulting services, outsourced analytical services, analytic staffing, and expert witnesses broadly related to data and analytics. It has experience with customer data, supplier data, financial and trading data, and data from internal business processes.

It has staff in Chicago and clients throughout the U.S. Open Data Group began operations in 2002.

Open Data uses and contributes to open source software whenever it can.