
Overview

A Java implementation of the Structured Prediction Cascades framework proposed by Weiss & Taskar (2010). This project contains an updated version of the code used to generate the handwriting recognition/OCR results in the paper, as well as a properly formatted version of the OCR dataset. This project is published under the MIT license, which allows free use and redistribution of the code for any purpose, so long as the license and copyright statement remain in the code.

An early version of this code was used to generate the OCR results in the original paper. The results have since improved further, yielding under 7% character error. This is less than half the error of other state-of-the-art methods when trained using the 5500/600 train/test example split on this dataset.

Method                           Char Error   Word Error
Structured Prediction Cascades   6.28%        12.03%
Ratliff (2010) Sub-gradient      13%          N/A
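
The two columns above can be read as follows. This is a minimal sketch that assumes character error is the fraction of mislabeled characters over all words, and word error is the fraction of words containing at least one mislabeled character. The class `ErrorRates` and its methods are hypothetical illustrations and are not part of this package; the package's own evaluation code may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the structured-cascades code.
class ErrorRates {

    /** Fraction of characters whose predicted label differs from the gold label. */
    static double charError(List<String> gold, List<String> predicted) {
        checkSameSize(gold, predicted);
        int wrong = 0;
        int total = 0;
        for (int i = 0; i < gold.size(); i++) {
            String g = gold.get(i);
            String p = predicted.get(i);
            if (g.length() != p.length()) {
                // Assumption: the tagger emits exactly one label per input character.
                throw new IllegalArgumentException("Length mismatch at word " + i);
            }
            for (int j = 0; j < g.length(); j++) {
                if (g.charAt(j) != p.charAt(j)) {
                    wrong++;
                }
            }
            total += g.length();
        }
        return total == 0 ? 0.0 : (double) wrong / total;
    }

    /** Fraction of words with at least one mislabeled character. */
    static double wordError(List<String> gold, List<String> predicted) {
        checkSameSize(gold, predicted);
        int wrong = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (!gold.get(i).equals(predicted.get(i))) {
                wrong++;
            }
        }
        return gold.isEmpty() ? 0.0 : (double) wrong / gold.size();
    }

    private static void checkSameSize(List<String> gold, List<String> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("Gold and predicted lists differ in size");
        }
    }

    public static void main(String[] args) {
        // Toy data: one wrong character out of six, one wrong word out of two.
        List<String> gold = Arrays.asList("cat", "dog");
        List<String> predicted = Arrays.asList("cat", "dig");
        System.out.println("char error = " + charError(gold, predicted));
        System.out.println("word error = " + wordError(gold, predicted));
    }
}
```

Under these definitions word error is never lower than character error for the same predictions when all words have the same length, which matches the ordering of the two numbers in the table.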

The OCR dataset preformatted for use with this code is available for download from this website.

References:

  • David Weiss & Ben Taskar. Structured Prediction Cascades. AISTATS 2010.
  • Nathan D. Ratliff. Learning to Search: Structured Prediction Techniques for Imitation Learning. Doctoral Thesis.

Contributors

The authors of this package are David Weiss and Kuzman Ganchev, with additional help from Joao Graca.

Installation (binary)

  1. This code uses GNU Trove for some of its data structures and the ANTLR parser generator to parse configuration files. Download these two dependencies (the Trove and ANTLR runtime jars).
  2. Download the structured-cascades.tar.gz file and extract it in the same directory as the two libraries you downloaded.
  3. You can now run the program by adding the three jars to your Java class path, or by invoking java as follows:

          java -cp antlr-runtime.jar:trove-2.1.0.jar:structured-cascades.jar \
              cascades.programs.TrainTagger <args>

  4. See the QuickStart guide to run the demo on the OCR dataset.
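
The java invocation in the steps above joins the jars with `:`, which is the class path separator on Unix-like systems; Windows uses `;`. A minimal sketch of building the same command portably is shown below. The class `CascadeLauncher` and its `buildCommand` method are hypothetical and not part of this package; the jar names and the main class are the ones given in the steps above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher for illustration only; not part of the structured-cascades code.
class CascadeLauncher {

    /** Builds a "java -cp <jars> <mainClass> <args...>" command for the current platform. */
    static List<String> buildCommand(List<String> jars, String mainClass, List<String> programArgs) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-cp");
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        command.add(String.join(File.pathSeparator, jars));
        command.add(mainClass);
        command.addAll(programArgs);
        return command;
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
                "antlr-runtime.jar", "trove-2.1.0.jar", "structured-cascades.jar");
        List<String> command =
                buildCommand(jars, "cascades.programs.TrainTagger", Arrays.asList(args));
        // Only print the command; the jars may not be present in the working directory.
        System.out.println(String.join(" ", command));
        // To actually run it, one could use:
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```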

To install from source code, check out the code from the repository and follow the README instructions. The repository can be checked out directly from Eclipse if you install the Mercurial Eclipse plugin from the Eclipse Marketplace.

Usage

See the wiki documentation for usage instructions:

  • QuickStart - A quick demo that runs on the OCR dataset, showcasing the state-of-the-art accuracy of structured cascades
  • UsingConfigurationFiles - How to set up .fig files that the program uses to configure the cascade
  • AdvancedUsage - Testing on alternative datasets, inspecting the cascade output, etc.