dualist


Interactive machine learning for text analysis

DUALIST: Utility for Active Learning with Instances and Semantic Terms


NOTE: This project has moved to: https://github.com/burrsettles/dualist


Hooray for recursive acronyms!

DUALIST is an interactive machine learning system for quickly building classifiers for text processing tasks. It does so by asking "questions" of a human "teacher" in the form of both data instances (e.g., text documents) and features (e.g., words or phrases). It uses active learning and semi-supervised learning to build text-based classifiers at interactive speed.
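To make the "questions" idea concrete, here is a minimal sketch of the instance side of active learning. It is not DUALIST's code. It fits a small multinomial naive Bayes model on the labeled documents, then picks the unlabeled document whose predicted label is most uncertain (highest posterior entropy) as the next one to show the teacher. The function names (`train_nb`, `posterior`, `next_instance_query`), the add-one smoothing, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_nb(labeled_docs, vocab, classes, alpha=1.0):
    """Fit multinomial naive Bayes with additive (Laplace) smoothing.

    labeled_docs: list of (tokens, label) pairs.
    Returns (log_prior, log_like) dictionaries keyed by class.
    """
    doc_counts = Counter(label for _, label in labeled_docs)
    word_counts = {c: Counter() for c in classes}
    for tokens, label in labeled_docs:
        word_counts[label].update(t for t in tokens if t in vocab)

    n_docs = len(labeled_docs)
    log_prior = {
        c: math.log((doc_counts[c] + 1.0) / (n_docs + len(classes)))
        for c in classes
    }
    log_like = {}
    for c in classes:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_like


def posterior(tokens, log_prior, log_like):
    """Return P(class | tokens), ignoring out-of-vocabulary tokens."""
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
        for c in log_prior
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(exp_scores.values())
    return {c: v / norm for c, v in exp_scores.items()}


def entropy(dist):
    """Shannon entropy (in nats) of a class distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def next_instance_query(unlabeled_docs, log_prior, log_like):
    """Pick the unlabeled document with the most uncertain prediction."""
    return max(
        unlabeled_docs,
        key=lambda toks: entropy(posterior(toks, log_prior, log_like)),
    )


# Toy data (hypothetical): two labeled documents, three unlabeled ones.
classes = ["pos", "neg"]
vocab = {"great", "fun", "boring", "bad"}
labeled = [(["great", "fun"], "pos"), (["boring", "bad"], "neg")]
unlabeled = [["great", "great", "fun"], ["great", "bad"], ["bad", "boring", "bad"]]

log_prior, log_like = train_nb(labeled, vocab, classes)
query = next_instance_query(unlabeled, log_prior, log_like)
print(query)  # the mixed document ["great", "bad"] is the most ambiguous
```

In an interactive loop, the teacher's answer for the queried document would be appended to `labeled` and the model retrained before the next question is chosen.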

Research related to DUALIST is described in these publications:


Watch a demonstration video of DUALIST in action:


The goals of this project are threefold:

  1. A practical tool to facilitate annotation/learning in text analysis projects.
  2. A framework to facilitate research in interactive and multi-modal active learning. This includes enabling real user experiments with the GUI (as opposed to simulated experiments, which are pervasive in the literature but sometimes inconclusive for practical use) and exploring HCI issues. It also includes supporting new dual supervision algorithms that are fast enough to be interactive, accurate enough to be useful, and that might make more appropriate modeling assumptions than multinomial naive Bayes (the current underlying model).
  3. A starting point for more sophisticated interactive learning scenarios that combine multiple "beyond supervised learning" strategies. See the proceedings of the recent ICML 2011 workshop on this topic.
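As a rough illustration of the dual supervision mentioned in the goals above, the sketch below shows one plausible way to let a labeled word influence a multinomial naive Bayes model: the teacher's word label is treated as extra pseudo-counts for that word in the chosen class. This is a simplified illustration, not DUALIST's implementation. The names `train_dual_nb` and `classify`, the `feature_boost` value, the uniform class prior, and the toy vocabulary are all assumptions.

```python
import math
from collections import Counter


def train_dual_nb(labeled_docs, labeled_features, vocab, classes,
                  base_prior=1.0, feature_boost=50.0):
    """Multinomial naive Bayes where labeled words act as extra pseudo-counts.

    labeled_docs: list of (tokens, label) pairs (may be empty).
    labeled_features: dict mapping a word to the class the teacher gave it.
    feature_boost: hypothetical pseudo-count added for each labeled word.
    """
    counts = {c: Counter() for c in classes}
    for tokens, label in labeled_docs:
        counts[label].update(t for t in tokens if t in vocab)

    log_like = {}
    for c in classes:
        # Every word gets base_prior; words labeled with class c get a boost.
        pseudo = {
            w: base_prior + (feature_boost if labeled_features.get(w) == c else 0.0)
            for w in vocab
        }
        total = sum(counts[c][w] + pseudo[w] for w in vocab)
        log_like[c] = {
            w: math.log((counts[c][w] + pseudo[w]) / total) for w in vocab
        }
    return log_like


def classify(tokens, log_like):
    """Return the class with the highest log-likelihood (uniform class prior)."""
    scores = {
        c: sum(ll[t] for t in tokens if t in ll) for c, ll in log_like.items()
    }
    return max(scores, key=scores.get)


# Toy example (hypothetical): no labeled documents, only two labeled words.
classes = ["soccer", "baseball"]
vocab = {"goal", "striker", "pitcher", "inning"}
labeled_features = {"goal": "soccer", "pitcher": "baseball"}

model = train_dual_nb([], labeled_features, vocab, classes)
print(classify(["goal", "striker", "goal"], model))
print(classify(["inning", "pitcher"], model))
```

Even with no labeled documents, the two labeled words are enough to separate the toy classes, which is the appeal of asking the teacher about features as well as instances.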

This work is supported in part by DARPA (under contract numbers FA8750-08-1-0009 and AF8750-09-C-0179), the National Science Foundation (IIS-0968487), and Google. Any opinions, findings and conclusions or recommendations expressed in this material are the authors' and do not necessarily reflect those of the sponsors.

Project Information

The project was created on Mar 18, 2011.

Labels: Machinelearning, NLP