dualist

Interactive machine learning for text analysis

DUALIST: Utility for Active Learning with Instances and Semantic Terms

NOTE: This project has moved to: https://github.com/burrsettles/dualist

Hooray for recursive acronyms!

DUALIST is an interactive machine learning system for quickly building classifiers for text processing tasks. It does so by asking "questions" of a human "teacher" in the form of both data instances (e.g., text documents) and features (e.g., words or phrases). It uses active learning and semi-supervised learning to build text-based classifiers at interactive speed.
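Active learning over instances can be illustrated with entropy-based uncertainty sampling, a standard heuristic for choosing which document to ask the teacher about. This sketch is not taken from DUALIST's source. It assumes the current classifier already supplies a class posterior for each unlabeled document, and the names `entropy` and `select_instance_query` are hypothetical. It picks the document whose predicted label distribution is most uncertain.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-posterior distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_instance_query(posteriors):
    """Return the index of the unlabeled document whose predicted label
    distribution has the highest entropy, i.e. the one the current
    model is least certain about."""
    return max(range(len(posteriors)), key=lambda i: entropy(posteriors[i]))


# Hypothetical posteriors P(y | x) over two classes for three unlabeled docs.
pool = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
]
print(select_instance_query(pool))  # -> 1
```

The nearly balanced second document has the highest entropy, so it is the one this heuristic would show the teacher next.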

Research related to DUALIST is described in these publications:

B. Settles. Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1467-1478. ACL, 2011. (addendum)
B. Settles and X. Zhu. Behavioral Factors in Interactive Training of Text Classifiers. In Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT), pages 563-567. ACL, 2012.

Watch a demonstration video of DUALIST in action:

The goals of this project are threefold:

A practical tool to facilitate annotation/learning in text analysis projects.
A framework to facilitate research in interactive and multi-modal active learning. This includes running real user experiments with the GUI (as opposed to simulated experiments, which are pervasive in the literature but can be inconclusive about how methods perform in practice), exploring HCI issues, and supporting new dual supervision algorithms that are fast enough to be interactive, accurate enough to be useful, and possibly built on more appropriate modeling assumptions than multinomial naive Bayes (the current underlying model).
A starting point for more sophisticated interactive learning scenarios that combine multiple "beyond supervised learning" strategies. See the proceedings of the ICML 2011 workshop on this topic.
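Dual supervision with multinomial naive Bayes can be sketched as follows. The sketch assumes that each word a teacher labels is folded into the model as extra Dirichlet pseudo-counts for that word in the chosen class. The function names, the `boost` value of 50, and the toy data are hypothetical. This is a minimal illustration, not DUALIST's actual code, and it omits any semi-supervised step over unlabeled documents.

```python
import math
from collections import Counter


def train_mnb(docs, labels, classes, labeled_features, base_alpha=1.0, boost=50.0):
    """Multinomial naive Bayes with a Dirichlet prior over word counts.

    labeled_features maps a word to the class a teacher associated it with;
    that word receives `boost` extra pseudo-counts in that class
    (a hypothetical value chosen for illustration).
    """
    vocab = {w for d in docs for w in d} | set(labeled_features)
    counts = {c: Counter() for c in classes}
    class_docs = Counter(labels)
    for doc, y in zip(docs, labels):
        counts[y].update(doc)  # ordinary word counts from labeled documents
    log_prior, log_lik = {}, {}
    for c in classes:
        # Laplace-smoothed class prior P(y)
        log_prior[c] = math.log((class_docs[c] + 1) / (len(docs) + len(classes)))
        # Per-word pseudo-counts: base smoothing plus a boost for labeled features
        alpha = {w: base_alpha + (boost if labeled_features.get(w) == c else 0.0)
                 for w in vocab}
        total = sum(counts[c][w] + alpha[w] for w in vocab)
        log_lik[c] = {w: math.log((counts[c][w] + alpha[w]) / total) for w in vocab}
    return log_prior, log_lik


def posterior(doc, log_prior, log_lik):
    """Return P(y | doc) as a dict, ignoring out-of-vocabulary words."""
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
              for c in log_prior}
    m = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / z for c, s in scores.items()}


# Usage: no labeled documents at all, only two labeled words.
classes = ["sports", "politics"]
features = {"goal": "sports", "election": "politics"}
prior, lik = train_mnb([], [], classes, features)
p = posterior(["goal", "goal"], prior, lik)
print(p["sports"] > 0.5)  # True
```

Because the feature labels act as a prior, this kind of model can make sensible predictions before any document has been labeled; labeled documents then add ordinary counts on top of those pseudo-counts.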

This work is supported in part by DARPA (under contract numbers FA8750-08-1-0009 and AF8750-09-C-0179), the National Science Foundation (IIS-0968487), and Google. Any opinions, findings and conclusions or recommendations expressed in this material are the authors' and do not necessarily reflect those of the sponsors.

Project Information

The project was created on Mar 18, 2011.

License: Apache License 2.0
svn-based source control

Labels:
Machinelearning NLP
