
whatswrong
What's Wrong With My NLP?: A visualizer for Natural Language Processing problems.
Features
- (Jointly) visualize
  - syntactic dependency graphs
  - semantic dependency graphs (à la CoNLL 2008)
  - chunks (such as syntactic chunks, NER chunks, SRL chunks, etc.)
  - bilingual alignments
  - BioNLP events, proteins, locations
- Generic format to load and visualize your own data.
- Compare gold standard trees to your generated trees (e.g. highlight false positive and negative dependency edges)
- Filter trees and visualize only what's necessary, for example
  - only dependency edges with certain labels
  - only the edges between certain tokens
- Search corpora for sentences with certain attributes using powerful search expressions, for example
  - search for all sentences that contain the word "vantage" and the POS tag sequence DT NN
  - search for all sentences that contain false positive edges and the word "vantage"
- Reads
  - CoNLL 2000, 2002, 2003, 2004, 2006 and 2008 formats
  - Lisp S-Expressions
  - Malt-Tab format
  - Markov thebeast format
  - BioNLP 2009 Shared Task format (see example graph below and check how to load the annotation files).
- Export to EPS
- Provides an API that lets you incorporate NLP visualization into your own application
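To make the CoNLL 2006 input format listed above a little more concrete, here is a minimal sketch of how one sentence in that tab-separated format maps onto tokens and labelled dependency edges. It assumes the standard CoNLL-X column layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and Java 16+ for records; the class and record names are hypothetical and this is not whatswrong's own reader.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of reading one CoNLL 2006 sentence; not whatswrong's own reader. */
public class ConllXSketch {

    /** One token: 1-based id, word form and fine-grained POS tag. */
    public record Token(int id, String form, String pos) {}

    /** A labelled dependency edge from head to dependent (head 0 = artificial root). */
    public record Dependency(int head, int dependent, String label) {}

    /** A parsed sentence: its tokens plus its dependency edges. */
    public record Sentence(List<Token> tokens, List<Dependency> dependencies) {}

    /** Parses the lines of a single sentence (sentences are separated by blank lines in the file). */
    public static Sentence parse(List<String> lines) {
        List<Token> tokens = new ArrayList<>();
        List<Dependency> deps = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank()) continue; // skip stray blank lines defensively
            // Columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL [PHEAD PDEPREL]
            String[] cols = line.split("\t");
            if (cols.length < 8) {
                throw new IllegalArgumentException("expected at least 8 tab-separated columns: " + line);
            }
            int id = Integer.parseInt(cols[0]);
            tokens.add(new Token(id, cols[1], cols[4]));
            deps.add(new Dependency(Integer.parseInt(cols[6]), id, cols[7]));
        }
        return new Sentence(tokens, deps);
    }
}
```

For example, the three lines `1 The the DT DT _ 2 NMOD _ _`, `2 vantage vantage NN NN _ 3 SBJ _ _` and `3 matters matter VBZ VBZ _ 0 ROOT _ _` (tab-separated) yield three tokens and three edges, the last one attached to the root.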
Check this screenshot to get a better idea.
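The first search example in the feature list ("vantage" plus the POS tag sequence DT NN) can be read as a predicate over a sentence's tokens. The sketch below shows that idea in plain Java; it does not use whatswrong's actual search-expression language, and the `SentenceSearchSketch` class and its `Token` record are hypothetical (Java 16+ assumed).

```java
import java.util.List;

/** Hypothetical sketch of one search condition; not whatswrong's search-expression language. */
public class SentenceSearchSketch {

    /** A token with its word form and POS tag. */
    public record Token(String word, String pos) {}

    /** True if some token has exactly this word form. */
    public static boolean containsWord(List<Token> sentence, String word) {
        return sentence.stream().anyMatch(t -> t.word().equals(word));
    }

    /** True if the POS tags contain the given tags as a contiguous sequence. */
    public static boolean containsPosSequence(List<Token> sentence, List<String> tags) {
        for (int start = 0; start + tags.size() <= sentence.size(); start++) {
            boolean match = true;
            for (int k = 0; k < tags.size(); k++) {
                if (!sentence.get(start + k).pos().equals(tags.get(k))) {
                    match = false;
                    break;
                }
            }
            if (match) return true;
        }
        return false;
    }

    /** The example query: contains the word "vantage" and the tag sequence DT NN. */
    public static boolean matchesExampleQuery(List<Token> sentence) {
        return containsWord(sentence, "vantage")
                && containsPosSequence(sentence, List.of("DT", "NN"));
    }

    /** Keeps only the sentences of a corpus that satisfy the example query. */
    public static List<List<Token>> search(List<List<Token>> corpus) {
        return corpus.stream().filter(SentenceSearchSketch::matchesExampleQuery).toList();
    }
}
```

Under this sketch, "The/DT vantage/NN point/NN" matches, while "vantage/NN points/NNS" does not, because it lacks the DT NN sequence.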
News
Version 0.2.4 released
Some bug fixes, and deployment to Maven Central. You can now use the whatswrong library in your Maven project by adding
<dependency>
  <groupId>org.riedelcastro</groupId>
  <artifactId>whatswrong</artifactId>
  <version>0.2.4</version>
</dependency>
Version 0.2.3 released
July 1st 2010: Added support for additional descriptions of edges and spans, which are printed when you click on them. Minor bug fixes.
Version 0.2.2 released
Some minor bug fixes: fixed loading of UTF-8 files and of square brackets in tab files.
Version 0.2.1 released
Some minor bugfixes. See changes.
Version 0.2.0 released
This version supports the data format of the BioNLP 2009 Shared Task and jointly displays proteins, sites, events, their arguments and event clues.
Moreover, version 0.2.0 is now built with Maven. A more detailed list of changes can be found here.
How to run
Download the jar file and execute
java -jar whatswrong-standalone-x.y.z.jar
Screenshots
CoNLL 2008
This is part of a semantic dependency graph that compares a gold labelling to a system labelling. The red edges are false positives, the blue ones false negatives and the black ones matches:
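The red/blue/black colouring amounts to comparing two sets of labelled edges. The following is a minimal sketch of that comparison, assuming edges are identified by (from, to, label) and matched exactly; the `EdgeDiffSketch` class and its records are hypothetical and not whatswrong's implementation (Java 16+ assumed).

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: classify system edges against gold edges. */
public class EdgeDiffSketch {

    /** A labelled edge between two token indices (records give value-based equals/hashCode). */
    public record Edge(int from, int to, String label) {}

    /** Matches, false positives and false negatives of a system labelling. */
    public record Diff(Set<Edge> matches, Set<Edge> falsePositives, Set<Edge> falseNegatives) {}

    public static Diff compare(Set<Edge> gold, Set<Edge> guess) {
        Set<Edge> matches = new HashSet<>(guess);
        matches.retainAll(gold);           // in both labellings: drawn black
        Set<Edge> falsePositives = new HashSet<>(guess);
        falsePositives.removeAll(gold);    // only in the system labelling: drawn red
        Set<Edge> falseNegatives = new HashSet<>(gold);
        falseNegatives.removeAll(guess);   // only in the gold labelling: drawn blue
        return new Diff(matches, falsePositives, falseNegatives);
    }
}
```

With exact matching, an edge that connects the right tokens but carries the wrong label shows up twice: once as a false positive (system label) and once as a false negative (gold label).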
CoNLL 2003
This shows the comparison of two shallow parses and two NER labellings (again false positives are red, false negatives are blue and matches are black):
Alignment
This shows the comparison of two alignments between a German and an English sentence. Again, false positive alignments are red, false negatives are blue and matches are black. Note that alignment visualization is not available before version 0.2.0a, and that we use this file format.
BioNLP 2009 Event Extraction
This shows the comparison between two event annotations for BioNLP 2009 Shared Task data. As usual, blue edges and spans are false negatives, red ones are false positives.
Note that the visualizer displays a complete abstract (as opposed to a sentence-based visualization) from left to right. Also note that whatswrong is essentially token-based: for mentions that do not fully cover a token (such as "binding" in "DNA-binding"), the complete token is still marked as the mention.
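The token-based behaviour for partial mentions can be pictured as snapping a character span to every token it overlaps. The sketch below assumes tokens carry half-open character offsets [start, end); the `MentionSnapSketch` class and its `Token` record are hypothetical and only illustrate the effect described above, not whatswrong's code (Java 16+ assumed).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: map a character-level mention onto whole tokens. */
public class MentionSnapSketch {

    /** A token with half-open character offsets [start, end). */
    public record Token(String text, int start, int end) {}

    /** Returns the indices of all tokens that overlap the character span [start, end). */
    public static List<Integer> coveringTokens(List<Token> tokens, int start, int end) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            // Two half-open intervals overlap iff each starts before the other ends.
            if (t.start() < end && start < t.end()) {
                result.add(i);
            }
        }
        return result;
    }
}
```

For the text "DNA-binding protein", tokenized as "DNA-binding" [0, 11) and "protein" [12, 19), the mention "binding" at [4, 11) overlaps only the first token, so the whole token "DNA-binding" is marked.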
Documentation
Most of the functionality can hopefully be understood by just playing around with the example graph. For source documentation check the JavaDoc.
Questions?
Just join the Discussion group and post your question there.
Project Information
- License: GNU GPL v3
- svn-based source control
Labels:
Java
NLP
DependencyParsing
SemanticRoleLabelling
Visualizer
CoNLL
Alignment
Chunking
BioNLP