whatswrong

What's Wrong With My NLP?

What's Wrong With My NLP?: A visualizer for Natural Language Processing problems.

Features

- (Jointly) visualize
  - syntactic dependency graphs
  - semantic dependency graphs (a la CoNLL 2008)
  - chunks (such as syntactic chunks, NER chunks, SRL chunks etc.)
  - bilingual alignments
  - BioNLP events, proteins, locations
  - your own data, loaded through a generic format
- Compare gold standard trees to your generated trees (e.g. highlight false positive and false negative dependency edges)
- Filter trees and visualize only what's necessary, for example
  - only dependency edges with certain labels
  - only the edges between certain tokens
- Search corpora for sentences with certain attributes using powerful search expressions, for example
  - search for all sentences that contain the word "vantage" and the POS tag sequence DT NN
  - search for all sentences that contain false positive edges and the word "vantage"
- Reads
  - CoNLL 2000, 2002, 2003, 2004, 2006 and 2008 format
  - Lisp S-Expressions
  - Malt-Tab format
  - markov thebeast format
  - BioNLP 2009 Shared Task format (see example graph below and check how to load the annotation files)
- Export to EPS
- Provides an API that allows you to incorporate NLP visualization in your application

Check this screenshot to get a better idea.

News

Version 0.2.4 released

Some bug fixes, and deployment to Maven Central. You can now use the whatswrong library in your Maven project with

    <dependency>
      <groupId>org.riedelcastro</groupId>
      <artifactId>whatswrong</artifactId>
      <version>0.2.4</version>
    </dependency>

Version 0.2.3 released

July 1st 2010: Added support for additional descriptions of edges and spans, which are printed when you click on them. Minor bugfixes.

Version 0.2.2 released

Some minor bugfixes: fixed loading of UTF-8 files, and square brackets in tab files.

Version 0.2.1 released

Some minor bugfixes. See changes.

Version 0.2.0 released

This version supports the data format of the BioNLP 2009 Shared Task and jointly displays proteins, sites, events, their arguments and event clues.

Moreover, version 0.2.0 is now built with Maven. A more verbose list of changes can be found here.

How to run

Download the jar file and execute java -jar whatswrong-standalone-x.y.z.jar

Screenshots

CoNLL 2008

This is a fraction of a semantic dependency graph that compares a gold labelling to a system labelling. The red edges are false positives, the blue ones false negatives and the black ones are matches:
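
The colour coding used in these comparisons comes down to three set operations on the gold and system edges. The following is a minimal sketch of that idea, not the whatswrong API: the `EdgeDiff` class, the `Edge` and `Diff` records and the `diff` method are hypothetical names chosen for illustration, and the example edges are made up. It assumes Java 16 or later (for records).

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class EdgeDiff {
    /** A labelled edge between two token indices (hypothetical type, not part of whatswrong). */
    record Edge(int from, int to, String label) {}

    /** Matches, false positives and false negatives of a system labelling against gold. */
    record Diff(Set<Edge> matches, Set<Edge> falsePositives, Set<Edge> falseNegatives) {}

    static Diff diff(Set<Edge> gold, Set<Edge> system) {
        // Edges present in both labellings (drawn black).
        Set<Edge> matches = new LinkedHashSet<>(gold);
        matches.retainAll(system);

        // Edges only the system produced (drawn red).
        Set<Edge> falsePositives = new LinkedHashSet<>(system);
        falsePositives.removeAll(gold);

        // Gold edges the system missed (drawn blue).
        Set<Edge> falseNegatives = new LinkedHashSet<>(gold);
        falseNegatives.removeAll(system);

        return new Diff(matches, falsePositives, falseNegatives);
    }

    public static void main(String[] args) {
        // Made-up example: the system gets one edge right and mislabels the other.
        Set<Edge> gold = Set.of(new Edge(2, 1, "NMOD"), new Edge(3, 2, "SBJ"));
        Set<Edge> system = Set.of(new Edge(2, 1, "NMOD"), new Edge(3, 2, "OBJ"));

        Diff d = diff(gold, system);
        System.out.println("matches:         " + d.matches());
        System.out.println("false positives: " + d.falsePositives());
        System.out.println("false negatives: " + d.falseNegatives());
    }
}
```

Note that in this sketch an edge with the right endpoints but the wrong label counts as one false positive plus one false negative; whether the tool treats such edges the same way is an assumption.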

CoNLL 2003

This shows the comparison of two shallow parses and two NER labellings (again false positives are red, false negatives are blue and matches are black):

Alignment

This shows the comparison of two alignments between a German and an English sentence. Again false positive alignments are red, false negatives are blue and matches are black. Note that alignment visualization is not available before version 0.2.0a, and that we use this file format.

BioNLP 2009 Event Extraction

This shows the comparison between two event annotations for BioNLP 2009 Shared Task data. As usual, blue edges and spans are false negatives, red ones are false positives.

Note that the visualizer displays a complete abstract from left to right (as opposed to a sentence-based visualization). Also note that whatswrong is essentially token-based, so for mentions which do not fully cover tokens (such as "binding" in "DNA-binding") the complete token is still marked as a mention.
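
To make the token-based behaviour concrete, here is a small sketch of how a character-level mention can be mapped to every token it overlaps. This is an illustration under assumptions, not the code whatswrong uses: the `TokenSnap` class, the `Token` record and the `coveringTokens` method are hypothetical, and character offsets are taken to be end-exclusive. It assumes Java 16 or later (for records).

```java
import java.util.ArrayList;
import java.util.List;

public class TokenSnap {
    /** A token with character offsets into the text; end is exclusive (hypothetical type). */
    record Token(int begin, int end, String text) {}

    /** Returns the indices of all tokens that overlap the character span [begin, end). */
    static List<Integer> coveringTokens(List<Token> tokens, int begin, int end) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            // Two half-open intervals overlap if each starts before the other ends.
            if (t.begin() < end && begin < t.end()) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Tokens of the text "a DNA-binding protein".
        List<Token> tokens = List.of(
                new Token(0, 1, "a"),
                new Token(2, 13, "DNA-binding"),
                new Token(14, 21, "protein"));

        // The mention "binding" covers characters 6..13, only part of token 1,
        // but the whole token is selected.
        for (int i : coveringTokens(tokens, 6, 13)) {
            System.out.println(tokens.get(i).text());
        }
    }
}
```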

Documentation

Most of the functionality can hopefully be understood by just playing around with the example graph. For source documentation check the JavaDoc.

Questions?

Just join the Discussion group and post your question there.

Project Information

License: GNU GPL v3
svn-based source control

Labels:
Java NLP DependencyParsing SemanticRoleLabelling Visualizer CoNLL Alignment Chunking BioNLP
