WS4J

WS4J is a reimplementation of WordNet::Similarity in Java.

Introduction

WS4J provides APIs for several semantic relatedness/similarity algorithms. In theory, any WordNet instance can be used to calculate relatedness scores, as long as it implements the ILexicalDatabase interface. Most of the codebase has been ported from WordNet-Similarity-2.05. We also use data files from WordNet-Similarity-2.05 and WordNet-InfoContent-3.0, as seen in src/main/resources. The WS4J implementation is thread-safe.

Requirement

For most people:

If you would like to build the source code yourself, you need the following additional software:
- (optional) Eclipse + Subclipse
- Maven 2 or the m2eclipse plugin for Eclipse
- JAWJAW in your local Maven repository.
  - After obtaining the JAWJAW source code and NICT WordNet, run launches/JAWJAW_install.launch inside JAWJAW to install it to a local Maven repository.
See this link for checking out the ws4j source code.

Instructions to run sample code

First of all, run all JUnit tests to verify that you get the same results as the Perl version of WordNet::Similarity. This can be done by launching the file /launches/WS4J_Run_All_JUnitTests.launch.

Then start playing with the facade API edu.cmu.lti.ws4j.WS4J and the simple demo class edu.cmu.lti.ws4j.demo.SimilarityCalculationDemo.

Semantic Relatedness Metrics Available

Descriptions are either from the author's paper or from the WordNet-Similarity CPAN documentation linked from each ID.

| ID | Paper | Description |
| HSO | Hirst and St-Onge (1998) | Two lexicalized concepts are semantically close if their WordNet synsets are connected by a path that is not too long and that "does not change direction too often". |
| LCH | Leacock and Chodorow (1998) | Relies on the length of the shortest path between two synsets as the measure of similarity. Attention is limited to IS-A links, and the path length is scaled by the overall depth D of the taxonomy. |
| LESK | Banerjee and Pedersen (2002) | Lesk (1985) proposed that the relatedness of two words is proportional to the extent of the overlaps of their dictionary definitions. Banerjee and Pedersen (2002) extended this notion to use WordNet as the dictionary for the word definitions. |
| WUP | Wu and Palmer (1994) | The Wu & Palmer measure calculates relatedness by considering the depths of the two synsets in the WordNet taxonomies, along with the depth of the LCS (least common subsumer). |
| RES | Resnik (1995) | Resnik defined the similarity between two synsets to be the information content of their lowest super-ordinate (most specific common subsumer). |
| JCN | Jiang and Conrath (1997) | Also uses the notion of information content, but in the form of the conditional probability of encountering an instance of a child synset given an instance of a parent synset. The relatedness is 1 / jcn_distance, where jcn_distance = IC(synset1) + IC(synset2) - 2 * IC(lcs). |
| LIN | Lin (1998) | The equation is slightly modified from Jiang and Conrath: 2 * IC(lcs) / (IC(synset1) + IC(synset2)), where IC(x) is the information content of x. The relatedness value is therefore greater than or equal to zero and less than or equal to one. |
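
The three information-content measures in the table (RES, JCN, LIN) can be illustrated with a small standalone sketch. This is not WS4J code: the class IcMeasuresSketch and its methods are hypothetical names introduced here, the IC values are made-up inputs rather than values read from WordNet-InfoContent-3.0, and the handling of a zero denominator is an assumption that may differ from what WS4J or WordNet::Similarity actually do.

```java
// Illustrative sketch only: IcMeasuresSketch is a hypothetical helper,
// not part of the WS4J API. It applies the formulas quoted in the table
// to information-content (IC) values supplied by the caller.
final class IcMeasuresSketch {
    private IcMeasuresSketch() {}

    /** RES: the information content of the most specific common subsumer. */
    static double res(double icLcs) {
        return icLcs;
    }

    /** JCN: 1 / jcn_distance, with jcn_distance = IC(s1) + IC(s2) - 2 * IC(lcs). */
    static double jcn(double ic1, double ic2, double icLcs) {
        double distance = ic1 + ic2 - 2.0 * icLcs;
        // Assumption: a zero distance is reported as positive infinity here;
        // a real implementation may choose a different convention.
        return distance == 0.0 ? Double.POSITIVE_INFINITY : 1.0 / distance;
    }

    /** LIN: 2 * IC(lcs) / (IC(s1) + IC(s2)). */
    static double lin(double ic1, double ic2, double icLcs) {
        double denominator = ic1 + ic2;
        // Assumption: when both synsets have zero IC, return 0 instead of dividing by zero.
        return denominator == 0.0 ? 0.0 : 2.0 * icLcs / denominator;
    }

    public static void main(String[] args) {
        // Made-up IC values for two synsets and their common subsumer.
        double ic1 = 6.0, ic2 = 10.0, icLcs = 4.0;
        System.out.println("RES = " + res(icLcs));           // RES = 4.0
        System.out.println("JCN = " + jcn(ic1, ic2, icLcs)); // JCN = 0.125
        System.out.println("LIN = " + lin(ic1, ic2, icLcs)); // LIN = 0.5
    }
}
```

Because a common subsumer is at least as general as either synset, IC(lcs) should not exceed IC(synset1) or IC(synset2), which is why the LIN value stays between zero and one.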