The Semantic Vectors Package

SemanticVectors creates semantic WordSpace models from free natural language text. Such models are designed to represent words and documents in terms of underlying concepts. They can be used for many semantic (concept-aware) matching tasks such as automatic thesaurus generation, knowledge representation, and concept matching. These are described more thoroughly on the UseCases page.

The models are created by applying concept mapping algorithms to term-document matrices created using Apache Lucene. The concept mapping algorithms supported by the package include Random Projection, Latent Semantic Analysis (LSA) and Reflective Random Indexing.

Random Projection is the most scalable technique in practice, because it does not rely on computationally intensive matrix decomposition algorithms. The application of Random Projection to Natural Language Processing (NLP) is descended from Pentti Kanerva's work on Sparse Distributed Memory; in semantic analysis and text mining, this method has also been called Random Indexing. Singular Value Decomposition (the matrix decomposition underlying LSA) is also popular because it is better known, and has in some cases given better results on smaller datasets.

The package was created as part of a project by the University of Pittsburgh Office of Technology Management, and is now developed and maintained by contributors from the University of Texas, Queensland University of Technology, the Austrian Research Institute for Artificial Intelligence, Google Inc., and other institutions and individuals.

Documentation
The package requires Apache Ant and Apache Lucene to be installed, and the Lucene classes must be available in your CLASSPATH.

User Group

Issues and bugs can be posted using the project's Issues tab. More general questions and discussions may be posted at the group webpage, http://groups.google.com/group/semanticvectors.

The package was originally written by Dominic Widdows, in collaboration with Kathleen Ferraro and the University of Pittsburgh. The project is now maintained and extended by a small group of developers, as listed in the SemanticVectors AUTHORS file.

Projects Using Semantic Vectors

We're starting a list of ProjectsUsingSemanticVectors. We're aware of a few more that we'll try to add in due course: please visit this page and leave comments if you know of any.
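
Illustrative Sketch of Random Projection

To make the Random Projection (Random Indexing) idea from the package overview more concrete, here is a minimal sketch in Python. It is not the SemanticVectors API and does not use Lucene. The function names, the whitespace tokenizer, and the parameters dim and seed_count are all assumptions made for illustration. Each document gets a sparse random vector of +1 and -1 entries, and each term vector is the sum of the vectors of the documents the term occurs in, so no matrix decomposition is needed.

```python
import math
import random
from collections import defaultdict


def random_index_vector(dim, seed_count, rng):
    """Return a sparse ternary vector: seed_count nonzero entries,
    alternating +1 and -1 at randomly chosen positions, zeros elsewhere."""
    vec = [0.0] * dim
    positions = rng.sample(range(dim), seed_count)
    for i, pos in enumerate(positions):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def build_term_vectors(docs, dim=512, seed_count=10, seed=0):
    """Hypothetical helper: build term vectors by random projection.

    docs is a list of strings; tokenization is a plain lowercase
    whitespace split (an assumption, standing in for a real indexer).
    """
    rng = random.Random(seed)
    term_vectors = defaultdict(lambda: [0.0] * dim)
    for doc in docs:
        doc_vec = random_index_vector(dim, seed_count, rng)
        for term in doc.lower().split():
            term_vec = term_vectors[term]
            # Add the document's random vector once per term occurrence.
            for i, value in enumerate(doc_vec):
                if value:
                    term_vec[i] += value
    return dict(term_vectors)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    corpus = [
        "cat dog pet",
        "cat dog animal",
        "stock market bank",
        "stock bank money",
    ]
    vectors = build_term_vectors(corpus)
    # "cat" and "dog" occur in exactly the same documents, so their
    # vectors are identical and the similarity is 1.0 up to rounding.
    print("cat ~ dog:  ", cosine(vectors["cat"], vectors["dog"]))
    # "cat" and "stock" share no documents, so their similarity
    # depends only on chance overlap between random vectors.
    print("cat ~ stock:", cosine(vectors["cat"], vectors["stock"]))
```

Terms that occur in the same documents end up with similar vectors, while terms with no documents in common are close to orthogonal, because independently drawn sparse random vectors rarely overlap. The cost grows linearly with the number of term occurrences, which is the intuition behind the scalability claim above.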