InstallationInstructions  
Installation guide for the Semantic Vectors package.
Updated Jan 16, 2012 by dwidd...@gmail.com

Summary

This page contains brief instructions for installing and running the Semantic Vectors package.

These instructions are only a sketch and presume that you're reasonably familiar with Java, Ant, CLASSPATH settings, and the like. If not, you might struggle a bit. Better documentation will hopefully follow soon.

Prerequisites (all Installations)

  • Make sure you have a Java Development Kit (JDK) installed on your system, and that the java and javac programs can be found on your system PATH.
    • If you type java and javac at your command line, you should see options and instructions, rather than a "Command not found" error.
    • On some systems you may need to set your JAVA_HOME environment variable.
  • Make sure that you have the Apache Lucene core and demo jar files.
    • Unfortunately SemanticVectors doesn't work with all Lucene versions. See the page on LuceneCompatibility.
    • To test your Lucene installation, run java org.apache.lucene.demo.IndexFiles on the directory containing the corpus from which you'd like to build a SemanticVectors model. You'll need to do this anyway, so you might as well do it first.
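
The PATH check described above can be sketched as a small shell function. This is only an illustration: check_tool is a hypothetical helper name, not part of the package, and it assumes a POSIX-style shell.

```shell
# Report whether a required command-line tool is visible on the PATH.
# check_tool is a hypothetical helper, not part of SemanticVectors.
# Returns 0 if the tool is found, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 -> $(command -v "$1")"
    else
        echo "missing: $1 (check your PATH and JAVA_HOME)"
        return 1
    fi
}

# Check both JDK tools; '|| true' keeps going so that both are reported.
for tool in java javac; do
    check_tool "$tool" || true
done
```

If either tool is reported missing, fix your PATH (and possibly JAVA_HOME) before going any further.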

Binary Installation from Jar Distribution

This is the simplest approach for just getting SemanticVectors working.

  • Download the most recent semanticvectors-*.*.jar distribution from this site.
  • Add the Lucene core and demo jar files to your CLASSPATH.
  • Add the downloaded semanticvectors-*.*.jar file (including its full path) to your CLASSPATH.
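
As a rough sketch of the CLASSPATH setup, the snippet below joins three jar paths into one CLASSPATH value. The directory /opt/lucene, the X.Y placeholders in the jar names, and the helper join_classpath are all assumptions made for illustration; substitute the real paths and file names on your system. It uses ':' as the separator, as on Unix-like systems (Windows uses ';').

```shell
# Join the given jar paths into a single CLASSPATH-style string,
# separated by ':' (Unix-like systems; Windows uses ';').
# join_classpath is a hypothetical helper, not part of SemanticVectors.
join_classpath() {
    result=""
    for jar in "$@"; do
        if [ -z "$result" ]; then
            result="$jar"
        else
            result="$result:$jar"
        fi
    done
    printf '%s\n' "$result"
}

# Hypothetical locations and placeholder file names; substitute your own.
CLASSPATH="$(join_classpath \
    /opt/lucene/lucene-core-X.Y.jar \
    /opt/lucene/lucene-demo-X.Y.jar \
    /opt/semanticvectors/semanticvectors-X.Y.jar)"
export CLASSPATH
echo "$CLASSPATH"
```

Note that this overwrites any CLASSPATH you already have; if you need to keep existing entries, pass "$CLASSPATH" as an extra argument.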

With the binary distribution you can't alter the programs beyond the configuration that's possible with command line flags. If you want to change the code itself, you need to build from source.

Compiling from Source - Package Installation

Prerequisites

As well as the Lucene and Java prerequisites, you will also need to install Apache Ant.

  • You may need to set your ANT_HOME environment variable as well.

Installation

  • Download the most recent semanticvectors-*.*.tar.gz or semanticvectors-*.*.zip archive and expand the archive (using tar -zxvf or unzip).
  • Change directory to the new semanticvectors-*.* directory that has just been created. Check that you can see the build.xml file and the src/ directory.
  • The location of the Lucene core and demo jar files can be specified by one of the following methods (only setting CLASSPATH works for versions prior to 2010-09-21):
    • add them to your CLASSPATH environment variable
    • set the environment variable SEMVEC_LIBDIR to the directory that contains them
    • create a file build.properties that contains a line libdir=directorypath where directorypath is the path to the directory that contains the jars
    • create a directory with the name lib in the current directory and add the jar files to that directory
  • Compile just by typing ant at the command line. This should create build and doc directories. The compiled classes are in the build directory.
  • Make sure that either the resulting build/classes directory or build/*.jar file is in your CLASSPATH.
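
The build.properties option above can be sketched like this. The helper write_build_properties and the directory /opt/lucene are hypothetical; only the libdir=directorypath line format comes from the instructions above.

```shell
# Write a build.properties file whose libdir entry points at the jar directory.
# write_build_properties is a hypothetical helper, not part of SemanticVectors.
#   $1 = directory containing the Lucene core and demo jars
#   $2 = source tree to write into (defaults to the current directory)
write_build_properties() {
    printf 'libdir=%s\n' "$1" > "${2:-.}/build.properties"
}

# Typical use from inside the unpacked semanticvectors-*.* directory
# (/opt/lucene is a hypothetical location):
#   write_build_properties /opt/lucene
#   ant
#
# Alternatively, set the environment variable instead of writing the file:
#   export SEMVEC_LIBDIR=/opt/lucene
#   ant
```

Either way, you only need one of the four methods listed above.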

Compiling from Source - Most Recent Development Installation

If you might be making changes to the code, or want to try out new features that are checked in but not yet in the numbered releases, please consider checking out the most recent version from the svn repository. If you make changes that turn out to be useful, please write to us and tell us about them, and we'll probably urge you to submit them to the repository.

To check out the source, see the checkout instructions. You will need to use SVN to check out the code (for Eclipse users, try Subclipse).

To build the project, use ant as described above, and make sure that either the resulting build/classes directory or build/*.jar file is in your CLASSPATH.

If this fails or is too daunting (which it may be: there's a lot going on, and getting everything to work together can become cumbersome), please don't hesitate to contact the project developers. We'll try to help out, or make a new numbered release with the development features you need.

To Build and Search a Model

  • Create a Lucene index using the Lucene demo, by running java org.apache.lucene.demo.IndexFiles on the directory containing your corpus.
  • Create term and document vectors by running java pitt.search.semanticvectors.BuildIndex.
  • To search the resulting model, run java pitt.search.semanticvectors.Search QUERYTERMS.
  • If the upper case term NOT appears in your QUERYTERMS, the query parser will add together the terms preceding NOT and negate all the terms after it. See VectorNegation.
  • To compare two concepts, run java pitt.search.semanticvectors.CompareTerms "QUERYTERMS1" "QUERYTERMS2".
  • For more information on searching, see SearchOptions.
  • If you want to search for relevant documents, see DocumentSearch.
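
The first three steps above can be strung together roughly as follows. This is a sketch under several assumptions: the corpus path ./corpus and the query terms are made up, run and build_and_search are hypothetical helpers, and passing the corpus directory as a plain argument to IndexFiles may not match your Lucene version (check the usage message of each program). By default the script only prints the commands; set DRY_RUN=0 to actually execute them once your CLASSPATH is set up.

```shell
# Print each command; execute it only when DRY_RUN is set to 0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Hypothetical helper: index a corpus, build vectors, then search.
#   $1 = corpus directory; remaining arguments = query terms
build_and_search() {
    corpus_dir="$1"
    shift
    # Argument form is an assumption; it varies between Lucene versions.
    run java org.apache.lucene.demo.IndexFiles "$corpus_dir"
    # Shown without arguments, as in the steps above; your version
    # may expect additional arguments.
    run java pitt.search.semanticvectors.BuildIndex
    run java pitt.search.semanticvectors.Search "$@"
}

# Hypothetical corpus path and query terms, including a NOT query.
build_and_search ./corpus abraham NOT isaac
```

In the example query, abraham is the positive term and isaac is negated, following the NOT convention described above.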

Training Cycles

Models can be built in several phases by passing the document vectors back to rebuild new term vectors. See TrainingCycles.

Bilingual Models

For instructions on building a bilingual model from a parallel corpus, see BilingualModels.

Positional Indexes

For instructions on building an index based on term positions, see PositionalIndexes.

Permutation Search

Indexes can now be built that encode directional relationships between words. See PermutationSearch.

Clustering and Visualization

Have some fun building clusters and pictures! Instructions are on the ClusteringAndVisualization page.

Vector Store Formats

The SemanticVectors package currently (as of version 1.6) supports two different vector store formats, a plain text format and an optimized format created by the Lucene I/O packages. For more information including format translation utilities, see VectorStoreFormats.

Developer API Docs

See http://semanticvectors.googlecode.com/svn/javadoc/latest-stable/index.html for the API documentation. Some useful information may also be found in the ReleaseLog.

