RelatedResearch  
A list of research related to semantic vector spaces generally.
Updated Sep 8, 2010 by dominicw...@gmail.com

This is a large topic on which much has been written. There are many more papers we should add here, so please feel free to write in with suggestions. Even better, join the project and start adding them yourself.

Papers that talk about the SemanticVectors Package itself

Semantic Vectors: A Scalable Open Source Package and Online Technology Management Application. Dominic Widdows, Kathleen Ferraro, 2008.

Empirical Distributional Semantics: Methods and Biomedical Applications. Trevor Cohen, Dominic Widdows, Journal of Biomedical Informatics (2009) (to appear).

Papers that use the SemanticVectors Package to generate results

Please let us know if you have any articles to add to this list. It is really good for the project to know what results people have obtained using the software.

See also ProjectsUsingSemanticVectors.

New! Newton, G. & A. Callahan & M. Dumontier. 2009. Semantic Journal Mapping for Search Visualization in a Large Scale Article Digital Library. Second Workshop on Very Large Digital Libraries at the European Conference on Digital Libraries (ECDL) 2009. http://gnewton.ca/u/gn/2009/ecdl2009Newton_20090723.pdf.

Semantic Vector Products: Some Initial Investigations. Dominic Widdows, Proceedings of the Second AAAI Symposium on Quantum Interaction, 2008.

Semantic Vector Combinations and the Synoptic Gospels. Dominic Widdows, Trevor Cohen, Third International Symposium on Quantum Interaction, 2009.

Core Papers on Random Projection

(See also RandomProjection.)

New! Pentti Kanerva. Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors. Cognitive Computation, 2009.

Magnus Sahlgren. An introduction to random indexing. Technical report, 2005.

See also the excellent collection at http://www.sics.se/~mange/random_indexing.html.

Ella Bingham and Heikki Mannila. Random projection in dimensionality reduction: applications to image and text data. In Knowledge Discovery and Data Mining, pages 245–250, 2001.

Introductory Material on Semantic Vector Spaces

Geometry and Meaning, Dominic Widdows, CSLI Publications, 2004. The sample chapter may be particularly useful to those unfamiliar with vectors.

Papers about Semantic Spaces more generally

Latent Semantic Analysis (LSA)

Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.

Hyperspace Analogue to Language (HAL)

HAL is the grandfather of another of the main types of semantic space models, term-term co-occurrence ("sliding context window") models. See http://hal.ucr.edu/Papers.html

See also Hinrich Schütze. Automatic word sense discrimination. Computational Linguistics, 24(1), March 1998.

BEAGLE Model

Jones, M. N., & Mewhort, D. J. K. (2007). Representing word meaning and order information in a composite holographic lexicon. Psychological Review, 114, 1-37.

For a detailed introduction to holographic reduced representations, see Tony Plate. Holographic Reduced Representation: Distributed Representation for Cognitive Structures. CSLI Publications.

Comment by ted.dunn...@gmail.com, Jun 24, 2009

These references omit information on a large body of work that predated your other references by nearly a decade. Random indexing is essentially identical to so-called one-step learning that derived from early work at HNC Software and was refined during my tenure as Chief Scientist at Aptex. The only important difference between random indexing and our earlier work relates to the domain of the original vectors. In our case, we mostly used vectors sampled from a multi-dimensional unit normal distribution, whereas random indexing uses ternary or binary vectors. We also experimented with binary vectors, but the hardware of the time favored the continuous representation, so we focused on that formulation.

Also, the learning step in random indexing is essentially one iteration of a power law extraction of singular vectors. As typically described, this algorithm cannot be used with more than 2-3 iterations because it collapses onto the dominant eigenvectors. When used for a single iteration, sufficient information from the secondary eigenvectors is retained in the form of the original random initial conditions to avoid problems. It should also be noted that even without the context vector training (i.e. using random vectors with no context training), useful performance can be obtained. These considerations make it clear that random indexing and context vector techniques should be considered as an alternative formulation of LSA and other SVD systems.
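As a rough illustration of the single training pass described in this comment, the sketch below builds term vectors by summing one random index vector per document in which each term occurs. It is a minimal sketch and not the SemanticVectors implementation or the HNC/Aptex method: the function names, the use of whole documents as contexts, and the default dimension and sparsity are all assumptions made for illustration. The `gaussian` flag switches between sparse ternary index vectors and vectors drawn from a unit normal distribution, which is the difference the comment highlights.

```python
import math
import random
from collections import defaultdict


def ternary_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def gaussian_vector(dim, rng):
    """Dense index vector with entries drawn from a unit normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def train_one_pass(docs, dim=200, nonzeros=10, gaussian=False, seed=0):
    """Single pass: each term vector is the sum of the random index
    vectors of the documents the term occurs in (hypothetical helper)."""
    rng = random.Random(seed)
    term_vectors = defaultdict(lambda: [0.0] * dim)
    for doc in docs:
        if gaussian:
            index_vec = gaussian_vector(dim, rng)
        else:
            index_vec = ternary_vector(dim, nonzeros, rng)
        for term in doc.split():
            tv = term_vectors[term]
            for i, x in enumerate(index_vec):
                tv[i] += x
    return dict(term_vectors)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = ["cat dog pet", "dog cat vet", "stock bond market", "bond stock trade"]
vectors = train_one_pass(docs)
print(cosine(vectors["cat"], vectors["dog"]))    # terms sharing documents
print(cosine(vectors["cat"], vectors["stock"]))  # terms in disjoint documents
```

In this toy corpus, terms that occur in the same documents end up with nearly identical vectors, while terms from disjoint documents are close to orthogonal, because independently drawn high-dimensional random vectors are themselves nearly orthogonal.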

Here are some references that you may find interesting:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.7893&rep=rep1&type=pdf

http://www.google.com/patents?hl=en&lr=&vid=USPAT5619709&id=4kkhAAAAEBAJ&oi=fnd&dq=William+Caid

http://www.google.com/patents?hl=en&lr=&vid=USPAT5794178&id=kZogAAAAEBAJ&oi=fnd&dq=William+Caid

http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6VC8-3YMFVB3-1B&_user=7971165&_rdoc=1&_fmt=&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=938841321&_rerunOrigin=scholar.google&_acct=C000050221&_version=1&_urlVersion=0&_userid=7971165&md5=0ac86651fa508bb9b4157b382f281177

http://portal.acm.org/citation.cfm?id=146565.146569

http://www.google.com/patents?hl=en&lr=&vid=USPATAPP10868538&id=L6yfAAAAEBAJ&oi=fnd&dq=William+Caid

http://www.google.com/patents?hl=en&lr=&vid=USPAT6134532&id=J2kGAAAAEBAJ&oi=fnd&dq=William+Caid

http://spiedl.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=PSISDG002606000001000372000001&idtype=cvips&gifs=yes

Comment by glen.new...@gmail.com, Feb 2, 2010

This paper also uses Semantic Vectors: Newton, G. & A. Callahan & M. Dumontier. 2009. Semantic Journal Mapping for Search Visualization in a Large Scale Article Digital Library. Second Workshop on Very Large Digital Libraries at the European Conference on Digital Libraries (ECDL) 2009. http://gnewton.ca/u/gn/2009/ecdl2009Newton_20090723.pdf

Comment by project member sid....@gmail.com, Apr 20, 2010

One more recent paper using Semantic Vectors: Siddhartha Jonnalagadda, Robert Leaman, Trevor Cohen and Graciela Gonzalez. A Distributional Semantics Approach to Simultaneous Recognition of Multiple Classes of Named Entities. CICLing 2010, LNCS 6008 http://www.public.asu.edu/~sjonnal3/home/papers/CIC-LING_60080224.pdf

Comment by gro...@chaoticlanguage.com, May 16, 2011

For earlier (the earliest?) work which models syntax and semantics as a kind of vector product over word vectors see:

Freeman R. J., Example-based Complexity--Syntax and Semantics as the Production of Ad-hoc Arrangements of Examples, Proceedings of the ANLP/NAACL 2000 Workshop on Syntactic and Semantic Complexity in Natural Language Processing Systems, pp. 47-50. (http://acl.ldc.upenn.edu/W/W00/W00-0108.pdf)

Also:

http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PTXT&s1=Freeman.INNM.&s2="Large+corpus".TI.&OS=IN/Freeman+AND+TTL/

As Ted Dunning points out, non-compositional work is more common. For another open-source project that clusters (non-compositional?) senses, see also Ted Pedersen's SenseClusters: http://senseclusters.sourceforge.net/

Also Dekang Lin's Web demos:

http://webdocs.cs.ualberta.ca/~lindek/demos.htm

