This project addresses the following problem: given a dataset of sparse vectors, find all pairs of vectors whose similarity, under a similarity function such as cosine similarity, meets or exceeds a given threshold. (This problem is also known as the "similarity join.")
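To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the algorithm implemented by this package; it simply compares every pair and is only practical for small inputs. The function names (`cosine`, `similarity_join`), the dict-based vector layout, and the choice of "greater than or equal to" for the threshold test are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature_id: weight}."""
    # Iterate over the shorter vector so the dot product touches fewer entries.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def similarity_join(vectors, threshold):
    """Return (id_a, id_b, score) for every pair scoring at least `threshold`.

    `vectors` maps a vector id to its sparse representation. Every pair is
    compared, so the running time is quadratic in the number of vectors.
    """
    results = []
    for id_a, id_b in combinations(sorted(vectors), 2):
        score = cosine(vectors[id_a], vectors[id_b])
        if score >= threshold:
            results.append((id_a, id_b, score))
    return results


if __name__ == "__main__":
    # Hypothetical toy dataset: three binary vectors over four features.
    data = {
        1: {10: 1.0, 20: 1.0},
        2: {10: 1.0, 20: 1.0, 30: 1.0},
        3: {40: 1.0},
    }
    for id_a, id_b, score in similarity_join(data, threshold=0.8):
        print(id_a, id_b, round(score, 4))
```

The quadratic cost of this baseline is what motivates specialized all-pairs algorithms, which aim to avoid comparing pairs that cannot reach the threshold.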
The package consists of a bare-bones implementation of the "All-Pairs-Binary" algorithm described in the following paper:
R. J. Bayardo, Yiming Ma, Ramakrishnan Srikant. Scaling Up All-Pairs Similarity Search. In Proc. of the 16th Int'l Conf. on World Wide Web, 131-140, 2007. (download from: http://www.bayardo.org/ps/www2007.pdf)