
Computes the syntactic similarity of two texts. A Java program that compares two files and returns, as a percentage, how similar they are.

For example:

    java -jar ss.jar c:/tmp/a.txt c:/tmp/b.txt

The output would be:

    Similarity is 89.60159%
Some texts are very similar to each other, such as near-duplicate news articles. The only difference might be a different advertisement in the middle of the text, or a slightly modified headline. This simple program tries to compute how similar two texts are, as a percentage. Note: this is syntactic similarity, not semantic similarity. Only the structure of words and phrases is taken into account, not their meaning. This project is used as part of the http://www.opfine.com/ online financial news text analyser to simplify processing and reduce resource load.
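This page does not describe the algorithm the program uses, so the following is only a minimal sketch of one common way to measure syntactic similarity: split each text into overlapping word sequences ("shingles") and compare the two sets with the Jaccard index. The class name `ShingleSimilarity`, the methods `shingles` and `similarityPercent`, and the shingle size of 3 are all assumptions made for illustration and are not taken from the project.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: NOT the project's actual implementation.
public class ShingleSimilarity {

    // Hypothetical shingle size (words per shingle); an assumption, not a project setting.
    static final int SHINGLE_SIZE = 3;

    // Returns the set of overlapping word sequences of the given size.
    static Set<String> shingles(String text, int size) {
        Set<String> result = new HashSet<>();
        String normalized = text.toLowerCase(Locale.ROOT).trim();
        if (normalized.isEmpty()) {
            return result;
        }
        String[] words = normalized.split("\\s+");
        if (words.length < size) {
            // Text shorter than one shingle: treat the whole text as a single shingle.
            result.add(String.join(" ", words));
            return result;
        }
        for (int i = 0; i + size <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
        }
        return result;
    }

    // Jaccard index of the two shingle sets, scaled to a percentage.
    static double similarityPercent(String a, String b) {
        Set<String> first = shingles(a, SHINGLE_SIZE);
        Set<String> second = shingles(b, SHINGLE_SIZE);
        if (first.isEmpty() && second.isEmpty()) {
            return 100.0; // two empty texts are considered identical
        }
        Set<String> intersection = new HashSet<>(first);
        intersection.retainAll(second);
        Set<String> union = new HashSet<>(first);
        union.addAll(second);
        return 100.0 * intersection.size() / union.size();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java ShingleSimilarity <fileA> <fileB>");
            return;
        }
        String a = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String b = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        System.out.println("Similarity is " + similarityPercent(a, b) + "%");
    }
}
```

Because the comparison works on word sequences rather than meaning, two articles that differ only by an inserted advertisement or a reworded headline still share most of their shingles and score high, while two texts that say the same thing in different words score low.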