
oluolu
Overview
Oluolu is an open source query log mining tool that runs on Hadoop. It provides resources for adding new features to search engines. Concretely, Oluolu supports automatic creation of dictionaries, such as spelling correction, context queries, or frequent query n-grams, from query log data. These dictionaries can be applied to search engines to add features such as a 'did you mean' or 'related keyword suggestion' service.
News
2011-11-16 oluolu 0.2.1 released
* Issue 5 (conf directory is missing)
* Issue 7 (no output)
2011-05-11 oluolu 0.2.0 released
* added new parameter -inputLanguage
2010-10-12 oluolu 0.1.4rc2 released
* added new parameter -usePrefixSelector to output a query pair only if the first query is a prefix of the second
* added new parameter -postThreshold to specify the minimum frequency of queries that are loaded into the post processor, in order to reduce memory usage
2010-07-07 oluolu 0.1.3 released
* fixed character encoding bugs
* reduced the amount of memory used in the ngram post-processor stage
2010-06-09 oluolu 0.1.2 released
* added a new parameter, '-showScore', to output the confidence scores for the elements in the related query dictionary
2010-04-26 oluolu 0.1.1 released
* fixed a bug (the setting for the number of reducers was not applied)
2010-02-08 oluolu 0.1 released
Features
Spelling correction dictionary
The spelling correction dictionary consists of query pairs: one item is a query that contains a mistake and the other is the corrected query. For example, Oluolu can extract a pair such as 'yaho' -> 'yahoo'. A dictionary of such pairs can be used to build the 'did you mean' feature on search engines such as Solr or FAST ESP.
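As a rough illustration of how such a dictionary might back a 'did you mean' lookup, here is a minimal Python sketch. It is not part of Oluolu. The tab-separated 'misspelled<TAB>corrected' layout, the function names load_pairs and did_you_mean, and the 'gogle' sample row are all assumptions made for this example; check the Usage page for the actual output format.

```python
import io


def load_pairs(lines):
    """Parse 'misspelled<TAB>corrected' rows into a dict (assumed format)."""
    corrections = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or "\t" not in line:
            continue  # skip blank or malformed rows
        wrong, right = line.split("\t", 1)
        corrections[wrong.strip().lower()] = right.strip()
    return corrections


def did_you_mean(query, corrections):
    """Return the suggested correction, or None if the query is not listed."""
    return corrections.get(query.strip().lower())


# Hypothetical sample data standing in for a dictionary file.
sample = io.StringIO("yaho\tyahoo\ngogle\tgoogle\n")
corrections = load_pairs(sample)

print(did_you_mean("yaho", corrections))   # a listed misspelling
print(did_you_mean("yahoo", corrections))  # not listed, so no suggestion
```

A search front end could call did_you_mean for each incoming query and show the suggestion only when the result is not None.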
Context dictionary
The context dictionary also consists of query pairs, like the spelling correction dictionary. One item of the pair is a query and the other is a query that contains the first one. For example, a context (related query) dictionary can contain a pair such as 'yahoo' -> 'yahoo news'. This dictionary can be applied to a 'related keyword suggestion' service, like those implemented in Bing or Google.
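The following Python sketch shows one way such pairs could drive a related keyword suggestion lookup. It is an illustration only, not Oluolu code. The names build_suggestions and suggest, the prefix_only flag (loosely modeled on the idea behind -usePrefixSelector), the sample pairs, and the reading of "contains" as plain substring containment are all assumptions.

```python
from collections import defaultdict


def build_suggestions(pairs):
    """Index (query, longer_query) pairs by the shorter query."""
    index = defaultdict(list)
    for query, longer in pairs:
        # Keep only pairs where the second query contains the first
        # (substring containment is an assumption of this sketch).
        if query != longer and query in longer:
            index[query].append(longer)
    return index


def suggest(query, index, prefix_only=False):
    """Return related queries; optionally only those that start with the query."""
    candidates = index.get(query, [])
    if prefix_only:
        return [c for c in candidates if c.startswith(query)]
    return list(candidates)


# Hypothetical sample pairs; the last one is dropped by the containment check.
pairs = [("yahoo", "yahoo news"), ("yahoo", "mail yahoo"), ("yahoo", "google")]
index = build_suggestions(pairs)

print(suggest("yahoo", index))
print(suggest("yahoo", index, prefix_only=True))
```

With prefix_only=True the lookup returns only completions of what the user has typed, which suits an autocomplete-style box; without it, any query containing the original one is offered.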
Usage
Begin with the Oluolu quick start page (QuickStart), which walks you through the installation and a tutorial with small input files. For detailed usage, please see the Usage page.
To do
- accept various input formats
- provide a server
- run the postprocessors as MapReduce jobs
Project Information
- License: Apache License 2.0
- 16 stars
- svn-based source control
Labels:
QueryLogMining
SearchEngine
Hadoop
SpellCheck
DidYouMean
RelatedQuery
LogMining
MapReduce
Synonym