oluolu


A query log mining tool with MapReduce

Overview

Oluolu is an open-source query log mining tool that runs on Hadoop. It builds resources that add new features to search engines. Concretely, Oluolu automatically creates dictionaries from query log data, such as spelling correction pairs, context queries, and frequent query n-grams. These dictionaries can then be applied to search engines to provide features such as a 'did you mean' or 'related keyword suggestion' service.
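Frequent query n-gram counting maps naturally onto MapReduce: a map step emits each n-gram of a query with a count of 1, and a reduce step sums the counts per n-gram. The Python sketch below simulates those two steps in memory. It is not Oluolu's code; it assumes a query log of one query string per entry, lowercase whitespace tokenization, and a hypothetical `min_count` cutoff.

```python
from collections import Counter
from itertools import chain
from typing import Iterable, Iterator


def map_ngrams(query: str, n: int) -> Iterator[tuple[str, int]]:
    """Map step (sketch): emit (ngram, 1) for each word n-gram in one query."""
    words = query.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n]), 1


def reduce_counts(pairs: Iterable[tuple[str, int]]) -> Counter:
    """Reduce step (sketch): sum the emitted counts for each n-gram key."""
    counts: Counter = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def frequent_ngrams(queries: Iterable[str], n: int, min_count: int) -> dict[str, int]:
    """Keep only n-grams seen at least min_count times (hypothetical cutoff)."""
    emitted = chain.from_iterable(map_ngrams(q, n) for q in queries)
    counts = reduce_counts(emitted)
    return {k: c for k, c in counts.items() if c >= min_count}


log = ["yahoo news", "yahoo news today", "cheap flights", "yahoo mail"]
print(frequent_ngrams(log, 2, 2))  # {'yahoo news': 2}
```

On a real cluster the map and reduce functions would run as separate Hadoop tasks, with the framework grouping the emitted pairs by key between them; here `chain` and a single `Counter` stand in for that shuffle.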

News

2011-11-16 oluolu 0.2.1 released
  • Issue 5 (conf directory is missing)
  • Issue 7 (no output)

2011-05-11 oluolu 0.2.0 released
  • added a new parameter, -inputLanguage

2010-10-12 oluolu 0.1.4rc2 released
  • added a new parameter, -usePrefixSelector, to output only query pairs in which the first query is a prefix of the second
  • added a new parameter, -postThreshold, to specify the minimum frequency a query must have to be loaded into the post-processor, which reduces memory usage

2010-07-07 oluolu 0.1.3 released
  • fixed character encoding bugs
  • reduced the memory used in the n-gram post-processor stage

2010-06-09 oluolu 0.1.2 released
  • added a new parameter, -showScore, to output the confidence scores for the elements in the related query dictionary

2010-04-26 oluolu 0.1.1 released
  • fixed a bug where the setting for the number of reducers was not applied

2010-02-08 oluolu 0.1 released

Features

Spelling correction dictionary

A spelling correction dictionary consists of query pairs: the first query contains a mistake and the second is the corrected query. For example, Oluolu can extract a pair such as 'yaho' -> 'yahoo'. A dictionary of such pairs can be used to build a 'did you mean' feature on search engines such as Solr or Fast ESP.
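One common way to mine spelling corrections from a query log, sketched below as an assumption rather than as Oluolu's actual method, is to look at consecutive queries within the same user session and keep pairs whose edit distance is small, treating the reformulated query as the correction. The session grouping, the Levenshtein threshold `max_dist`, and the `min_count` filter are all hypothetical choices for illustration.

```python
from collections import Counter
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def spelling_pairs(sessions: Iterable[list[str]], max_dist: int = 2,
                   min_count: int = 1) -> dict[tuple[str, str], int]:
    """Count (misspelled, corrected) candidates from consecutive session queries.

    Hypothetical heuristic: a query followed by a close-but-different query
    in the same session is treated as a misspelling and its correction.
    """
    counts: Counter = Counter()
    for session in sessions:
        for first, second in zip(session, session[1:]):
            if 0 < levenshtein(first, second) <= max_dist:
                counts[(first, second)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


sessions = [["yaho", "yahoo"], ["yaho", "yahoo", "yahoo news"], ["flights", "cheap flights"]]
print(spelling_pairs(sessions, min_count=2))  # {('yaho', 'yahoo'): 2}
```

The `min_count` filter reflects the idea that a correction seen in many sessions is more trustworthy than a one-off reformulation; pairs such as 'yahoo' -> 'yahoo news' are rejected here because their edit distance is too large.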

Context dictionary

Like the spelling correction dictionary, a context dictionary consists of query pairs. The first item of each pair is a query, and the second is a longer query that contains the first. For example, a related query dictionary can contain the pair 'yahoo' -> 'yahoo news'. This dictionary can be applied to a 'related keyword suggestion' service, such as the ones implemented in Bing and Google.
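A context dictionary can be approximated by pairing each query with longer queries that contain it as a contiguous run of words. The sketch below is a hypothetical illustration, not Oluolu's implementation. `prefix_only` restricts the output to pairs in which the first query is a prefix of the second (the behavior the News section attributes to -usePrefixSelector), and `min_count` drops rare queries before pairing, similar in spirit to -postThreshold. The all-pairs loop is quadratic, which is fine for a toy example but not for a full query log.

```python
from collections import Counter
from typing import Iterable


def contains_words(short: list[str], long: list[str]) -> bool:
    """True if `short` appears as a contiguous run of words inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def context_pairs(queries: Iterable[str], prefix_only: bool = False,
                  min_count: int = 1) -> list[tuple[str, str]]:
    """Return sorted (query, longer query containing it) pairs (sketch)."""
    # Normalize queries and skip empty ones; count frequencies for the filter.
    freq = Counter(q.lower().strip() for q in queries if q.strip())
    kept = [q for q, c in freq.items() if c >= min_count]
    tokens = {q: q.split() for q in kept}
    pairs = []
    for a in kept:
        for b in kept:
            ta, tb = tokens[a], tokens[b]
            if len(ta) >= len(tb):
                continue  # the second query must be strictly longer
            if prefix_only:
                ok = tb[:len(ta)] == ta
            else:
                ok = contains_words(ta, tb)
            if ok:
                pairs.append((a, b))
    return sorted(pairs)


log = ["yahoo", "yahoo news", "yahoo", "my yahoo", "yahoo news", "news"]
print(context_pairs(log))
# [('news', 'yahoo news'), ('yahoo', 'my yahoo'), ('yahoo', 'yahoo news')]
print(context_pairs(log, prefix_only=True))  # [('yahoo', 'yahoo news')]
```

Matching on whole words rather than raw substrings keeps 'news' from being paired with an unrelated query such as 'newspaper'.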

Usage

Begin with the Oluolu quick start page (QuickStart), which walks you through installation and a tutorial with small input files. For detailed usage, see the Usage page.

To do

  • accept various input formats
  • provide a server
  • port the post-processors to MapReduce
