Issue 1441: Optimize commit message search predicate
Status:  New
Owner: ----
Reported by jhans...@myyearbook.com, Jun 13, 2012
Currently the "message:FOO" predicate has to obtain a RevWalk instance on the Git repository and call MessageRevFilter#include() on each commit to test its message.  That makes it incredibly slow when searching closed changes, and slow for open changes too whenever there are *a lot* of changes to scan through.  For example:

 gerrit> query message:FOO 
 runTimeMilliseconds: 157

 gerrit> query message:FOO status:open
 runTimeMilliseconds: 87

 gerrit> query message:FOO status:merged
 runTimeMilliseconds: 7315

The "status:merged" search takes about 47x longer than a search with no status predicate (7315 ms vs. 157 ms).
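The slowdown is consistent with how the predicate works: each candidate change needs its commit read from Git before the message can be tested, so a broad candidate set such as all merged changes is expensive. Below is a minimal, self-contained model of that per-change scan. It is not Gerrit or JGit code: `CommitStore` is a hypothetical stand-in for opening a RevWalk and parsing a commit, and the case-insensitive substring test is an assumption that only approximates MessageRevFilter's matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class GitMessageScan {
    /** Hypothetical stand-in for reading a commit object from the Git repository. */
    interface CommitStore {
        String readFullMessage(String commitId);
    }

    /** In-memory store that counts reads so the per-change cost is visible. */
    static final class CountingStore implements CommitStore {
        private final Map<String, String> messages;
        int reads = 0;

        CountingStore(Map<String, String> messages) {
            this.messages = messages;
        }

        @Override
        public String readFullMessage(String commitId) {
            reads++; // each call models one object lookup and parse in the repository
            return messages.getOrDefault(commitId, "");
        }
    }

    /** Returns the ids whose full message contains the term (case-insensitive, assumed). */
    static List<String> scan(CommitStore store, List<String> candidateIds, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String id : candidateIds) {
            // No shortcut: the message lives only in Git, so every candidate is read.
            String message = store.readFullMessage(id);
            if (message.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore(Map.of(
                "c1", "Fix NPE in query\n\nMentions FOO in the body.",
                "c2", "Update docs",
                "c3", "Refactor FOO handling"));
        List<String> hits = scan(store, List.of("c1", "c2", "c3"), "foo");
        System.out.println(hits + " after " + store.reads + " reads"); // [c1, c3] after 3 reads
    }
}
```

In this model the number of reads equals the number of candidates, regardless of how many match, which mirrors why adding "status:merged" (a large candidate set) dominates the run time.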

However, the ChangeData instance already has access to the commit message from the database, so the comparison here should be more in line with the speed of another predicate that accesses the database (for example, TopicPredicate):

 gerrit> query topic:FOO
 runTimeMilliseconds: 8

 gerrit> query topic:FOO status:open
 runTimeMilliseconds: 7

 gerrit> query topic:FOO status:merged
 runTimeMilliseconds: 121

A non-Git MessagePredicate that searches ChangeData#commitMessage() instead would be many times faster, wouldn't it?  Even if we need to support both, offering a choice (e.g., "message:*" searches Git, "subject:*" searches ChangeData#commitMessage()) would at least let most searches run faster.
Jun 13, 2012
#1 sop@google.com
The commit message isn't stored in the database. Only the first line, aka the change subject, is stored in the database. The rest of the message is only available from Git and thus cannot be quickly scanned for from the SQL DB.

Long term the right approach is to index all of the data using Lucene, or another full text type indexing system, and convert all query operators over to searching fields in that full-text style inverted index.
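As a rough illustration of why an inverted index helps, here is a minimal in-memory sketch in plain Java. It is not Lucene and not Gerrit code; `MessageIndex`, `add`, and `search` are hypothetical names. Message text is tokenized once, when a change is indexed, into a map from lowercase token to change ids, so a message lookup becomes a map lookup instead of a walk over every commit.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MessageIndex {
    // token -> ids of changes whose message contains that token
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexes a change's full message once (assumed to happen when the change is written). */
    public void add(int changeId, String fullMessage) {
        for (String token : fullMessage.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(changeId);
            }
        }
    }

    /** Whole-word lookup: cost depends on the posting list, not on the total number of changes. */
    public Set<Integer> search(String word) {
        Set<Integer> ids = postings.get(word.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.emptySet() : Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        MessageIndex index = new MessageIndex();
        index.add(1, "Fix NPE in query\n\nMentions FOO in the body.");
        index.add(2, "Update docs");
        index.add(3, "Refactor FOO handling");
        System.out.println(index.search("foo")); // [1, 3]
    }
}
```

One design consequence worth noting: a token index like this matches whole words, so "FOO" would not find "FOOBAR" the way a substring filter would; a real full-text engine offers analyzers and prefix or wildcard queries to cover such cases.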
Jun 20, 2012
#2 jhans...@myyearbook.com
Shawn: thanks for the info.  In that case, a "subject:*" predicate would probably be helpful if searching just the subject is sufficient for the query.  In my scenario, the text we are looking for is generally in the subject line, so that would still be a win.
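A minimal sketch of the proposed subject-only matching, assuming only the data described as stored in the database: the first line of the commit message. `ChangeRow`, `subjectOf`, and `subjectContains` are hypothetical names, not Gerrit's classes, and the case-insensitive substring rule is an assumption. The key property is that matching needs no Git access, with the trade-off that text appearing only in the message body is not found.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class SubjectSearch {
    /** Hypothetical database row: holds only the subject, not the full message. */
    static final class ChangeRow {
        final int id;
        final String subject;

        ChangeRow(int id, String subject) {
            this.id = id;
            this.subject = subject;
        }
    }

    /** The subject is the first line of the commit message. */
    static String subjectOf(String fullMessage) {
        int newline = fullMessage.indexOf('\n');
        return newline < 0 ? fullMessage : fullMessage.substring(0, newline);
    }

    /** Case-insensitive substring match on the stored subject; no Git access needed. */
    static Predicate<ChangeRow> subjectContains(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return row -> row.subject.toLowerCase(Locale.ROOT).contains(needle);
    }

    public static void main(String[] args) {
        List<ChangeRow> rows = List.of(
                new ChangeRow(1, subjectOf("Fix NPE in query\n\nMentions FOO in the body.")),
                new ChangeRow(2, subjectOf("Update docs")),
                new ChangeRow(3, subjectOf("Refactor FOO handling")));
        List<Integer> ids = rows.stream()
                .filter(subjectContains("foo"))
                .map(row -> row.id)
                .collect(Collectors.toList());
        System.out.println(ids); // [3]: change 1 mentions FOO only in its body
    }
}
```

Change 1 is the case a subject-only search gives up: its match is in the body, which only the slower Git-backed search can see.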