Issue 2822: Lucene search is not working with keyword as part of a word in Gerrit 2.9
Status:  Released
Owner: ----
Closed:  Sep 2014
Project Member Reported by bassem.rabil, Aug 11, 2014

Affected Version: Gerrit 2.9

What steps will reproduce the problem?
1. Create a new change #1 with message KEYWORD1_KEYWORD2_msg in test-project1


2. Search from the web UI search box using the substring "KEYWORD1_KEYWORD2":
project:test-project1 message:KEYWORD1_KEYWORD2
OR project:test-project1 message:"KEYWORD1_KEYWORD2"
OR project:test-project1 message:{KEYWORD1_KEYWORD2}

This returns no changes in Gerrit 2.9. In Gerrit 2.7, with no secondary index, the same search returned change #1. If the whole string "KEYWORD1_KEYWORD2_msg" is passed to the search, change #1 shows up correctly. However, many of our users have queries based on substring keywords, and these have been broken since Gerrit 2.8.

What is the expected output? What do you see instead?
The search should return change #1 on a substring match; instead, no changes are returned.


Please provide any additional information below.
Aug 11, 2014
#1 dborowitz@google.com
I think we just need to tweak the Lucene analyzer.

Does message:"KEYWORD1 KEYWORD2" work?

Do message:KEYWORD1, message:KEYWORD2, and message:msg work independently?
Aug 11, 2014
Project Member #2 bassem.rabil
message:"KEYWORD1 KEYWORD2" does not work for us. To retrieve a change with message "msg_KEY1_KEY2_msg", you need to search for the exact full word "msg_KEY1_KEY2_msg". Neither message:Key1 nor message:Key2 works independently.
Aug 11, 2014
#3 dborowitz@google.com
And this is specifically with '_', not with, say, '-' or ',' or '.'?
Aug 12, 2014
Project Member #4 bassem.rabil
I tried this, and it does not look specific to "_" or any other special character. Here is what I tried after adding a change whose message includes "MH123456":
- Searching "message:MH12345" is not retrieving that change
- Searching "message:H123456" is not retrieving that change either
- Only searching using the whole word retrieves the change "message:MH123456"
Aug 12, 2014
#5 dborowitz@google.com
The behavior you describe of full-text search operating only on whole words as opposed to arbitrary substrings is working as intended. This is a slight change from pre-Lucene, sorry.

I'm willing to revisit what we consider a "whole word", which is why I mentioned tweaking the analyzer above, and why I had specific questions about specific delimiters.
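
As a rough illustration of the whole-word versus substring distinction, here is a simplified Python sketch. It is not Gerrit or Lucene code, and build_index, term_search, and substring_search are hypothetical names used only for this example. A term index can only answer lookups for complete tokens, whereas a plain scan over the text matches any substring:

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercased token to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index

def term_search(index, word):
    """Whole-token lookup: only an exact token match is found."""
    return sorted(index.get(word.lower(), set()))

def substring_search(docs, needle):
    """Scan every document text for the needle anywhere inside it."""
    return sorted(i for i, text in docs.items() if needle.lower() in text.lower())

# Hypothetical commit message containing the id from comment #4.
docs = {1: "Fix crash reported in MH123456"}
index = build_index(docs)

print(term_search(index, "MH123456"))     # [1]
print(term_search(index, "MH12345"))      # []
print(substring_search(docs, "MH12345"))  # [1]
```

The index lookup is cheap because it is a single dictionary access per term, which is one common reason indexed search works on tokens rather than arbitrary substrings.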
Aug 14, 2014
Project Member #6 bassem.rabil
After further investigation, we performed the following tests:

We added changes with the following commit messages:
#1: This is a commit message with the"following"link
#2: This is a commit message with the(following)link
#3: This is a commit message with the[following]link
#4: This is a commit message with the.following.link
#5: This is a commit message with the,following,link
#6: This is a commit message with the_following_link
#7: This is a commit message with the-following-links

If we search using "message:following", all changes are returned except those using the "." and "_" special characters, i.e. changes #4 and #6. I think the analyzer can be tweaked to handle "_" the same way it handles "-".
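
The results above can be reproduced with a small Python sketch. This is an approximation under a stated assumption, not the actual Lucene analyzer: it assumes that "." and "_" between alphanumeric characters do not end a token, while every other punctuation character does. The JOINING and SPLITTING patterns and the search helper are hypothetical names for this example.

```python
import re

MESSAGES = {
    1: 'This is a commit message with the"following"link',
    2: "This is a commit message with the(following)link",
    3: "This is a commit message with the[following]link",
    4: "This is a commit message with the.following.link",
    5: "This is a commit message with the,following,link",
    6: "This is a commit message with the_following_link",
    7: "This is a commit message with the-following-links",
}

# Assumption: "." and "_" between alphanumerics keep the token together.
JOINING = re.compile(r"[A-Za-z0-9]+(?:[._][A-Za-z0-9]+)*")
# Tweaked behavior: every non-alphanumeric character ends the token.
SPLITTING = re.compile(r"[A-Za-z0-9]+")

def search(pattern, word):
    """Return the change numbers whose message contains word as a whole token."""
    word = word.lower()
    return [n for n, msg in sorted(MESSAGES.items())
            if word in pattern.findall(msg.lower())]

print(search(JOINING, "following"))    # [1, 2, 3, 5, 7]
print(search(SPLITTING, "following"))  # [1, 2, 3, 4, 5, 6, 7]
```

With the joining pattern, "the.following.link" and "the_following_link" each become a single token, so a search for "following" misses changes #4 and #6. Treating "_" and "." as separators makes all seven changes match.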
Aug 14, 2014
#7 dborowitz@google.com
Thanks for the detailed tests.
Aug 21, 2014
Project Member #8 huga...@gmail.com
https://gerrit-review.googlesource.com/#/c/59371/
Status: ChangeUnderReview
Sep 9, 2014
Project Member #9 huga...@gmail.com
(No comment was entered for this change.)
Status: Submitted
Labels: FixedIn-2.11
Nov 19, 2014
#11 AgentFri...@gmail.com
Wait, why the switch to whole-word only matches?  Without matching substrings, most searches I want to do end up missing a lot of records.  I first have to realize they are missing, and then get really creative with my search.

I thought that regex could solve the issue, and it works as long as the match is in the first line.  Searching for "^.*3825.*" will miss occurrences of 3825 anywhere after line 1 of the comment, presumably because "." wildcard is not matching newlines.  Is there any way to get around that?
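
To illustrate the newline behavior described above, here is a sketch using Python's re module. It only shows how "." behaves in one common regex engine; whether Gerrit's regex search accepts a dot-all flag or the [\s\S] idiom is not confirmed anywhere in this thread and is purely an assumption. The sample message is hypothetical.

```python
import re

# Hypothetical multi-line message with the number after line 1.
message = "Fix parser\n\nSee bug 3825 for details"

# By default "." does not match a newline, so the anchored pattern
# can only find 3825 on the first line.
first_line_only = re.search(r"^.*3825.*", message)

# With DOTALL, "." also matches newlines.
dotall = re.search(r"^.*3825.*", message, re.DOTALL)

# A character class such as [\s\S] matches any character without a flag.
char_class = re.search(r"^[\s\S]*3825", message)

print(first_line_only is None)  # True
print(dotall is not None)       # True
print(char_class is not None)   # True
```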

I think partial-word searching is important enough that the docs should 1) make clear that searching is whole-word only, and 2) suggest the best-known ways to work around this.


Dec 28, 2014
Project Member #12 david.pu...@sonymobile.com
Issue 2808 has been merged into this issue.
Apr 16, 2015
Project Member #13 david.pu...@sonymobile.com
(No comment was entered for this change.)
Status: Released