
bobo-browse - issue #24
MatchAllDocsQuery returns wrong answers after docs are deleted and inserted again.
What steps will reproduce the problem?
1. Index, for example, 3 docs (uid=1 field1=..., uid=2 field1=..., uid=3 field1=...).
2. Flush the index to disk.
3. Delete the 3 docs.
4. Flush the index to disk.
5. Insert them again.
6. Flush the index to disk.
7. Search with MatchAllDocsQuery.
What is the expected output? What do you see instead?
- Expected: all 3 docs are returned.
- What I see instead: none of them are returned.
What version of the product are you using? On what operating system?
Version: trunk
OS: Debian
Please provide any additional information below.
This is caused by bugs in FastMatchAllDocsQuery.
After the steps above, you will have 3 index files on disk: _0.cfs, _0_1.del, and _1.cfs.
When the FastMatchAllDocsQuery instance is created by BoboIndexReader.getFastMatchAllDocsQuery, you get deletedDocs=[0, 1, 2] and maxDoc=6.
FastMatchAllDocsQuery.FastMatchAllDocsWeight.scorer is then called twice: once for _0.cfs and once for _1.cfs.
The first call is fine, but on the second call _deletedDocs is still [0, 1, 2] and _deletedIndex is still 0, whereas _deletedDocs should be null (or _deletedIndex should be 3), because none of the 3 docs in _1.cfs is deleted.
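To make the mismatch concrete, here is a minimal sketch (my own illustration, not the attached patch) of why a deleted-doc list built from the top-level reader has to be offset by a per-segment doc base before a segment's scorer can consult it; the docBase field is hypothetical:

// Sketch only. In the failing case, deletedDocs = {0, 1, 2} refers to docs in
// _0.cfs, but the scorer for _1.cfs compares those ids against its own local
// ids 0-2 and wrongly skips every live doc. With docBase = 3 for _1.cfs, the
// comparison happens in top-level id space and nothing is skipped.
class DeletedDocsCheck {
  private final int[] deletedDocs; // sorted top-level doc ids, e.g. {0, 1, 2}
  private final int docBase;       // first top-level id of this segment
  private int deletedIndex = 0;

  DeletedDocsCheck(int[] deletedDocs, int docBase) {
    this.deletedDocs = deletedDocs;
    this.docBase = docBase;
  }

  boolean isDeleted(int localDocId) {
    int topLevelId = docBase + localDocId;
    while (deletedIndex < deletedDocs.length && deletedDocs[deletedIndex] < topLevelId) {
      deletedIndex++; // advance past deletions below the current doc
    }
    return deletedIndex < deletedDocs.length && deletedDocs[deletedIndex] == topLevelId;
  }
}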
The attached patch is my workaround.
- FastMatchAllDocsQuery.diff 4.46KB
Comment #1
Posted on Sep 18, 2009 by Grumpy Hippo
Thanks Lei!
Comment #2
Posted on Sep 18, 2009 by Grumpy Hippo
I wrote this test and it passes with the current FastMatchAllDocs impl:
public void testFastMatchAllDocs() throws Exception {
  RAMDirectory idxDir = new RAMDirectory();
  Document doc;
  Field f;
  IndexWriter writer = new IndexWriter(idxDir, new StandardAnalyzer(), MaxFieldLength.UNLIMITED);

  doc = new Document();
  f = new Field("id", "1", Store.YES, Index.NOT_ANALYZED_NO_NORMS);
  doc.add(f);
  writer.addDocument(doc);
  doc = new Document();
  f = new Field("id", "2", Store.YES, Index.NOT_ANALYZED_NO_NORMS);
  doc.add(f);
  writer.addDocument(doc);
  doc = new Document();
  f = new Field("id", "3", Store.YES, Index.NOT_ANALYZED_NO_NORMS);
  doc.add(f);
  writer.addDocument(doc);
  writer.commit();

  writer.deleteDocuments(new Term("id", "1"));
  writer.deleteDocuments(new Term("id", "2"));
  writer.deleteDocuments(new Term("id", "3"));
  writer.commit();

  BoboIndexReader reader = BoboIndexReader.getInstance(IndexReader.open(idxDir));
  IndexSearcher searcher = new IndexSearcher(reader);
  TopDocs topDocs = searcher.search(reader.getFastMatchAllDocsQuery(), 100);
  assertEquals(0, topDocs.totalHits);
  reader.close();

  doc = new Document();
  f = new Field("id", "1", Store.YES, Index.NOT_ANALYZED_NO_NORMS);
  doc.add(f);
  writer.addDocument(doc);
  doc = new Document();
  f = new Field("id", "2", Store.YES, Index.NOT_ANALYZED_NO_NORMS);
  doc.add(f);
  writer.addDocument(doc);
  doc = new Document();
  f = new Field("id", "3", Store.YES, Index.NOT_ANALYZED_NO_NORMS);
  doc.add(f);
  writer.addDocument(doc);
  writer.commit();

  reader = BoboIndexReader.getInstance(IndexReader.open(idxDir));
  searcher = new IndexSearcher(reader);
  topDocs = searcher.search(reader.getFastMatchAllDocsQuery(), 100);
  assertEquals(3, topDocs.totalHits);
  reader.close();
}
After changing writer.commit to writer.flush (a deprecated method), it does fail.
But it fails even after the patch is applied.
Do you have a unit test that reproduces the problem?
Thanks
Comment #3
Posted on Sep 18, 2009 by Grumpy Bird
I do not have a unit test here.
The problem only occurs when there is more than one index file.
I index these docs through the ZoieSystem consumer. You can set the batch size to 3, so after 3 docs are consumed you will see the _0.cfs file on disk; after the 3 deletions, _0_1.del will be created; and after 3 more docs are consumed, _1.cfs will be there.
So there are two real index files there, _0.cfs and _1.cfs, and you will see the problem by issuing a MatchAllDocsQuery.
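For anyone without a Zoie setup, the same two-segment layout can be reproduced with a plain IndexWriter. This is a self-contained sketch of mine (not from the thread), using the same Lucene 2.x API as the test in comment #2, with one commit per batch so each batch becomes its own segment:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;

static void buildTwoSegmentIndex(Directory dir) throws Exception {
  IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), MaxFieldLength.UNLIMITED);
  for (int i = 1; i <= 3; i++) { // batch 1
    Document doc = new Document();
    doc.add(new Field("id", String.valueOf(i), Store.YES, Index.NOT_ANALYZED_NO_NORMS));
    writer.addDocument(doc);
  }
  writer.commit(); // -> _0.cfs
  for (int i = 1; i <= 3; i++) { // delete the whole first batch
    writer.deleteDocuments(new Term("id", String.valueOf(i)));
  }
  writer.commit(); // -> _0_1.del
  for (int i = 1; i <= 3; i++) { // batch 2: the same docs again
    Document doc = new Document();
    doc.add(new Field("id", String.valueOf(i), Store.YES, Index.NOT_ANALYZED_NO_NORMS));
    writer.addDocument(doc);
  }
  writer.commit(); // -> _1.cfs
  writer.close();
}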
Comment #4
Posted on Sep 18, 2009 by Grumpy Bird
And here are my test index files.
Querying for shanghai or beijing will give you the right answer, but querying for : will return no results.
- idx.tar.gz 1.34KB
Comment #5
Posted on Sep 18, 2009 by Grumpy Bird
Sorry, I meant: query for contents:shanghai or contents:china.
Comment #6
Posted on Sep 18, 2009 by Grumpy Hippo
Is this problem with MatchAllDocsQuery or FastMatchAllDocsQuery?
Can you build the index with lucene 2.4 instead?
Lucene 2.9 had api changes that broke bobo.
Comment #7
Posted on Sep 18, 2009 by Grumpy Bird
With FastMatchAllDocsQuery. I don't have time to build it with 2.4 now; I have to turn off my PC and catch my train.
I will rebuild it after I am back.
Comment #8
Posted on Sep 21, 2009 by Grumpy Bird
Indexes built with Lucene 2.4.
- idx.tar.gz 1.2KB
Comment #9
Posted on Sep 21, 2009 by Grumpy Hippo
After Lei's tests, we have determined this is related to Lucene 2.9 compatibility. The above test code (with RAMDirectory changed to FSDirectory) passes with Lucene 2.4 but fails with 2.9, whereas using plain MatchAllDocsQuery always passes. (Lucene 2.9 switched to per-segment searching, so Weight.scorer is now called with each segment's reader rather than the top-level reader, which breaks FastMatchAllDocsQuery's assumption of top-level doc ids.)
Will leave this bug to be resolved with the Lucene 2.9 upgrade.
Comment #10
Posted on Oct 9, 2009 by Grumpy Bird
Fixed a stupid bug in my previous patch.
- FastMatchAllDocsQuery.diff 4.37KB
Comment #11
Posted on Oct 10, 2009 by Grumpy Bird
Patch for the BR_DEV_LUCENE_2.9 branch.
Comment #12
Posted on Oct 24, 2009 by Grumpy Hippo
Thanks Lei for the patches! FastMatchAllDocsQuery was created because Lucene's default MatchAllDocsQuery had a bottleneck in its delete check.
That was fixed in Lucene 2.9, so the default MatchAllDocsQuery should now be used instead.
This class has now been removed, and getFastMatchAllDocsQuery is deprecated.
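For reference, a minimal usage sketch of the replacement (reader/searcher setup borrowed from the test in comment #2):

import org.apache.lucene.search.MatchAllDocsQuery;

BoboIndexReader reader = BoboIndexReader.getInstance(IndexReader.open(idxDir));
IndexSearcher searcher = new IndexSearcher(reader);
// As of Lucene 2.9, MatchAllDocsQuery skips deleted docs efficiently per
// segment, so no Bobo-specific replacement is needed.
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 100);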
Status: Fixed
Labels:
Type-Defect
Priority-Medium