|
ZoieMergePolicy
ZoieMergePolicyIntroductionZoie uses a customized merge policy implementation. This merge policy is designed to improve realtime indexing performance by carefully managing the number of segments and their sizes. DetailsIssues with Segment MergeLucene creates small index segments when new documents are indexed. Lucene merges small segments to a larger segments. MergePolicy decides when to do it. Lucene has two implemetations of MergePolicy: LogByteSizeMergePolicy and LogDocMergePolicy. They try to merge segments into levels of exponentially increasing size in byte and the number of documents, respectively. This works nicely in off-line indexing environment. However, in a realtime indexing environment, there are some problems:
In a realtime environment it is better to have more frequent smaller merges than one big merge. But we need to keep the number of segments under control to maintain a good search performance. ZoieMergePolicy is created to address this requirement. Note that LogMergePolicy has a method to suppress merges of segments bigger than a certain size. But it is required to set up several parameters carefully to get desired behavior. Large Segments vs Small SegmentsZoieMergePolicy has a concept of large segments and small segments. The number of segments are controlled by NumLargeSegments (L) and MaxSmallSegments (S). Zoie try to keep the size of first L segments close and also to keep the number of small segments at most S. Small segments are managed in the same way as LogByteSizeMergePolicy. When the total size of the small segments reaches the average size of large segments, small segments are promoted to a large segment. ZoieMergePolicy finds the most balanced way of merging segments using Viterbi algorithm. Segment Size NormalizationThe segment size is measured by the total byte size of the files that constitutes the segment. It includes index data of deleted documents. In an extreme case, it is possible that a large segment does not have any valid document. Such a segment may not be selected for a merge simply because it is large if the selection is solely based on the size. Thus, ZoieMergePolicy normalizes the size by the ratio of deleted documents. Zoie can estimate a segment size for merge more accurately. Partial ExpungeAlthough the balanced merge described above increases the change of deletes being purged, an application may want to do it more often for search performance. ZoieMergePolicy allows an application to set the PartialExpunge flag. It finds one segment with most deletes and cleans it up in addition to a regular merge. PartialExpunge can be done on a schedule, it is recommended to use this flag when load is light, for example, every day at 1:00am. PerformanceThe following chart shows the benefit of ZoieMergePolicy. We measured the query response time during indexing. Zoie indexed 154MB of data (about 16000 documents, replicated up to 1M documents), then it randomly chose documents to update for 10 hours. LogMergePolicy suffered a huge performance hit toward the end. On the other hand, ZoieMergePolicy is relatively stable though there are small but more frequent spikes.
ZoieMergePolicy tends to reclaim deleted documents more often compared to LogMergePolicy. This helps the base line of query response time. The response time is 10~20% faster with ZoieMergePolicy. |
Sign in to add a comment