You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Original issue 103 created by yangzhuodog1982 on 2012-07-15T13:36:35.000Z:
The following code is from Seek() function in block.cc. As the comments says, it performs a stl::lower_bound operation in the restart array. However, I believe the implementation is flawed.
// Binary search in restart array to find the first restart point
// with a key >= target
uint32_t left = 0;
uint32_t right = num_restarts_ - 1;
while (left < right) {
uint32_t mid = (left + right + 1) / 2;
......
Slice mid_key(key_ptr, non_shared);
if (Compare(mid_key, target) < 0) {
// Key at "mid" is smaller than "target". Therefore all
// blocks before "mid" are uninteresting.
left = mid;
} else {
// Key at "mid" is >= "target". Therefore all blocks at or
// after "mid" are uninteresting.
right = mid - 1;
}
}
In some cases, it may find a previous restart point instead of the correct one. Although the final result is correct, it involves unnecessary iteration and confusing. I think the following code is better.
// Binary search in restart array to find the first restart point
// with a key >= target
uint32_t left = 0;
uint32_t right = num_restarts_;
while (left < right) {
uint32_t mid = (left + right) / 2;
......
Slice mid_key(key_ptr, non_shared);
if (Compare(mid_key, target) < 0) {
// Key at "mid" is smaller than "target". Therefore all
// blocks before "mid" are uninteresting.
left = mid + 1;
} else {
// Key at "mid" is >= "target". Therefore all blocks at or
// after "mid" are uninteresting.
right = mid;
}
}
The text was updated successfully, but these errors were encountered:
Comment #1 originally posted by sanjay@google.com on 2012-07-16T17:04:00.000Z:
This code needs to implement something other than
std::lower_bound. I suspect there are misleading
comments in the code that are leading you astray.
In particular, the block stores multiple keys
between each restart point. And what the binary
search needs to do is find the largest restart
point with a key < target so we can start
decoding from there and find any occurrence of
target that might occur between restart points.
Consider the following example with target==250.
restart point 0: key == 100
... some keys in range [100..199] here ...
restart point 1: key == 200
... some keys in range [200..299] here ...
restart point 2: key == 300
... some keys in range [300..inf] here ...
In this example, we would want the binary search
to find restart point 1 (key == 200) and then
sequentially decode from there.
Now consider your proposed code.
First iteration:
left = 0
right = 3
mid = (left + right) / 2 == 1
Compare(mid_key, target) == Compare(200, 250)
The comparison returns < 0, so we do:
left = mid + 1 = 2
We have skipped past the correct result of binary
search (restart point 1). So we will end up starting
the decoding process too late in the block.
Sorry about the misleading comment. I will change
the following comment:
// Binary search in restart array to find the first restart point
// with a key >= target
to:
// Binary search in restart array to find the last restart point
// with a key < target
Original issue 103 created by yangzhuodog1982 on 2012-07-15T13:36:35.000Z:
The following code is from Seek() function in block.cc. As the comments says, it performs a stl::lower_bound operation in the restart array. However, I believe the implementation is flawed.
In some cases, it may find a previous restart point instead of the correct one. Although the final result is correct, it involves unnecessary iteration and confusing. I think the following code is better.
The text was updated successfully, but these errors were encountered: