Issue 901: queries return different numbers of results in production server, due to indices out of sync
51 people starred this issue and may be notified of changes.
Status:  Accepted
Owner:  ----
Type-Defect
Priority-Medium
log-1516951
Component-Datastore


Reported by siva.velusamy, Dec 03, 2008
What steps will reproduce the problem?
1. Go to data viewer in app id = envy
2. Issue GQL: "SELECT * FROM Salary_b WHERE company = 'xilinx' ORDER BY title ASC".
This returns 23 results.
3. Issue GQL: "SELECT * FROM Salary_b WHERE company = 'xilinx'".
This returns 42 results.

The issue exists not just in the data viewer, but in the production app
server also.

Does this problem exist with both the SDK app server and our production app
servers?

Only in the production server & data viewer. The development server is fine.

What is the expected output? What do you see instead?

Order by should not change the # of results.
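
One way to pin down which entities the ORDER BY query drops is to compare the keys returned by the two queries. The following is only a minimal sketch under stated assumptions: `missing_from_ordered` is a hypothetical helper, and the two key lists are assumed to have been collected by the caller by running both GQL statements above and recording each result's key.

```python
def missing_from_ordered(unordered_keys, ordered_keys):
    """Return keys seen by the plain query but absent from the ORDER BY query.

    Both arguments are iterables of hashable entity keys. How they are
    collected is left to the caller (assumption: one list per GQL query).
    The result keeps the order of `unordered_keys` and drops duplicates.
    """
    ordered = set(ordered_keys)
    seen = set()
    missing = []
    for key in unordered_keys:
        if key not in ordered and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing
```

With the counts from this report, the returned list would be expected to hold the 19 entities (42 minus 23) that the ORDER BY query does not return, assuming the ordered results are a subset of the unordered ones.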


Comment 1 by ma...@google.com, Dec 03, 2008
(No comment was entered for this change.)
Status: Accepted
Labels: log-1516951 Component-Datastore
Comment 2 by joshlivni, Jan 02, 2009
We had a similar problem, and it turns out it was related to a corrupted index. Out of
curiosity: if you rebuild the relevant indexes on Salary_b, does this issue go away?

For us it did, but unfortunately it then came back. It seems a very small % of
entities don't get indexed properly for some reason or another...

So if rebuilding fixes your problem, perhaps this could be isolated a bit and
described as a bug that causes index corruption, which has a variety of symptoms,
including missing results in certain ORDER BY queries.
Comment 3 by siva.velusamy, Jan 02, 2009
My app did uncover a host of datastore issues with indexing, so it is quite possible
that something was not fixed properly.

From the response from Google here: 

http://groups.google.com/group/google-appengine/msg/41454b797274740d?dmode=source

I'm inclined to believe that they have identified the source of the issue.
Comment 4 by rwilliamz, Jan 15, 2009
Beginning to see this frequently in my app too.  The workaround of calling put() on
the missing entities makes them appear again.
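
The re-put workaround described in this comment can be sketched roughly as follows. This is an illustration, not the commenter's actual code: `reput_entities`, `get_many`, and `put_many` are hypothetical names, and the two callables stand in for whatever batch get and put operations the application uses (on App Engine's Python runtime these would presumably wrap `db.get` and `db.put`).

```python
def reput_entities(keys, get_many, put_many, batch_size=100):
    """Fetch entities by key and write them back unchanged, in batches.

    get_many(list_of_keys) -> list of entities (None for keys not found)
    put_many(list_of_entities) -> None
    Both callables are hypothetical stand-ins, injected so the batching
    logic can be exercised without a datastore.
    Returns the number of entities rewritten.
    """
    keys = list(keys)
    rewritten = 0
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        # Skip keys whose entity no longer exists.
        entities = [e for e in get_many(batch) if e is not None]
        if entities:
            put_many(entities)
            rewritten += len(entities)
    return rewritten
```

Writing an entity back unchanged is what the thread reports as making it reappear in query results; the batch size of 100 is an arbitrary choice for the sketch.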
Comment 5 by philip-g...@gladstonefamily.net, Mar 15, 2009
I'm seeing this too -- only I thought it was associated with auto_now_add=True on a
DateTimeProperty. I managed to fix it by fetching and putting all the objects in my
database (which took 1.5 hours of CPU time). The trouble is that there is no obvious
way to tell whether objects have gotten lost from the indexes. [And, of course, the
fetching that I did assumed that the single-property index that I used was complete.]

Having a cron job (or similar) that re-puts all objects in the datastore every week
(day?) seems like the wrong approach as a workaround. 
Comment 6 by bortuzar, Apr 08, 2009
I'm experiencing this issue too. It's been more than 3 months and this huge bug is
still out there. I did a bulkload of around 700,000 rows, and very frequently rows
do not appear in a date range or 'order by' GQL query. It will be very hard to
re-put all these rows again.
Comment 7 by ryanb+ap...@google.com, Apr 16, 2009
sorry for the trouble, all. we spent a long time triaging and investigating this, and
we're definitely still working on it. we've made a number of changes that have helped
a little, but the main change that will have the biggest impact is still underway. we
hope to have it done soon though!

thanks for your patience...
Summary: queries return different numbers of results in production server, due to indices out of sync
Comment 8 by juraj.vi...@gmail.com, Jul 31, 2009
Has this problem been solved yet? I'm asking because, judging by the last comment
(from April), the report seems to have been forgotten since then (and the bug perhaps
fixed, since it is a critical one).
Comment 9 by jcrocholl, Dec 03, 2009
I'm still seeing corrupt indexes today. Up to 50% of items are missing from the
result set on descending __key__ indexes that I created on Dec 1, 2009. My dataset is
large (over 2 million items, using 15 GB including metadata). I made a simple test
page to demonstrate the problem: http://scoretool.appspot.com/dns/test/