What steps will reproduce the problem? 1. Deploy full-text-search walkaround 2. Import waves from google wave that have several participants and lots of text. (In my case 5 participants and one wave bot that doesn't function because I haven't implemented robot-apis. It has 44 entries in the wave and most are several paragraphs of text with multiple colors, font size and typeface) 3. Navigate through the wave with up and down arrows. Edit an entry. Add an entry. Etc.
What is the expected output? What do you see instead? I expected to be able to use walkaround as a free google app, but it is exceeding the maximum free "Datastore Read Ops" so my walkround instance is inaccessible for a lot of the day.
Only about 15 minutes of actual typing by one user over the span of an hour caused it to reach 0.05 Million Datastore Read Ops yesterday. Only about 5 minutes of typing by one user caused 0.02 Million Datastore Read Ops today.
What browser and browser version are you using? On what operating system? Firefox 11 on Windows 7
What URL does your browser show when the problem occurs? Did you compile walkaround on your machine, or are you using a public instance?
https://kwalkaround.appspot.com/
I compiled the instance on my Mac OSX 10.6.8 and deployed it to google's app engine, but the site is being accessed by other users from other machines through the website.
Please provide any additional information below.
This could be associated with Issue 7, but since Issue 7 has been labeled fixed, and those code changes were merged with the full-text-search branch, I thought this might be a different issue with the same symptoms. Also, maybe there are settings I could enable or change that would help, but I do not know what those would be.
Here is my Quota Detail information for Storage:
Storage Datastore Write Operations 0.00 Million Ops Okay Datastore Read Operations 0.02 Million Ops Okay Datastore Small Operations 0.00 Million Ops Okay Datastore API Calls 4,287 Okay Datastore Queries 1,306 Okay Blobstore API Calls 0 Okay Datastore Stored Data 0.01 GBytes Okay Blobstore Stored Data 0.01 GBytes Okay Data Sent to Datastore API 0.00 GBytes Okay Data Received from Datastore API 0.02 GBytes Okay Datastore Entity Fetch Ops 17,075 Okay Datastore Entity Put Ops 470 Okay Datastore Entity Delete Ops 73 Okay Datastore Index Write Ops 1,347 Okay Datastore Query Ops 1,311 Okay Datastore Key Fetch Ops 295 Okay Datastore Id Allocation Ops 1 Okay Number of Indexes 3%3% 6 of 200 Okay Prospective Search Subscriptions 0%0% 0 of 10,000 Okay Logs Stored Data 0.05 GBytes Okay
Comment #1
Posted on Apr 12, 2012 by Happy CatYou can use wavereactor.appspot.com instance to import waves - it has more quota. I don't think that Google ever promised to provide free, non limited Walkaround instance to anyone.
Comment #2
Posted on Apr 12, 2012 by Happy ElephantThanks, I'll check that out. I wasn't really saying google promised it would be free, though. I just thought that a couple of friends using a walkaround instance a few times a day shouldn't put it over google's quota maximum. Unless google's quota is insanely low and this is expected behavior?
Comment #3
Posted on Apr 12, 2012 by Helpful KangarooPer the rest of the issue text, the bigger concern is whether Walkaround is using excessive read operations along the same lines as Issue 7 , I believe. Very small changes causing thousands of reads would likely impact project adoption, even (or especially) for users paying for quota.
Comment #4
Posted on Apr 12, 2012 by Quick LionCan you go to /admin/appstats and check in the "RPC Stats" section which RPCs are being called too much (e.g. datastore_v3.Get), and which paths are calling them the most (e.g. POST /channel)? Maybe just expand the top few RPCs and paste the output here.
Comment #5
Posted on Apr 13, 2012 by Happy ElephantHere is the list. None of those counts seems like much compared to the number of Datastore read Ops the Admin Console is listing. When I googled this issue to see if I could locate what was causing all the reads I did find some forum posts indicating that the RCP counts might not tell you every read that is happening, which is unfortunate, but maybe it will still show where the excess is coming from. Here is one of those links: http://stackoverflow.com/questions/7807529/how-many-datastore-reads-consume-each-fetch-count-and-query-operations
memcache.Get 1366 POST /channel 1326 /wave 21 /photos 7 / 4 /contacts 4 /inbox 4 memcache.Set 277 POST /channel 263 /wave 11 / 1 /photos 1 POST /_ah/queue/deferred 1 datastore_v3.RunQuery 151 POST /channel 138 /wave 13 datastore_v3.BeginTransaction 82 POST /channel 69 /wave 10 / 1 /contacts 1 /photos 1 datastore_v3.Rollback 81 POST /channel 69 /wave 10 / 1 /photos 1 datastore_v3.Get 74 POST /channel 69 /wave 3 / 1 /photos 1 datastore_v3.Next 17 /wave 17
Comment #6
Posted on Apr 13, 2012 by Quick LionThe appstats data is only stored in memcache, so any part or all of it can indeed go missing at any point. But there's a good chance that it will point us in the right direction anyway. Can you try your five minutes of typing again that cause 20,000 datastore ops, and tell us what appstats shows immediately after that?
Comment #7
Posted on Apr 13, 2012 by Happy ElephantOkay, I replicated my issue. I opened a new tab in Chrome on Windows 7 and logged into the admin console to see the datastore quota prior to starting typing in my wave:
Storage Datastore Write Operations 0.01 Million Ops Okay Datastore Read Operations 0.11 Million Ops Okay Datastore Small Operations 0.00 Million Ops Okay Datastore API Calls 13,870 Okay Datastore Queries 4,564 Okay Blobstore API Calls 0 Okay Datastore Stored Data 0.01 GBytes Okay Blobstore Stored Data 0.01 GBytes Okay Data Sent to Datastore API 0.00 GBytes Okay Data Received from Datastore API 0.15 GBytes Okay Datastore Entity Fetch Ops 105,798 Okay Datastore Entity Put Ops 1,384 Okay Datastore Entity Delete Ops 73 Okay Datastore Index Write Ops 3,771 Okay Datastore Query Ops 4,578 Okay Datastore Key Fetch Ops 895 Okay Datastore Id Allocation Ops 1 Okay Number of Indexes 3% 6 of 200 Okay Prospective Search Subscriptions 0% 0 of 10,000 Okay Logs Stored Data 0.05 GBytes Okay
Here is the datastore quota AFTER doing 5 minutes of typing in my wave: Storage Datastore Write Operations 0.01 Million Ops Okay Datastore Read Operations 0.13 Million Ops Okay Datastore Small Operations 0.00 Million Ops Okay Datastore API Calls 17,106 Okay Datastore Queries 5,490 Okay Blobstore API Calls 0 Okay Datastore Stored Data 0.01 GBytes Okay Blobstore Stored Data 0.01 GBytes Okay Data Sent to Datastore API 0.00 GBytes Okay Data Received from Datastore API 0.17 GBytes Okay Datastore Entity Fetch Ops 122,766 Okay Datastore Entity Put Ops 1,768 Okay Datastore Entity Delete Ops 73 Okay Datastore Index Write Ops 4,855 Okay Datastore Query Ops 5,461 Okay Datastore Key Fetch Ops 1,276 Okay Datastore Id Allocation Ops 1 Okay Number of Indexes 3% 6 of 200 Okay Prospective Search Subscriptions 0% 0 of 10,000 Okay Logs Stored Data 0.05 GBytes Okay
You can see the "Datastore Read Ops" went up by 0.02 Million and the "Datastore Entity Fetch Ops" went up by about 17k.
And here is the RPC stat info you wanted for when this happens:
RPC Count memcache.Get 1158 POST /submitdelta 894 POST /channel 157 POST /taskqueue/postcommit 90 /wave 11 /contacts 2 /photos 2 / 1 /inbox 1 datastore_v3.RunQuery 594 POST /submitdelta 290 POST /taskqueue/postcommit 265 POST /channel 32 /wave 7 datastore_v3.BeginTransaction 426 POST /submitdelta 232 POST /taskqueue/postcommit 156 POST /channel 31 /wave 5 /contacts 2 memcache.Set 342 POST /submitdelta 255 POST /channel 47 POST /taskqueue/postcommit 33 /wave 5 /contacts 1 POST /_ah/queue/deferred 1 channel.SendChannelMessage 299 POST /submitdelta 299 datastore_v3.Commit 260 POST /submitdelta 226 POST /taskqueue/postcommit 33 /contacts 1 datastore_v3.Put 260 POST /submitdelta 226 POST /taskqueue/postcommit 33 /contacts 1 datastore_v3.Next 232 POST /submitdelta 212 POST /taskqueue/postcommit 20 datastore_v3.Rollback 166 POST /taskqueue/postcommit 123 POST /channel 31 POST /submitdelta 6 /wave 5 /contacts 1 search.IndexDocument 128 POST /taskqueue/postcommit 128 datastore_v3.Get 84 POST /taskqueue/postcommit 47 POST /submitdelta 34 /contacts 1 /wave 1 POST /channel 1 logservice.Flush 38 taskqueue.BulkAdd 25 urlfetch.Fetch 8 channel.CreateChannel 1 memcache.Delete 1 search.Search 1
Comment #8
Posted on Apr 13, 2012 by Happy ElephantComment deleted
Comment #9
Posted on Apr 13, 2012 by Happy ElephantOh, also, what I typed in the wave ended up being 881 characters, not including any time I backspaced, if that helps.
Comment #10
Posted on Apr 15, 2012 by Quick LionLooking at the numbers:
RPCs related to entity fetch ops:
datastore_v3.RunQuery 594 POST /submitdelta 290 POST /taskqueue/postcommit 265 POST /channel 32 /wave 7
datastore_v3.Next 232 POST /submitdelta 212 POST /taskqueue/postcommit 20
datastore_v3.Get 84 POST /taskqueue/postcommit 47 POST /submitdelta 34 /contacts 1 /wave 1 POST /channel 1
other datastore RPCs:
datastore_v3.BeginTransaction 426 POST /submitdelta 232 POST /taskqueue/postcommit 156 POST /channel 31 /wave 5 /contacts 2
datastore_v3.Commit 260 POST /submitdelta 226 POST /taskqueue/postcommit 33 /contacts 1
datastore_v3.Put 260 POST /submitdelta 226 POST /taskqueue/postcommit 33 /contacts 1
datastore_v3.Rollback 166 POST /taskqueue/postcommit 123 POST /channel 31 POST /submitdelta 6 /wave 5 /contacts 1
non-datastore RPCs:
channel.SendChannelMessage 299 POST /submitdelta 299
memcache.Get 1158 POST /submitdelta 894 POST /channel 157 POST /taskqueue/postcommit 90 /wave 11 /contacts 2 /photos 2 / 1 /inbox 1
memcache.Set 342 POST /submitdelta 255 POST /channel 47 POST /taskqueue/postcommit 33 /wave 5 /contacts 1 POST /_ah/queue/deferred 1
search.IndexDocument 128 POST /taskqueue/postcommit 128
logservice.Flush 38 taskqueue.BulkAdd 25 urlfetch.Fetch 8 channel.CreateChannel 1 memcache.Delete 1 search.Search 1
I realize now that I should have asked you to go to /admin and clear memcache first to reset these counters to zero before the experiment.
But, given the above numbers, it looks like most of the read ops are in the reconstruction of the wavelet state during submitdelta and post-commit. You could edit webinf/flags.json and increase post_commit_action_interval_millis to e.g. 300000 to reduce the number of datastore ops consumed by post-commit. (This will also make waves take five minutes to show up in new participants' inboxes -- we should probably short-circuit AddParticipant similarly to how we treat RemoveParticipant.)
The long-term solution to this is probably to cache the current state of each wavelet in memcache to avoid the datastore accesses (this can also be used to make waves load faster). It's not hard, but I'm not sure when I'll get around to it. Export-related bugs are a higher priority at the moment.
Another solution would be to make affinity work with dynamic backends so that MutationLog.StateCache is more effective, increase post_commit_action_interval_millis, and short-circuit AddParticipant. Caching slob states seems more useful overall though.
Status: Accepted
Labels:
Type-Defect
Priority-Medium