My favorites | Sign in
Project Home Downloads Wiki Issues Source
READ-ONLY: This project has been archived. For more information see this post.
Search
for
  Advanced search   Search tips   Subscriptions
Issue 2: Performance and Quota tuning
1 person starred this issue and may be notified of changes. Back to list
 
Reported by Ben.Manes@gmail.com, Nov 22, 2011
Skimming the code, it appears that there are many facets that can be tuned. A few obvious ones are...

- Use AsyncMemcacheService to perform the I/O in parallel with the request. This would reduce user-facing latency.

- Cache Datastore operations (expirable in-memory, memcache) as CounterNames rarely change and metrics data can be lossy. Currently ProdEagle consumes 5-10% of the free quota.

- Reduce GC overhead by creating less garbage, thereby reducing user-facing latency. This is non-invasive, such as using Collections' singletonMap and emptyMap instead of HashMap or checking log level enablement prior to logging. While minor, its easy to see that ProdEagle is causing more frequent heap churn when comparing the application before/after integration.
Dec 30, 2011
Project Member #1 ed.go...@gmail.com
Hi Ben,

Thanks for your ideas. The original code was very quickly put together and I unfortunately haven't had much time since!

I've put a few updates in, including the AsyncMemcacheService (it wasn't available in Java when I started).

I'm afraid I'm not sure what you're referring to for CounterNames - I think that is cached most places it can be and datastore operations should only be for creating or editing the CounterNames (the values aren't stored anywhere in the datastore, as it is lossy like you mention).

Finally, as for the GC overhead - I could certainly have been more optimal. I've implemented some of your suggestions (Collections.emptyMap for example), but will probably have to do the rest sometime in the next couple of months (real work depending).

If you'd like to contribute though, drop me a line!

Ed
Status: Accepted
Owner: ed.go...@gmail.com
Cc: andri...@gmail.com
Labels: -Type-Defect Type-Enhancement
Dec 30, 2011
#2 Ben.Manes@gmail.com
Hi Ed,

I knew it was quick-and-dirty, no worries about that. Its really great to have the visibility into production, so thanks! I'm using this for a hobby project to learn Scala + GAE, so its all gravy.

For a pre-launch app, prodeagle consumes 7% of Datastore's free quota. This is on an idle day with only prodeagle operations (1.19K in 20hrs). Appstats information is below.

If I get a chance, I'll go through it again and send you a patch if I see anything worthy.

-Ben

POST /prodeagle/	 722	90	(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) ...
memcache.Get	362	
datastore_v3.Get	90	
datastore_v3.RunQuery	90	
memcache.BatchIncrement	90	
memcache.Delete	90	

(1) 2011-12-31 04:02:01.272 "POST /prodeagle/?last_time=1325304060" 200 real=98ms api=0ms overhead=0ms (8 RPCs)
memcache.Get	4
datastore_v3.Get	1
datastore_v3.RunQuery	1
memcache.Delete	1
memcache.BatchIncrement	1
 (2) 2011-12-31 04:01:00.625 "POST /prodeagle/?last_time=1325303940" 200 real=97ms api=0ms overhead=0ms (9 RPCs)
memcache.Get	5
datastore_v3.Get	1
datastore_v3.RunQuery	1
memcache.Delete	1
memcache.BatchIncrement	1
 (3) 2011-12-31 03:59:59.941 "POST /prodeagle/?last_time=1325303880" 200 real=92ms api=0ms overhead=0ms (8 RPCs)
memcache.Get	4
datastore_v3.Get	1
datastore_v3.RunQuery	1
memcache.Delete	1
memcache.BatchIncrement	1
Dec 31, 2011
#3 Ben.Manes@gmail.com
On every request, HarvestServlet's checks authentication with #isProdEagle(). This looks for the auth key in datastore in #getKeySecret. I don't see why this can't be cached locally and in memcache.

MemCacheManager#getMultipleCounters() may throw an exception by applying an invalid cast of a Map<K, V> to a Future<Map<K, V>>. This could instead return a custom Future (static singleton instance).
Dec 31, 2011
#4 Ben.Manes@gmail.com
Attached is a diff of the changes described above.
perf.diff
10.9 KB   View   Download
Jan 1, 2012
Project Member #5 ed.go...@gmail.com
Good point about the auth key. I forgot about that bit!

I've applied the diff and made a slight change to ensure that the auth key does always end up in Memcache (in case it expires out for whatever reason).

Let me know if that improves performance for you.
Status: Fixed
Jan 7, 2012
#6 Ben.Manes@gmail.com
I deployed v1.1.1 tonight (from v1.0) and the usage has improved by replacing the datastore.GET with a memcache.GET. I don't know where there is still a datastore.RunQuery being performed, though.

2012-01-08 07:32:25.050 "POST /prodeagle/?last_time=1326007860" 200 real=60ms api=0ms overhead=0ms (8 RPCs)
memcache.Get	5
memcache.Delete	1
datastore_v3.RunQuery	1
memcache.BatchIncrement	1


I am also receiving alerts from ProdEagle which indicate data loss. From the logs GAE reports,
2012-01-07 23:36:27.727 /prodeagle/?last_time=1326008100 200 78ms 0kb AppEngine-Google; (+https://code.google.com/appengine appid: s~prod-eagle)
0.1.0.40 - - [07/Jan/2012:23:36:27 -0800] "POST /prodeagle/?last_time=1326008100 HTTP/1.1" 200 328 - "AppEngine-Google; (+https://code.google.com/appengine appid: s~prod-eagle)" "drdncws.appspot.com" ms=78 cpu_ms=80 api_cpu_ms=22 cpm_usd=0.002298 instance=00c61b117c0a834c0af3747657abfa7b6894f6
W 2012-01-07 23:36:27.720
com.prodeagle.java.servlets.HarvestHandler wasDataLostSinceLastHarvest: ProdEagle counters lost before 1326008040000

ProdEagle warns via email...
Accuracy of your data for the past hour too low for 5 minutes!
   Counts for 'Accuracy' over the last 10 minutes:
    0!, 0!, 99.9916666667, 99.985, 0!, 0!, 0!, 0!, 0!, 0! (min allowed: 95).
Jan 8, 2012
#7 Ben.Manes@gmail.com
 com.google.appengine.tools.appstats.Recorder:290 makeAsyncCall()
  com.googlecode.objectify.cache.TriggerFutureHook:144 makeAsyncCall()
  com.google.apphosting.api.ApiProxy:184 makeAsyncCall()
  com.google.appengine.api.datastore.DatastoreApiHelper:59 makeAsyncCall()
  com.google.appengine.api.datastore.PreparedQueryImpl:144 runQuery()
  com.google.appengine.api.datastore.PreparedQueryImpl:70 asIterator()
  com.google.appengine.api.datastore.BasePreparedQuery:38 asIterator()
  com.prodeagle.java.counters.CounterNamesManager:96 allCounterNameShards()
  com.prodeagle.java.counters.CounterNamesManager:105 allCounterNames()
  com.prodeagle.java.servlets.HarvestHandler:133 createReport()
  com.prodeagle.java.servlets.HarvestHandler:75 doGet()
  com.prodeagle.java.servlets.HarvestHandler:88 doPost()
  javax.servlet.http.HttpServlet:637 service()
Jan 8, 2012
Project Member #8 ed.go...@gmail.com
I think I've figured out the issue with inaccurate data. Changing the memcache operations to Async means that on line 307 of the HarvestHandler the memcache keys might not be put until after the next check to see if they exist.

I've got a fix in mind that I'll put in this week.

As for the Datastore query... let me hunt for that and get back to you.
Status: Accepted
Jan 8, 2012
#9 Ben.Manes@gmail.com
Amusing, so my fault there. Perhaps the async only appeared to be a natural fit. If its too difficult to make it play nicely then feel free to revert it. 

The query is coming from CounterNamesManager#allCounterNameShards(), identified by Appstats in the stracktrace above. This can likely be memoized in memcached with the query invalidated when #addIfNew() or #delete() write to the DB.

On the plus side, the latency and GC were improved. Once the query have been cached, I think the DB quota usage will practically disappear, which makes prodeagle all the more awesome.
Jan 9, 2012
Project Member #10 ed.go...@gmail.com
I've left most of the async in, looks like its working fine on my dev site, but let me know what you find.

I'll try remove the other datastore operation later - probably tomorrow!
Jan 19, 2012
#11 Ben.Manes@gmail.com
Just want to confirm that the async fix has been working nicely on my site as well over the last week.

Powered by Google Project Hosting