
Auto-convert queries to keys_only=True queries that pull entities from cache #118

Closed
GoogleCodeExporter opened this issue Jun 10, 2015 · 12 comments

Comments

@GoogleCodeExporter

keys_only=True queries count as "small datastore ops".

Under certain scenarios, many of the entities returned from a query may already 
be in cache - in particular memcache.

It would be great if the NDB framework could automatically convert a query to 
keys_only=True and then a get_multi() using the returned keys so that the 
entities would be returned from the caches (of course, only if present).

For queries that return frequently accessed entities, this could end up being a 
significant cost (and performance?) savings and could all be provided under the 
hood by NDB.

The API might be as simple as:

  Entity.query(use_cached_entities=True)

Of course, the following would raise some kind of exception because it's 
inconsistent:

  Entity.query(use_cached_entities=True, keys_only=False)

Original issue reported on code.google.com by jcoll...@vendasta.com on 21 Dec 2011 at 4:13

@GoogleCodeExporter

I'm reluctant to add yet another keyword, since it looks like you could do it 
in a few lines of code yourself: e.g. (if you don't care about async smartness)

  ndb.get_multi(q.fetch(keys_only=True))

or (if you don't want to go to the datastore when it's not in memcache -- I 
couldn't tell if that's what you meant):

  ndb.get_multi(q.fetch(keys_only=True), use_datastore=False)

If you do care about async stuff you could write a little callback like this:

  def cb(key):
    ent = key.get(use_datastore=False)  # or True, to fall back to the datastore
    if ent is not None:
      ...do what you want with ent...

and pass this to map() (yielding from inside a tasklet), as follows:

  yield q.map(cb, keys_only=True)

Original comment by guido@google.com on 22 Dec 2011 at 5:20

  • Added labels: Priority-Low, Type-Enhancement
  • Removed labels: Priority-Medium, Type-Defect

@GoogleCodeExporter

I hear you about "another keyword" - there certainly are a lot of them!

However, with respect to "you could do it in a few lines of code": I think this 
would be the way we would want to perform every query, so we'd have similar 
code all over the place and/or some library stuff that we need to include in 
all of our projects (maybe we could call this library nndb ;). I was actually 
sort of surprised that ndb didn't function this way out of the box.
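
The shared helper would be something like this (just a sketch; fetch_cached is 
a hypothetical name, the rest is standard ndb):

  from google.appengine.ext import ndb

  def fetch_cached(query, limit=None, cache_only=False):
    # The keys-only query counts as small datastore ops; get_multi then
    # pulls the entities through NDB's caches.
    keys = query.fetch(limit, keys_only=True)
    if cache_only:
      # Skip the datastore entirely; cache misses come back as None.
      return [e for e in ndb.get_multi(keys, use_datastore=False)
              if e is not None]
    return ndb.get_multi(keys)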

Original comment by jcoll...@vendasta.com on 22 Dec 2011 at 5:35

@GoogleCodeExporter

Hmmm... I don't think I would want this for every query, unless you have a very 
high cache hit rate, which I find doubtful for queries. When it's a cache miss, 
it's slower -- three RPCs (query, memcache, datastore) vs. one.

If you have data to show that in *your* app your cache hit rate is really high 
for all queries I'd like to hear about this -- even if it's only one app, data 
can overrule intuition.

Original comment by guido@google.com on 23 Dec 2011 at 12:33

@GoogleCodeExporter

I suppose it depends heavily on cache eviction - which, in the case of App 
Engine, you'd have much more visibility into than I would.

One of the great things about the way that NDB caches is that it's able to 
cache "forever" with consistency. So, in our case, when we have a set of 
entities that change very rarely, it's very conceivable that many/most of the 
entities will be found in (mem)cache. Small datastore ops are cheap and 
memcache is free, so it makes for a really compelling combination.

Cache eviction aside, of course.

I'm with you on the multiple RPCs though. Has anyone ever measured? Which is 
faster:

- one query returning, say, 100 entities

(or)

- one keys-only query returning 100 keys, and,
- a memcache.get_multi() on 100 keys

I.e., assuming a full cache hit, is the one-RPC query still faster?

If so, then I only have a $$ argument left.
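
A rough harness for that comparison might look like this (a sketch; it assumes 
the entities were stashed in memcache under key.urlsafe(), which is an 
arbitrary choice):

  import time

  from google.appengine.api import memcache
  from google.appengine.ext import ndb

  def time_full_query(q):
    t0 = time.time()
    q.fetch(100)                         # one query RPC returning entities
    return time.time() - t0

  def time_keys_plus_cache(q):
    t0 = time.time()
    keys = q.fetch(100, keys_only=True)  # small datastore ops
    cached = memcache.get_multi([k.urlsafe() for k in keys])
    misses = [k for k in keys if k.urlsafe() not in cached]
    ndb.get_multi(misses)                # datastore only for the misses
    return time.time() - t0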

Original comment by jcoll...@vendasta.com on 23 Dec 2011 at 1:02

@GoogleCodeExporter

It's an interesting question. It might depend on the size of the entities as 
well. I would measure it for different cache hit rates too, e.g. 0%, 50%, 100%.
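
A given hit rate can be forced by warming the cache for all keys and then 
evicting a random fraction (a sketch; assumes the same key.urlsafe() cache 
keys as in the harness above):

  import random

  from google.appengine.api import memcache

  def force_hit_ratio(keys, ratio):
    # Evict a random (1 - ratio) fraction of the cached entries.
    victims = random.sample(keys, int(len(keys) * (1.0 - ratio)))
    memcache.delete_multi([k.urlsafe() for k in victims])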

Original comment by guido@google.com on 23 Dec 2011 at 4:03

@GoogleCodeExporter

I did a quick little unscientific experiment (as you can tell by the variance).

Querying 500 entities out of a population of about 40K, filtering on 2 repeated 
properties (via zig-zag), ordering on a third property.
High replication datastore, on App Engine production infrastructure (not 
dev_appserver).
Using NDB with instance and memcache disabled (i.e., I am doing explicit 
memcache work in this experiment).
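
(Turning the built-in caches off is done through the standard ndb context 
policies; shown here for completeness:)

  from google.appengine.ext import ndb

  ctx = ndb.get_context()
  ctx.set_cache_policy(False)     # no in-process (instance) cache
  ctx.set_memcache_policy(False)  # no automatic memcache layer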

Memcache hit ratio: 100% (everything was in cache)

  Query for entities:              3755 ms
  Query/memcache/ndb:              3239 ms
    Keys-only query:       834 ms
    Memcache.get_multi:   2387 ms
    ndb.get_multi:           0 ms

Memcache hit ratio: 75%

  Query for entities:              3847 ms
  Query/memcache/ndb:              3928 ms
    Keys-only query:       859 ms
    Memcache.get_multi:   1564 ms
    ndb.get_multi:        1491 ms

Memcache hit ratio: 50%

  Query for entities:              3507 ms
  Query/memcache/ndb:              5170 ms
    Keys-only query:       825 ms
    Memcache.get_multi:   1061 ms
    ndb.get_multi:        3168 ms

Memcache hit ratio: 25%

  Query for entities:              3799 ms
  Query/memcache/ndb:              6335 ms
    Keys-only query:       835 ms
    Memcache.get_multi:    486 ms
    ndb.get_multi:        4875 ms

Memcache hit ratio: 0% (no memcache hits)

  Query for entities:              3828 ms
  Query/memcache/ndb:              8866 ms
    Keys-only query:       836 ms
    Memcache.get_multi:     13 ms
    ndb.get_multi:        8012 ms


It definitely starts to drop off below a 75% hit ratio, though I think there 
still ends up being a $$ savings.

BTW, I moved to an F2 front-end instance for this test to get the RAM I needed; 
the tests were more than twice as fast on the F2, and memcache was 3-4x as fast 
- it was dramatic. All the PB deserialization?

Original comment by jcoll...@vendasta.com on 23 Dec 2011 at 5:07

@GoogleCodeExporter

I decided to run some of these again on an F1 instance and use appstats to 
determine the difference between the time I'm measuring in code versus the RPC 
time as measured in appstats. The appstats metrics are in parentheses.

Querying 500 entities out of a population of about 40K, filtering on 2 repeated 
properties (via zig-zag), ordering on a third property.
High replication datastore, on App Engine production infrastructure (not 
dev_appserver).
Using NDB with instance and memcache disabled (i.e., I am doing explicit 
memcache work in this experiment).
*** Running on an F1 front-end instance

Memcache hit ratio: 100% (everything was in cache)

  Query for entities:              6197 ms (2 RPC: 2213 + 1733 = 3946 ms) (overhead: 6197 - 3946 = 2251 ms)
  Query/memcache/ndb:              5557 ms 
    Keys-only query:      1520 ms          (2 RPC: 115 + 638 = 753 ms) (overhead: 767 ms)
    Memcache.get_multi:   4016 ms          (1 RPC: 31 ms) (overhead: 3985 ms!)
    ndb.get_multi:           0 ms

Memcache hit ratio: 75%

  Query for entities:              6091 ms (2 RPC: 1658 + 2118 = 3776 ms) (overhead: 2315 ms)
  Query/memcache/ndb:              7251 ms
    Keys-only query:      1559 ms          (2 RPC: 85 + 907 = 992 ms) (overhead: 567 ms)
    Memcache.get_multi:   2857 ms          (1 RPC: 25 ms) (overhead: 2832 ms)
    ndb.get_multi:        2818 ms          (125 parallel RPCs: longest: 2182 ms) (overhead: 636 ms)

Memcache hit ratio: 50%

  Query for entities:              5636 ms (2 RPC: 1661 + 1756 = 3417 ms) (overhead: 2219 ms)
  Query/memcache/ndb:              8810 ms
    Keys-only query:      1357 ms          (2 RPC: 90 + 718 = 808 ms) (overhead: 549 ms)
    Memcache.get_multi:   2077 ms          (1 RPC: 19 ms) (overhead: 2058 ms)
    ndb.get_multi:        5366 ms          (250 parallel RPCs: longest: 4028ms) (overhead: 1338 ms)

Some things I found surprising:
 - memcache stub's unpickling overhead is high; the super-fast RPC times reported in appstats are misleading as to the actual time
 - keys-only queries are very fast (i.e., relative to the full query), but even there, the PB deserialization is a real overhead
 - the parallel RPCs slow down as more entities are involved. I suspect this is just some artifact of when the callbacks and timers are called and doesn't really reflect reality
 - generally speaking, the get_multi() calls (both memcache and ndb) have a lot of overhead that I never really considered before. I always assumed that the RPC time was so much larger than serialization costs that serialization costs weren't important. (A way to isolate those costs is sketched below.)
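
One way to isolate the serialization cost from the RPC wait is to time the 
pickle round-trip by itself (a sketch; the memcache client pickles non-string 
values, which is where much of the overhead above comes from):

  import pickle
  import time

  def pickle_round_trip(entities):
    t0 = time.time()
    blobs = [pickle.dumps(e, 2) for e in entities]
    t1 = time.time()
    for b in blobs:
      pickle.loads(b)
    t2 = time.time()
    return t1 - t0, t2 - t1  # (dump seconds, load seconds)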

Original comment by jcoll...@vendasta.com on 23 Dec 2011 at 3:04

@GoogleCodeExporter

Because of the large memcache pickling overhead, I wanted to test the 
effectiveness of model.get_multi() with and without memcache hits. I was 
concerned that the assumed performance improvement of using memcache would be 
washed out by overhead (at least with the types of entities I was using).

Doing an ndb.get_multi() with memcache policy off, then on (to stock memcache), 
then on again (for measurement).
Note: I made the adjustments mentioned in Issue 105 because I wasn't seeing any 
cache hits.
500 keys, same entities as previous tests. No instance cache.
F2 front-end instance
Using NDB 0.9.4 (previous tests were the NDB bundled with GAE 1.6.1)
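
The three phases, roughly (a sketch using the standard context policies; 
timed() is just a trivial stopwatch helper, and keys is the 500-key list):

  import time

  from google.appengine.ext import ndb

  def timed(fn):
    t0 = time.time()
    fn()
    return time.time() - t0

  ctx = ndb.get_context()
  ctx.set_cache_policy(False)    # instance cache stays off throughout

  ctx.set_memcache_policy(False)
  t_cold = timed(lambda: ndb.get_multi(keys))  # datastore only

  ctx.set_memcache_policy(True)
  ndb.get_multi(keys)                          # stocks memcache

  t_warm = timed(lambda: ndb.get_multi(keys))  # served from memcache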

  ndb.get_multi no memcache:   8178 ms   (500 parallel RPCs: longest: 6374 ms) (overhead: 1804 ms) **
  ndb.get_multi w/ memcache:   2958 ms   (1 RPC: 29 ms) (overhead: 2929 ms!)

** I suspect that due to the way that the callbacks and timers work, the 
parallel RPC time is actually shorter than 6374 ms and the overhead is actually 
longer than 1804 ms.

So (as I'm sure you've already proved out), it does seem that despite the large 
memcache pickling overhead, there are still good performance improvements from 
using memcache for large get_multi() calls.

Original comment by jcoll...@vendasta.com on 23 Dec 2011 at 4:13

@GoogleCodeExporter

So, getting back to the original issue, it seems that knowledge of 
probabilistic cache hit rates and knowledge of entity pickling/serialization 
costs are important when deciding to do a keys_only query + get_multi() versus 
just doing a full-on query.

I guess because of this, it probably doesn't make sense to bake this into NDB 
directly.

There still is definitely a $$ savings involved though. On a groups thread 
somewhere when the new billing model came out, a Googler showed that a 
keys_only + db.get([multi]) ends up costing less than a full query. They were 
close, but the former was less. Getting some amount of memcache hits will make 
the difference even greater.

I've got to say, I'm still sort of reeling from the serialization overhead...

Original comment by jcoll...@vendasta.com on 23 Dec 2011 at 4:19

@GoogleCodeExporter

I'll have to look into this more later; I'd like to reproduce your results and 
then analyze them to death. Comparison to old db would also be interesting. 
Some quick comments:

- I'd like to see the code for your tests so I can repro them

- The memcache use built into NDB should be slightly faster than just passing 
entities to memcache directly, because the key is not serialized in the former 
case (see line 630 in context.py). Also, when using memcache there's an extra 
copy of all bytes involved (the serialized bytes are copied into the memcache 
request buffer using a separate serialization pass)

- I hope you didn't use profile or cProfile -- it adds a lot of overhead to 
function calls, and serialization uses a lot of function calls

- I hope you didn't use Python 2.7, because that is (currently) known to be 
slower in the serialization department

- Serialization is CPU intensive and even an F2 instance may be CPU throttled in 
this case, since your total request time is pretty high (for benchmarking code 
we should really have a way of disabling throttling)

- Why do you need so much memory? Perhaps you could rerun the tests with 
smaller numbers so they fit in memory -- that should also reduce the overall 
latency and hence reduce the throttling (hopefully)

- Serialization of entities with lots of properties is always going to be slow; 
1 property of 1K bytes is (I expect) much faster than 10 properties of 100 
bytes each (a quick check is sketched below)
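
The 1-big-vs-many-small comparison is easy to check directly, e.g. with pickle 
standing in for the serialization work (a sketch; OneBig and ManySmall are 
hypothetical models):

  import pickle
  import time

  from google.appengine.ext import ndb

  class OneBig(ndb.Model):
    blob = ndb.TextProperty()        # 1 property of 1K bytes

  class ManySmall(ndb.Model):        # 10 properties of 100 bytes each
    p0 = ndb.StringProperty()
    p1 = ndb.StringProperty()
    p2 = ndb.StringProperty()
    p3 = ndb.StringProperty()
    p4 = ndb.StringProperty()
    p5 = ndb.StringProperty()
    p6 = ndb.StringProperty()
    p7 = ndb.StringProperty()
    p8 = ndb.StringProperty()
    p9 = ndb.StringProperty()

  one = OneBig(blob='x' * 1000)
  many = ManySmall(**dict(('p%d' % i, 'x' * 100) for i in range(10)))

  def ser_seconds(ent, n=1000):
    # Time n serialization passes over one entity.
    t0 = time.time()
    for _ in xrange(n):
      pickle.dumps(ent, 2)
    return time.time() - t0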

Original comment by guido@google.com on 24 Dec 2011 at 1:21

@GoogleCodeExporter

- I will send you a code package directly.
- I am just using a simple string for the memcache key.
- No profile/cProfile, just simple time.time()
- Used Python 2.5
- I was surprised by the memory requirement but I didn't look into it.
- My entities definitely have a lot of small properties; my particular use case 
needs lots of ways to look up the entities. Full-text search will help when 
available.

Original comment by jcoll...@vendasta.com on 24 Dec 2011 at 1:37

@GoogleCodeExporter

I don't think this should be an automatic feature; there are too many risks. 
It's easy enough to do this manually if you want it:

  q = MyModel.query(...)
  results = ndb.get_multi(q.fetch(keys_only=True))

Original comment by guido@google.com on 5 Jan 2012 at 9:10

  • Changed state: WontFix
