Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pre-populate Cache #775

Closed
gissuebot opened this issue Oct 31, 2014 · 23 comments
Closed

Pre-populate Cache #775

gissuebot opened this issue Oct 31, 2014 · 23 comments
Labels

Comments

@gissuebot
Copy link

Original issue created by richarddowsett on 2011-10-27 at 12:14 PM


Pre-populating a cache is a common use case for loading static data from a database. This would also be specifically useful where lazy loading is not ideal. Please see http://stackoverflow.com/questions/7915016/pre-load-values-for-a-guava-cache for a specific case where this would be useful.

It would be ideal to create an interface that can pre-load all static data from the database and add it to the Cache at start-up, such as:

CacheBuilder.newBuilder().populate(new CachePopulator<String, Element>(){

@Override
public Map<String, Element> populate() throws Exception {
    return getAllElements();
}

}).build(new CacheLoader<String, Element>(){

@Override
public Element load(String key) throws Exception {       
    return getElement(key);
}

});

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2011-10-27 at 09:17 PM


When we add CacheLoader.loadAll and Cache.getAll in 11.0, that would seem like a cleaner way to do this. First create the empty cache, then just call getAll to fill it up.

@gissuebot
Copy link
Author

Original comment posted by raymond.rishty on 2011-10-28 at 11:47 AM


I suppose the objection to that would be that if I have, say, 100 keys, I would prefer to get the 100 corresponding values via a single DAO call that via 100 DAO calls.

@gissuebot
Copy link
Author

Original comment posted by richarddowsett on 2011-10-28 at 12:07 PM


I'm assuming this would be the interface for the new feature (or something similar): Cache.getAll(List<K> keys).

As Raymond has pointed out, this would result in keys.size() calls to the database where my solution would allow you to get all values in a single DAO call. This would surely be a better solution where the cache needs to be pre-populated with a large number of elements.

@gissuebot
Copy link
Author

Original comment posted by raymond.rishty on 2011-10-28 at 01:26 PM


CacheLoader has loadAll

public Map<K, V> loadAll(Iterable<? extends K> keys) throws Exception

While Cache has getAll

ImmutableMap<K, V> getAll(Iterable<? extends K> keys)

According to the Javadoc for CacheLoader, that default implementation of loadAll delegates to individual calls to get. However, the method is not final, and as the Javadoc offers, you are free to override with a bulk lookup in your own CacheLoader.

You might familiarize yourself with the interface: https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheLoader.java

@gissuebot
Copy link
Author

Original comment posted by richarddowsett on 2011-10-28 at 02:57 PM


This interface would allow the bulk loading of keys so it would seem to satisfy the requirement.

My only comment would be that if the full set of keys is not known at runtime, you can still override the method and bulk load all data (with one DAO call) but you are breaking the interface because you are not using the parameter keys to load the data. But it would still work the way I expect it to, I'm just trying to get a precise interface out of an API used by many other people :)

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2011-10-29 at 04:41 PM


It's not a perfect fit for this request, but I think that it will do. Note that loadAll() is permitted to return extra entries even whose keys were not asked for, so you could do some hack like just asking for a "magic key" if you don't know what the keys to load are. That said, it's probably cleaner to just create the cache, do your initial population load externally and use asMap().putAll().

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2011-11-15 at 10:06 PM


If the keys are known, loadAll may be the answer; if they aren't, you can still just get the data externally to the cache and put it all in. I don't think there's anything we need to do here, but if I'm missing something big let me know.


Status: WontFix
Labels: Type-Enhancement

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by mar...@aie.pl on 2012-01-04 at 11:02 AM


Well, Kevin, of course both of your use cases meet the functional requirements BUT neither is clean. With the asMap() solution you make the client control the cache. Therefore the cache isn't self contained and responsibility bleeds out to outer classes.

And to answer your question. What you're missing is that the solution to this common use case (of populating the whole cache in one go) is still not present in guava-cache.

And now, my own problem:

In my work place I have an enum-like dictionaries in remote systems I need to cache at startup. That's because fine-grained web service operations return their IDs instead of names, symbols or descriptions. Of course I don't have any control over the services provided to me; and, unfortunately, these provide only one operation to get the whole dictionary at once. So what am I supposed to do?

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-01-04 at 03:58 PM


Why does the putAll force you to bleed responsibility to outer classes? Do the putAll immediately upon cache creation, right there, so no other classes have to worry about it.

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by mar...@aie.pl on 2012-01-04 at 04:16 PM


May I ask for the example code? Keeping in mind only CacheLoader instance holds a reference to RemoteRepository.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-01-04 at 04:47 PM


Rearrange that a bit, is the point. I'm suggesting something like

Cache<Key, Value> constructCache(RemoteRepository repo) {
  Cache<Key, Value> cache = CacheBuilder.newBuilder()
    .......
    .build(new CacheLoader<Key, Value>(){
       ....
    });
  cache.asMap().putAll(getInitialEntriesFromRepo(repo));
  return cache;
}

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by mar...@aie.pl on 2012-01-04 at 08:35 PM


But that's exactly the case - the Cache LOADER isn't loading the entries now. Outer class does, doesn't it? And it puts the entires into the cache. From outside, via public interface. That's this little minor detail which makes guava cache shine when compared to the standard bunch (ehcache, treecache, etc).

And what if the remote repo changes? I mean, what if remote system adds an entry after the cache is populated? Then the call to get(key) should retrive this new value (or simply refresh the whole set again). Now, you can't do that because you have initialized it from outside, so you need to duplicate the loading parts inside the cache. Ugly.

I know it's hard to implement this with current CacheLoader interface and Cache loading internals. But maybe it could be split into two interfaces? And then people could decide which one to use? Like CacheLoader for single items and FullCacheLoader (or any other fitting name) for the whole set? Just thinking aloud. I'm sure you, guys, are capable of coming up with something even better. Just trying to help.

Anyways, this is a great library. Thank you!

@gissuebot
Copy link
Author

Original comment posted by fry@google.com on 2012-04-06 at 03:10 PM


Let's keep this open for now. It's also come up at:
http://stackoverflow.com/questions/7915016/pre-load-values-for-a-guava-cache
and
https://groups.google.com/d/msg/guava-discuss/-DHg4Hi4sQE/HmlhQd8nkowJ


Status: Acknowledged
Labels: Package-Cache

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-04-25 at 06:31 PM


I guess my thinking is as follows:

I sort of assume that in most cases, the cache loader is defined inline, as an anonymous class. In this case, you really don't lose anything by moving the initial population code outside the loader, because it's exactly as exposed as it always was.

But let's assume that's not the case -- that the cache loader is a top-level class. In most cases, a cache loader is used with only one cache configuration. So add a (possibly static) method to your CacheLoader, public LoadingCache<K, V> buildCache(). If you add the population code in that method, then you're set: it looks like

public class MyLoader extends CacheLoader<K, V> {
   public V load(K key) {
     ...
   }

   private static void populate(LoadingCache<K, V> cache) {
     ...
   }

   public static LoadingCache<K, V> buildCache() {
     LoadingCache<K, V> cache = CacheBuilder.newBuilder()
        ...configuration options...
        .build(new MyLoader());
      populate(cache); // uses putAll
      return cache;
   }
}

If you use your cache loader with multiple different configurations, then you can either make your method take a pre-configured CacheLoader, or you can even pass in a CacheBuilderSpec (with release 12) that takes care of all the other details.

All of this manages to keep the encapsulation we want without messing with an already excellent abstraction.

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2012-05-30 at 07:41 PM


(No comment entered for this change.)


Labels: -Type-Enhancement, Type-Enhancement-Temp

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2012-05-30 at 07:45 PM


(No comment entered for this change.)


Labels: -Type-Enhancement-Temp, Type-Enhancement

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2012-06-22 at 06:16 PM


(No comment entered for this change.)


Status: Research

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-08-18 at 04:22 PM


Issue #1099 has been merged into this issue.

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2012-11-09 at 10:15 PM


It's very unclear what if anything needs to be done here.

Martel, Have you tried the idea outlined on the stack overflow post linked in #13?

If you have a strong need that really isn't being addressed by anything we offer now, please make a new attempt to describe the whole problem (not intended solution) in clear terms. (ie, file a new issue)


Status: WontFix

@gissuebot
Copy link
Author

gissuebot commented Nov 1, 2014

Original comment posted by mar...@aie.pl on 2012-11-09 at 11:42 PM


First of all, Kevin, thanks. Secondly, if you mean the "feature request" from actual Stack Overflow question then it's exactly what I had in mind.

Actually I haven't looked into this issue since I wrote my last message so I'm not sure where we stand now. I simply wanted to encapsulate whole loading logic inside the CacheLoader interface which just seems more in line with what is already there. I don't have any strong feelings for or against it, though.

I can make another issue but I don't see the point if you're not sharing my point of view. It's still a decent API even if I can't pre-populate the cache.

Michał

@Normal
Copy link

Normal commented Mar 15, 2018

but what if I need the data be populated repeatedly on eviction time is over? PutAll doesn't handle that. Isn't it?

@StephenFlavin
Copy link

StephenFlavin commented Jan 22, 2020

I am currently in the same situation as above.
I don't know all of the possible keys at runtime, I have a DB table that I do lookups on frequently (>10k requests per minute) the entire DB table could be stored in memory and refreshed on a set interval.
For my use case I had started looking at Suppliers#memoizeWithExpiration however since the cache is so frequently used it would be preferable to have a non blocking refresh at a given interval.
I have raised #3777 for discussion on an implementation using suppliers but I still wish there was a way to do this in a LoadingCache.

@laeubi
Copy link

laeubi commented Jun 30, 2020

I also have demands for this and like to add another point why I think none of the workarounds really satisfies me:

  • First of all, I don't know if the cache is accessed at all so I don't want to fill the cache at init time.
  • Even when this won't be an issue the cache needs to be created as fast as possible (application startup) later on I don't really mind a small delay as the Application is up and running and I can show waiting indicator to the user
  • Second (as described with the webservice in comment before), it is equally expensive to retrieve one value or all (current) values
  • Beside that I don't know all keys in advance

So what would be need is some kind of "preloadKeys(T ...)" that behaves the following way:

  1. if it is passed an empty array, and the cache is empty while accessed by a get operation, CacheLoader.loadAll(Iterable<? extends K>) is called with a null argument, or if that is not desireable, a special (empty) Iterable instance that could be checked e.g. LOAD_ALL = Collections.EMPTY_LIST
  2. if it is passes a non empty array and any of the kexs is accessed, CacheLoader.loadAll(Iterable<? extends K>) is called instead of simple load

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants
@Normal @laeubi @gissuebot @StephenFlavin and others