
Allow customized expiry time for cache entry #1203

Closed
gissuebot opened this issue Oct 31, 2014 · 29 comments

Comments

@gissuebot

Original issue created by nikazim on 2012-11-14 at 11:01 AM


Allow per-entry expiry time for caches built with expireAfterWrite / expireAfterAccess (something like put(K key, V value, long timeout)).

@gissuebot
Author

Original comment posted by wasserman.louis on 2012-11-14 at 04:43 PM


(No comment entered for this change.)


Labels: Type-Addition, Package-Cache

@gissuebot
Author

Original comment posted by lowasser@google.com on 2012-12-20 at 08:18 PM


Wait. Wouldn't you want that only if the cache was not built with expireAfterWrite/expireAfterAccess?

In any event, this seems really awkward from an API perspective, in addition to being problematic from an implementation point of view. In particular, customized expiration times more or less require an O(log n) priority queue, as opposed to the current O(1) implementation, which is possible because we only maintain linked lists ordered by most recent write or most recent access.

@gissuebot
Author

Original comment posted by kevinb@google.com on 2013-03-12 at 06:43 PM


(No comment entered for this change.)


CC: fry@google.com

@gissuebot
Author

Original comment posted by Ben.Manes on 2013-08-21 at 05:01 AM


FYI,

You wouldn't use a priority queue with O(lg n) insertion and removal. Instead you'd require an expected range for expiration times and a resolution (e.g. 1..60 in minutes). This would allow you to create an array of linked lists (i.e. a bounded-height priority queue) so that add/remove/reorder is O(1). The penalty would be O(r): scanning the first element in every slot to see if an eviction is required.

The API would still be awkward, even more so than the O(lg n) version requested, but would be much cheaper to maintain.

(Not advocating, but mentioning as this feature is often requested)
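
For illustration, a minimal sketch of the bounded-height structure described above, assuming a 1..60 minute expiration range at one-minute resolution (all names are hypothetical; this is not part of Guava):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/**
 * Bounded-height priority queue: an array of linked lists bucketed by expiration
 * minute. add is O(1); expiring scans the head of each of the r slots, so it is
 * O(r) plus the number of entries actually evicted. Ordering within a slot is
 * approximate, which is acceptable because expiration is best effort.
 */
final class BucketedExpiryQueue<E> {
  private static final int SLOTS = 60;            // resolution: 1..60 minutes
  private static final long SLOT_MILLIS = 60_000; // one minute per slot

  private static final class Node<E> {
    final E value;
    final long deadlineMillis;

    Node(E value, long deadlineMillis) {
      this.value = value;
      this.deadlineMillis = deadlineMillis;
    }
  }

  private final Deque<Node<E>>[] slots;

  @SuppressWarnings("unchecked")
  BucketedExpiryQueue() {
    slots = new Deque[SLOTS];
    for (int i = 0; i < SLOTS; i++) {
      slots[i] = new ArrayDeque<>();
    }
  }

  /** O(1): bucket the entry by the minute in which it expires. */
  void add(E value, long deadlineMillis) {
    slots[slotFor(deadlineMillis)].addLast(new Node<>(value, deadlineMillis));
  }

  /** O(r): peek at the head of every slot and evict the entries that are due. */
  void expire(long nowMillis, Consumer<E> onExpire) {
    for (Deque<Node<E>> slot : slots) {
      Node<E> head;
      while ((head = slot.peekFirst()) != null && head.deadlineMillis <= nowMillis) {
        onExpire.accept(slot.pollFirst().value);
      }
    }
  }

  private int slotFor(long deadlineMillis) {
    return (int) ((deadlineMillis / SLOT_MILLIS) % SLOTS);
  }
}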

@gissuebot
Author

Original comment posted by kak@google.com on 2013-08-22 at 11:29 PM


(No comment entered for this change.)


Status: Research

@edudar

edudar commented Dec 8, 2015

+1 and went with net.jodah:expiringmap for now

@jonheg

jonheg commented Dec 16, 2015

+1 as well. Custom expiry time per entry would be nice.

@lialun

lialun commented Feb 6, 2016

+1

@jruillier

+1

@imgeorgeabraham

+1

@violette

+1 Indeed, custom expiry time per entry would be nice.

@ben-manes
Contributor

I think providing this feature in O(1) time with a clean API that does not detract from the existing one is a potentially difficult problem. In almost every algorithmic design I can think of, the approach could be served equally well by an external library decorating the cache.

The simplest O(1) approach is to rely on the maximum size constraint. Large caches (like memcached) augment this with a sweeper thread to proactively discard expired entries. The more general approach requires an O(lg n) algorithm to retain a sorted order. As we've tried to avoid managing our own threads, only use O(1) algorithms, and not be excessively opinionated about how the cache is used, this is a little difficult to resolve.

The first step is to make the argument for why variable expiration is useful in a general-purpose library. Most cases we came across, then and now, use a fixed time for domain-oriented caches. The variable cases were for specific applications, e.g. HTTP caching, where different data commingles into a single, large cache. Justifying this feature as a common case is needed before considering the implementation details.

The second step is whether the algorithm benefits from the internal mechanisms used by the cache (periodic maintenance). If not, e.g. the lazy O(1) approach, then the feature can easily be contributed as a decorating library (see the sketch below). That then sidesteps the next problem of how to evolve the API cleanly.
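
As a concrete illustration of that decorator approach (purely a sketch with hypothetical names, not a proposal for the Guava API), a wrapper that stores per-entry deadlines and lazily discards expired entries on read; combined with a maximum size, entries that are never read again are eventually pushed out by the size constraint:

import com.google.common.cache.Cache;
import java.util.concurrent.TimeUnit;

/** Illustrative decorator: lazy per-entry expiration on top of a size-bounded Cache. */
final class VariableExpiryCache<K, V> {
  private static final class Timestamped<V> {
    final V value;
    final long deadlineNanos;

    Timestamped(V value, long deadlineNanos) {
      this.value = value;
      this.deadlineNanos = deadlineNanos;
    }
  }

  private final Cache<K, Timestamped<V>> delegate;

  VariableExpiryCache(Cache<K, Timestamped<V>> delegate) {
    this.delegate = delegate;
  }

  /** Stores the value together with its per-entry deadline. */
  void put(K key, V value, long duration, TimeUnit unit) {
    delegate.put(key, new Timestamped<>(value, System.nanoTime() + unit.toNanos(duration)));
  }

  /** Returns the value, treating an expired entry as absent and evicting it lazily. */
  V getIfPresent(K key) {
    Timestamped<V> entry = delegate.getIfPresent(key);
    if (entry == null) {
      return null;
    }
    if (entry.deadlineNanos - System.nanoTime() <= 0) {
      delegate.invalidate(key);
      return null;
    }
    return entry.value;
  }
}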

@ben-manes
Contributor

I should amend the above since @lowasser referenced this ticket on SO and, as hinted at in my '14 comment, the approach I keep asking myself about is a hierarchical / hashed wheel timer. This is the scheme used by kernel timers: hierarchical clocks where a hand points to a bucket, and each entry is marked with the number of rotations remaining before it expires. This "approximated timer" scheme works because expiration is best effort and not exact. The hierarchical version is O(m) / O(1) time (where m is small) and would fit well into the cache's periodic maintenance.

This is the only approach I could see making it into the cache as a native feature. Unfortunately I still don't see a nice way to evolve the API (Cache and CacheLoader) to feel natural.

@jwcarman

Could we perhaps create a new interface and not evolve the existing API? It would be a special type of cache.

@Maaartinus

Unfortunately I still don't see a nice way to evolve the API (Cache and CacheLoader) to feel natural.

Wouldn't something similar to Weigher do? Maybe

interface Expirer<K, V> {
    long expireAt(K key, V value, long lastWritten, long lastAccessed);
}

maybe with methods determining if it's to be called after write and/or access.

@ben-manes
Contributor

For comparison, JSR-107 (JCache) has a version of this called ExpiryPolicy. It uses null, a sentinel value (zero), and exceptions as alternative return values to the duration. It does not provide the context (key, value, metadata), due to not wanting to fetch that information if the entry is not in local memory.

I think variable access expiration could be problematic, depending on the implementation details. It may also be the less useful variant and might not be worth supporting. So I might argue that this would be an expireAfterWrite with a function similar to your proposal.
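
For readers unfamiliar with it, the JSR-107 interface has roughly this shape (reproduced from memory, so treat it as approximate):

// javax.cache.expiry.ExpiryPolicy (approximate shape, shown for comparison only)
public interface ExpiryPolicy {
  Duration getExpiryForCreation(); // Duration.ZERO expires immediately, Duration.ETERNAL never
  Duration getExpiryForAccess();   // null leaves the current expiry unchanged
  Duration getExpiryForUpdate();   // null leaves the current expiry unchanged
}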

@ben-manes
Contributor

I am starting to work on this and feedback would be appreciated.

The goal of the Expiry interface is simplicity and avoiding object allocations. We could make it return either a duration or a point in time, in both cases expressed in nanoseconds.

interface Expiry<K, V> {
  long expireAfterCreate(K key, V value, long currentTimeNanos);
  long expireAfterUpdate(K key, V value, long currentTimeNanos, long expirationTimeNanos);
  long expireAfterRead(K key, V value, long currentTimeNanos, long expirationTimeNanos);
}

The duration approach would have the following characteristics:

  • The duration may not be zero or negative, meaning that it has to be re-evaluated to be considered expired (an immediately expired entry would interact poorly with loaders).
  • An entry can be made eternal by returning Long.MAX_VALUE
  • To avoid updating the expiration time on a read or update, you can return expirationTimeNanos - currentTimeNanos (see the sketch below).
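
A hypothetical caller implementing the duration semantics above might look like this (the key/value types, durations, and the "static/" prefix are illustrative; TimeUnit is java.util.concurrent.TimeUnit):

// Assumes java.util.concurrent.TimeUnit is imported.
Expiry<String, String> expiry = new Expiry<String, String>() {
  @Override public long expireAfterCreate(String key, String value, long currentTimeNanos) {
    return TimeUnit.MINUTES.toNanos(30);                      // new entries live for 30 minutes
  }
  @Override public long expireAfterUpdate(String key, String value,
      long currentTimeNanos, long expirationTimeNanos) {
    return key.startsWith("static/")
        ? Long.MAX_VALUE                                       // effectively eternal
        : TimeUnit.MINUTES.toNanos(30);                        // restart the 30 minute window
  }
  @Override public long expireAfterRead(String key, String value,
      long currentTimeNanos, long expirationTimeNanos) {
    return expirationTimeNanos - currentTimeNanos;             // keep the existing deadline
  }
};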

The point-in-time approach would instead behave as follows:

  • The new expiration time may not be less than the current time, which would mean the entry has already expired.
  • An entry can be made eternal by returning currentTimeNanos + Long.MAX_VALUE
  • To avoid updating the expiration time on a read or update, you can return expirationTimeNanos

Suggestions on preferences, a better name than Expiry, etc. would be appreciated.


Implementation-wise, this will be done using a hierarchical timer wheel (prototype). This allows add, remove, rescheduling, and expiring to be performed in O(1) time. Instead of using an O(lg n) priority queue, it uses a hierarchy of ring buffers, where each slot represents a time span (second, minute, hour). As the wheels rotate, the entries in the buckets are expired or rescheduled. This cascading effect amortizes the penalty, and all operations are run during the cache's maintenance operation.

[timer wheel diagram]
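
For readers unfamiliar with the structure, a toy sketch of the bucket indexing (the linked prototype differs; the spans, slot counts, and names here are illustrative):

/**
 * Toy sketch of hierarchical timer wheel indexing. Each wheel covers a coarser
 * time span; an entry is filed into the finest wheel whose total span can hold
 * its delay, and when a coarse slot comes due its entries cascade into the
 * finer wheels.
 */
final class TimerWheelSketch {
  // Duration of one slot per wheel, in nanoseconds: ~1 second, ~1 minute, ~1 hour.
  private static final long[] SLOT_SPANS =
      {1_000_000_000L, 60_000_000_000L, 3_600_000_000_000L};
  private static final int SLOTS_PER_WHEEL = 64;

  /** O(1): pick the wheel and slot for an entry expiring at deadlineNanos. */
  static int[] schedule(long nowNanos, long deadlineNanos) {
    long delay = Math.max(1, deadlineNanos - nowNanos);
    for (int wheel = 0; wheel < SLOT_SPANS.length; wheel++) {
      long wheelSpan = SLOT_SPANS[wheel] * SLOTS_PER_WHEEL;
      // Delays beyond the largest wheel are clamped into the top wheel.
      if (delay < wheelSpan || wheel == SLOT_SPANS.length - 1) {
        int slot = (int) Math.floorMod(deadlineNanos / SLOT_SPANS[wheel], (long) SLOTS_PER_WHEEL);
        return new int[] {wheel, slot};
      }
    }
    throw new AssertionError();
  }
}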

@Maaartinus

An entry can be made eternal by returning currentTimeNanos + Long.MAX_VALUE

Note that the sum is negative; a lone Long.MAX_VALUE should do. You can consider any time above, e.g., Long.MAX_VALUE / 2 as "never" (as there's no practical difference between allowing expiration durations of up to 146 or 292 years).
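
The wrap-around is easy to check:

long now = System.nanoTime();         // arbitrary-origin timestamp, typically positive
long eternal = now + Long.MAX_VALUE;  // overflows whenever now > 0
System.out.println(eternal < now);    // prints true for any positive now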

I guess, after an operation (create, update, or read), the corresponding method gets called. I wonder how their results compose. Does the last returned result win?

@ben-manes
Contributor

Technically no, but practically yes. Ticker uses System.nanoTime(), whose documentation states:

The value returned represents nanoseconds since some fixed but arbitrary origin time (perhaps in the future, so values may be negative).... The values returned by this method become meaningful only when the difference between two such values... is computed.

I am also leaning towards a duration API since that is consistent with the builder, e.g. expireAfterWrite.

And you're right, the last value wins. Otherwise we'd want an exclusive lock on the entry for reads to ensure atomicity. expireAfterRead (like access) has limited value as it is not about data freshness but lifetime (e.g. a session token).

@Maaartinus

@ben-manes You're right, I forgot about how System.nanoTime() works.

The duration API could possibly be simpler:

interface Expiry<K, V> {
  long expireAfterCreate(K key, V value);
  long expireAfterUpdate(K key, V value, long currentDuration);
  long expireAfterRead(K key, V value, long currentDuration);
}

as you usually don't care about the current time. This is questionable and currentDuration probably needs to be computed.

@ben-manes
Contributor

That's a good point on the API. My concern is that the duration may be a delta from a timestamp of an external resource. For example, basing it on a database row's creation_date with a 1hr TTL before a hard refresh is required. An example is a Google Maps geocode cache, whose entries may only be held for 30 days under the license.

In these cases you need the current time to calculate with. A call to nanoTime or currentTimeMillis can be expensive under contention, so minimizing those calls is good practice in hot paths. Also, currentTimeMillis can fluctuate by going backwards in time, so hopefully a caller wouldn't accidentally violate the contract by returning a zero/negative duration.

But perhaps I'm complicating the API by leaking optimizations? It is more important for an API to be elegant than fast. Per Josh Bloch's rule,

Consider the performance consequences of API design decisions, but don’t warp an API to achieve performance gains. Luckily, good APIs typically lend themselves to fast implementations.

@ben-manes
Contributor

interface Expiry<K, V> {
  long expireAfterCreate(K key, V value, long currentTime);
  long expireAfterUpdate(K key, V value, long currentTime, long currentDuration);
  long expireAfterRead(K key, V value, long currentTime, long currentDuration);
}

Do you think this is the best compromise, or should currentTime be dropped altogether? I agree that currentDuration is more useful than the previous expirationTime field.

@Maaartinus

Do you think this is the best compromise or that currentTime should be dropped altogether.

I'd keep it, assuming you're sure you can always provide it for free.

I agree that currentDuration is more useful than the previous expirationTime field.

It makes things simpler in the common case where you don't want to change the duration.

@ben-manes
Contributor

This feature is now implemented in Caffeine. It could be ported into Guava if a core member were so inclined.
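
For reference, usage against Caffeine's Expiry looks roughly like this (sketched from memory; the Token type, its ttlSeconds() accessor, and the chosen policies are illustrative):

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.Expiry;
import java.util.concurrent.TimeUnit;

Cache<String, Token> cache = Caffeine.newBuilder()
    .expireAfter(new Expiry<String, Token>() {
      @Override public long expireAfterCreate(String key, Token token, long currentTime) {
        return TimeUnit.SECONDS.toNanos(token.ttlSeconds());  // per-entry lifetime
      }
      @Override public long expireAfterUpdate(String key, Token token,
          long currentTime, long currentDuration) {
        return TimeUnit.SECONDS.toNanos(token.ttlSeconds());  // reset on write
      }
      @Override public long expireAfterRead(String key, Token token,
          long currentTime, long currentDuration) {
        return currentDuration;                               // reads leave the expiry unchanged
      }
    })
    .build();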

@ben-manes
Contributor

@Maaartinus Unfortunately, thinking about currentTime after the release, I realized that it is misleading. The currentTime is from System.nanoTime(), which is a timestamp that should only be used to determine the elapsed time between two calls. It cannot be used when calculating against wall-clock time, where System.currentTimeMillis() is appropriate. For entries calculated from a timestamp, e.g. 30 days since the database record's creation_date, the caller cannot use the currentTime parameter. So while the parameter was free to provide, it will likely be useless and confusing.
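
For example (the Row type and its accessors are hypothetical), such a caller ends up reading the wall clock itself and simply ignoring currentTime:

// Hypothetical: a 30 day lifetime measured from the database row's creation_date.
// Assumes java.util.concurrent.TimeUnit is imported.
@Override public long expireAfterCreate(String key, Row row, long currentTime) {
  long ageMillis = System.currentTimeMillis() - row.getCreationDate().getTime();
  long remainingMillis = TimeUnit.DAYS.toMillis(30) - ageMillis;
  return TimeUnit.MILLISECONDS.toNanos(Math.max(remainingMillis, 1)); // never zero or negative
}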

In v3.0 I may need to drop it. One of the nice things about Guava is the internal baking period to catch these mistakes that an independent OSS library may make.

@vanjor

vanjor commented Dec 26, 2017

Any update on this issue? Voting for this feature.

@lowasser
Contributor

We are not doing any feature development for common.cache at this time.

@FoosballGG

Any clarification as to why this issue was closed unresolved? This was one of the most voted for feature requests for the Guava Cache.

@ben-manes
Contributor

@FoosballGG The feature is available in Caffeine, which includes Guava adapters. Currently Guava's cache is not being actively developed.
