Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Utilities for set operation views on maps #912

Closed
gissuebot opened this issue Oct 31, 2014 · 11 comments
Closed

Feature Request: Utilities for set operation views on maps #912

gissuebot opened this issue Oct 31, 2014 · 11 comments

Comments

@gissuebot
Copy link

gissuebot commented Oct 31, 2014

Original issue created by jor...@knewton.com on 2012-02-29 at 06:40 PM


The Sets class has static utility methods that create immutable views of the result of set operations on multiple sets.

It would be nice if there were similar methods for Maps. There already exists a map difference method.

For example, there could be a map union method that returns a new map whose entry set is the union of the entry sets of the input maps. Something would have to be done if the input key sets had a non-null intersection: the method could fail, or it could return a Multimap that contains multiple values for a key if the key was repeated across the input key sets.

This functionality is something that I use fairly commonly. I imagine it would be useful for others as well.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-02-29 at 07:06 PM


Jordan, could you give us some more details on where exactly you've found yourself using this method -- an actual use case, with context and details? That'd really help us get a better sense for when this method would actually get used.

That said, I can see a few variables here:

Should this be strict -- making an explicit copy -- or a view? (Sets.union and Sets.difference are SetViews, which are views with a built-in fluent "copy-into" syntax; Maps.difference constructs an explicit copy.)

Explicit copies have the advantage that they're much more difficult to shoot yourself in the foot with -- the performance costs are all up front, and obvious, whereas if you pass a Maps.union view to some other method, you can quickly build up lots of hidden costs. (For example, imagine someone doing Maps.union(map1, Maps.union(map2,..)) for a hundred iterations, if Maps.union was implemented as a view. Queries to the resulting map are now slowed down by a factor of 100, but that fact might not be obvious to the programmer!)

What the return type should look like is...also tricky. I might be willing to get behind a Maps.disjointUnion that throws up when you pass it maps with intersecting key sets, but it feels like that addresses a much narrower use case.

One option that occurs to me -- that I really like -- is to provide

 <K, V> SetMultimap<K, V> Multimaps.union(SetMultimap<K, V>, SetMultimap<K, V>)

by doing a Set.union on the value sets; the semantics of that method would be pretty immediately intuitive -- and then you implement

 <K, V> SetMultimap<K, V> Maps.union(Map<K, V> map1, Map<K, V> map2) {
   return Multimaps.union(Multimaps.forMap(map1), Multimaps.forMap(map2));
 }

which seems to have the advantage of unambiguity: if they had different values, the Set contains two elements, if they had the same values or only one of them had a value, the Set contains one element, and that just follows inevitably from the Set contract.

(If we can hear about your particular use cases, that'll help us get a sense of what sort of semantics would actually be most useful to you -- I have no idea if this SetMultimap approach would actually do what you want.)


Labels: Type-Enhancement, Package-Collect

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by jor...@knewton.com on 2012-02-29 at 08:01 PM


The multimap-based approach you suggest sounds good to me. Personally I'd get most use out of the functionality if it were a view, but that might not be true for everyone.

Example use case:

I'm writing some code to merge two versions of a graph together. I start with two maps, from node id to node, one for each of the graph versions.

I perform some work to determine whether the node sets can be happily merged together, and if they can, I merge them. One of the methods for doing this involves simply overwriting a node that's present but different in both maps with the version that exists on one of the sides (c.f. git merge-recursive with the --ours or --their option). At the end of this work, I create another node id -> node map that contains the post-merge state of the graph. I still need to hold on to the original two copies of the maps for a while, for other reasons.

To prevent having to create a whole new map, I wrote some other code that implements the read-only portions of the Map interface by delegating to one of the two underlying maps, depending on which copy we're favoring in the case of conflicts.

Your proposed SetMultimap union would fit this use case, except I don't think that sets' iterators preserve insertion order, so you'd really want a ListMultimap union.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-02-29 at 11:18 PM


Sidestepping the insertion order specification was, in fact, part of what I had in mind. ;)

We could preserve iteration order -- Sets.union specifies its iteration order relative to the iteration order of the input sets, for example. You wouldn't be able to do get(0) and get(1), but you could use Iterables.get(Iterable, int).

I observe, however, than Multimaps.forMap returns a SetMultimap, and that isn't something we can change right now.

Perhaps you might implement a Map that delegates to the SetMultimap, but if the underlying set has two elements, then you pick out the default again? I'm not sure I like that solution.

I also feel uncomfortable with unioning ListMultimaps -- or specifically, implementing "a view of the concatenation of two lists," which would be a prerequisite -- but I'm having trouble rationalizing why. Mostly, I think it has to do with "people will use it as cons and snoc when they really shouldn't." I'd like to get other Guava team members' read on that, though.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-03-04 at 08:58 PM


For reference, I've written a draft implementation for both Multimaps.concat(ListMultimap, ListMultimap) and Multimaps.union(SetMultimap, SetMultimap) at https://github.com/lowasser/guava-experimental/commits/combine-multimap . I haven't added adequate tests or documentation, though.

I think the use cases are relatively solid, but I'm not positive that this change would be worth the code and maintenance cost. I'd really like to hear other people's opinions on this change.

@gissuebot
Copy link
Author

Original comment posted by fry@google.com on 2012-03-13 at 06:52 PM


(No comment entered for this change.)


Status: Acknowledged

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2012-05-30 at 07:45 PM


(No comment entered for this change.)


Labels: -Type-Enhancement, Type-Addition

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2012-06-22 at 06:16 PM


(No comment entered for this change.)


Status: Research

@gissuebot
Copy link
Author

Original comment posted by MrChrisPSimmons on 2013-07-23 at 09:23 AM


I could have used a union map view just now (or to put it another way I wrote one myself), but with one map taking precedence over than the other rather than returning a Map rather than a multi-foo. My use case is as follows.

We have items in a sparse table and each entry is identified by a map from dimension-> value. However, each dimension may have default value and sometimes we want a map with defaults included. Building all these maps up front was taking 56M of memory up but switching to a union map of reported + defaulted dimensions takes it down to 4.6M. In this case any performance loss was negligible and clearly worth the saving in RAM.

@MikeFHay
Copy link

Worth noting that Guava already provides utilities that make it pretty easy to implement this yourself.

/**
 * Returns a view of the union of two maps.
 * 
 * For all keys k, if either map contains a value for k, the returned map contains that value. If both maps
 * contain a value for the same key, this map contains the value in the first of the two provided maps.
 */
public static <K, V> Map<K, V> union(Map<K, ? extends V> a, Map<K, ? extends V> b) {
    return union(a, b, (v1, v2) -> v1);
}

/**
 * Returns a view of the union of two maps.
 *
 * For all keys k, if either map contains a value for k, the returned map contains that value. If both maps
 * contain a value for the same key, the conflict is resolved with the provided function.
 */    
public static <K, V> Map<K, V> union(Map<K, ? extends V> a, Map<K, ? extends V> b, BinaryOperator<V> conflictHandler) {
    return Maps.asMap(Sets.union(a.keySet(), b.keySet()),
            (K k) -> {
                // we assume that the maps do not contain null values
                V v1 = a.get(k);
                V v2 = b.get(k);
                if (v1 == null) {
                    return v2;
                } else if (v2 == null) {
                    return v1;
                } else {
                    return conflictHandler.apply(v1, v2);
                }
            });
}

I'm not sure if this API is ideal, and probably null values could be handled better, but if use-cases are rare then developers can just use Maps.asMap(Sets.union(... directly.

@mtbc
Copy link

mtbc commented Oct 16, 2018

Just found myself wanting the set difference of two SetMultimap.

@cpovirk
Copy link
Member

cpovirk commented Jul 25, 2019

We're less enthusiastic about "lazy view" collections nowadays, especially now that Stream exists. The closest we'd probably come to this feature request is #2632, which would allow something like concat(map1, map2).toMap().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants