Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some auxiliary collections are not serializable #978

Closed
gissuebot opened this issue Oct 31, 2014 · 19 comments
Closed

Some auxiliary collections are not serializable #978

gissuebot opened this issue Oct 31, 2014 · 19 comments

Comments

@gissuebot
Copy link

Original issue created by opinali@google.com on 2012-04-25 at 05:42 PM


Following the discussion in
https://groups.google.com/forum/?fromgroups#!topic/guava-discuss/V2puosAWGkg

It seems that the following collections should be marked as Serializable:

  • Collections2.FilteredCollection
  • Collections2.TransformedCollection

Other cases such as Lists.Trabnsforming_List are already serializable, so the different behavior of the Collections2._ aux classes seems to be a bug.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-04-25 at 05:47 PM


You have my +1 (AND MY AXE), but I can't help but wonder if there was some reason it was left out previously. (To be fair, Collections2.transform is rarer than Iterables.transform or Lists.transform, so it might just not have been requested.)


Labels: -Type-Defect, -Priority-Medium, Type-Enhancement, Package-Collect

@gissuebot
Copy link
Author

Original comment posted by opinali@google.com on 2012-04-26 at 02:35 AM


I've done some due diligence searching the list, issues and checking source code comments, but couldn't find any evidence of a reason for that... nor can I think of any technical difficulty. It really seems like an oversight, as you say these methods are much less common to use than those for collections of a specific type.

For one thing, the argument to Iterables.transform() is even more general than Collection; it doesn't have to be a List, Map, or even any kind of Collection, it can be just some crazy non-collection object that implements the iterator() method. Still the API rewards use with a serializable transformed iterable.

@gissuebot
Copy link
Author

Original comment posted by cpovirk@google.com on 2012-04-30 at 05:00 PM


I thought that Kevin had some reservations about making many of our collections (especially views) serializable, but I don't remember the details. I know that the team a lot of effort into serialization a while back, and the goal may be to avoid that, but I thought that most of the effort was in establishing fixed serialized forms, which, in the end, we moved away from.


CC: kevinb@google.com

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-05-01 at 03:21 PM


I'd like to withdraw my +1. (Also, Iterables.transform is not, in fact, serializable; nor is Maps.transformValues, Iterables.filter, or frankly any of the functional transformation methods except Lists.transform.)

@gissuebot
Copy link
Author

Original comment posted by opinali@google.com on 2012-05-01 at 03:57 PM


@4: Here's a more complete list of methods that return a Serializable collection:

  • Lists.asList(first, [second,] rest)
  • Lists.transform()
  • All the unmodifiable*() methods for all collections and all maps

It seems Lists.transform() is the only one that takes a function argument; but this ain't coherent criteria either, things like Lists.reverse() return a non-Serializable object even though that is a simple view over the input, exactly like the Unmodifiable views.

Also, if the logic was that there's something special in List (or other "real" Collection which excludes maps), then filtering a List should also return a Serializable. But this doesn't happen because we don't have Lists.filter(), we have to use the general filter(Collection...), which IMHO is important missing functionality; a proper List->List filter would be very useful (but that's for another issue#).

In summary, I can't find any rationale for the decision of whether to return a Serializable collection, for all the methods that wrap/transform/filter/derive a collection from another. Even if there is some rule, this rule is obscure and it's also clearly incompatible with the Java Collections Framework which design is very simple: return a Serializable output whenever possible, mandatory when the input is Serializable.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-05-01 at 04:02 PM


IIRC, I think our default rule is "return a Serializable output when someone asks for that feature, and we agree that it's a good idea."

(FYI, Lists.filter was rejected in issue 505.)

@gissuebot
Copy link
Author

Original comment posted by cpovirk@google.com on 2012-05-01 at 04:11 PM


We have not been at all systematic about this, it's true. By contrast, the JDK does seem to be in pretty good shape. The only obvious discrepancy I see at first glance is:

SerializableTester.reserializeAndAssert(new java.util.TreeSet().tailSet("")); // pass
SerializableTester.reserializeAndAssert(new java.util.ArrayList().subList(0, 0)); // fail

On one hand, thorough serialization support would take more time than is probably justified. On the other, we could address methods as they are requested. On the third hand, that just makes our inconsistency worse. But we're already inconsistent -- do we care if it's "worse?"

@gissuebot
Copy link
Author

Original comment posted by opinali@google.com on 2012-05-01 at 05:05 PM


@6: In that case, I am asking for this feature, now more formally below. It's fine if the answer is No, as long as there are any reasons for that, so far everything I heard in this thread was "I wonder if there was some reason", "Kevin had some reservations" [that nobody remembers or can find in lists/issues], etc.

  • Request: All methods (that take a collection/map input and return a derived collection output) should return a Serializable collection/map, unless there's any reason for this to be impractical or undesirable in specific methods. The burden of justification for exception should be on the decision to NOT be Serializable.
  • Trivial implementation: add "implements Serializable" where missing. All these methods return internal, non-public impls, so Guava has control of that. Most work would go in unit test and javadoc updates. I would happily volunteer to do all work.
  • There are no compatibility consequences, because Guava doesn't care about serialization compatibility across different JVMs. There are no costs in code size, implementation / maintenance complexity, performance tradeoffs, anything.
  • Current behavior is inefficient; it forces me to add code like "newArrayList(...)", which has a O(N) cost of allocating/copying all elements, just to make some output serializable e.g. to use it as a parameter or return value of a remote method.
  • Current design is incompatible with the design of the Java SE Collections API, a foundation of the Java platform. Result is very counter-intuitive, makes Guava incoherent with the framework that it extends, and incoherent even with itself.

@gissuebot
Copy link
Author

Original comment posted by opinali@google.com on 2012-05-01 at 05:20 PM


@7 (I wrote @8 before reading this) - playing the Devil's advocate, the reason for List.subList() not promising a Serializable output is explained in http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4782922: "This was intentional, and is entirely appropriate for "views." If you serialize a small view (say three elements), the entire backing List (say one million elements) gets serialized. If you deserilize the view, the entire million element list gets reconstructed at great cost, even though 99.9967% of it is forever inacessible."

Now this is a decent criteria to decide this matter. This means that methods like reverse(), unmodifiable...(), and transform().., which are injective functions (so the output will retain all elements from the input), should return a Serializable output. But non-injective methods like filter() should not do that because of the potential inefficiency described by Sun's bug. We could argue that even the latter methods could return a Serializable, if the method is lazy; but the lazy/eager behavior adds a new and complex factor to the problem and I'd agree that the simple and safe (if conservative) solution is avoiding Serializable here.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-05-01 at 05:28 PM


"- Trivial implementation: add "implements Serializable" where missing. All these methods return internal, non-public impls, so Guava has control of that. Most work would go in unit test and javadoc updates. I would happily volunteer to do all work."

I think there might be issues with GWT serialization that make this a much less trivial thing than you think.

"- Current behavior is inefficient; it forces me to add code like "newArrayList(...)", which has a O(N) cost of allocating/copying all elements, just to make some output serializable e.g. to use it as a parameter or return value of a remote method."
I think that was actually the point -- to make those costs up-front. Most users of transform() and the like don't use Serializable functions, and serializing transformed collections has a lot of room for unexpected and weird behavior. Forcing users to make copies of those collections makes the costs and semantics obvious and up-front. I think there's a decent argument to be made for removing the serializability of Lists.transform.

@gissuebot
Copy link
Author

Original comment posted by opinali@google.com on 2012-05-01 at 06:33 PM


@10 I think Guava should make a reasonable effort to serve a more general audience. GWT supports Java SE collections too, with caveats like declaring the element type. So we'll also need helpers like readObject()/writeObject(); no big deal, though not in the single-liner league anymore. The public Guava collection/map classes already do that, as most custom collections should.

On efficiency: disagree; having those output collections non-serializable doesn't simply "make costs up-front" - it creates costs that wouldn't exist otherwise. A serializable output would just be consumed by the serializer without any temporary allocation or copying. Unless there's some problem specific to GWT that I wouldn't know. Considering regular serialization, the methods I'm now asking to review (see my comment @9) are no different than similar cases from JavaSE. No new "unexpected and weird" stuff in Maps.unmodifiableBimap() that already doesn't exist in Collections.unmodifiableMap(). Those don't even need read/writeObject(), just do a shallow serialization like any other Java object.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-05-01 at 07:35 PM


Maps.unmodifiableBimap is already serializable.

Let me clarify: I'm concerned that there are reasons specific to transform() methods for serializing their results to be a bad idea. Example: you'll frequently see users saying Iterables.transform(peopleSet, getNameFunction) -- they're mapping objects into a single field of those objects. Something "smaller," not bigger.

Users who could serialize such collections would be serializing much more than they expected -- the whole objects, not the result objects -- and forcing an explicit copy makes the difference clear to those users.

Additionally, serializing a transformed collection can surprise users when the elements of the original collection are not serializable, and the elements of the result list are -- but since it's a view of the original collection, the collection (of serializable elements) is not serializable. We know for a fact users have gotten confused by this (in the case of Lists.transform) in the past -- see http://stackoverflow.com/questions/9418032/java-serialization-problems-while-using-guava-lists-transform.

@gissuebot
Copy link
Author

Original comment posted by opinali@google.com on 2012-05-01 at 08:18 PM


@12: The transform problem is a good argument. It's not perfectly equivalent to Sun's scenario (you can even, easily, have transforms that produce a bigger output than the input)... but the JDK collections already decides the tradeoff in the conservative side / least-surprise rule, so OK for me.

Well, the last standing part of my RFE is that the API should be consistent and intuitive. So I'd vote to remove the Serializable from Lists.transform().

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2012-05-01 at 09:32 PM


Chris said "We have not been at all systematic about this, it's true."

In the olden days, we were actually trying pretty hard to be systematic about it. We just realized it had been THE thing delaying the release of Google Collections 1.0 for quite a while, and we gave up on really perfecting the serialization support.

Here's the spreadsheet I was using at the time:

https://docs.google.com/spreadsheet/ccc?key=0AoFn3TZKLWTUcGQ4ZEFReUhiZGV4UTQ2Yk50NzhRWWc

I'm saddened to learn that we accidentally made Lists.transform serializable. :)


Labels: Wehavenotbeenatallsystematicaboutthisitstrue.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-05-01 at 10:15 PM


Oh man. I had no idea we were so thorough back then.

I have to ask -- is there anything we could do to remove serializability for Lists.transform, while controlling compatibility issues?

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2012-05-01 at 10:55 PM


It's not worth it. We should just let it be a wart.

I think we can close this issue now?

(p.s. very amused by the "tag" I accidentally added in my last update.)


Labels: -Wehavenotbeenatallsystematicaboutthisitstrue.

@gissuebot
Copy link
Author

Original comment posted by opinali@google.com on 2012-05-02 at 12:10 AM


OK for me to close. Unfortunately Lists.transform() documents that it returns a Serializable list; but maybe this doc could be updated to discourage reliance on that, explaining the risk of transferring unwanted data, and warning that a future update may not be serializable. This would be a more "managed wart" even if the change is never made due to backwards compatibility.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-05-02 at 12:13 AM


Marking as WontFix.

I vote we leave serialization in, but add some lines to the documentation telling people that it's a bad idea to use it. Lists.transform isn't @Beta -- that means we're committed to keeping functionality for two years before deleting it.

It might be worth deleting it in the longer term, but certainly not right now.


Status: WontFix

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2012-05-02 at 12:15 AM


I'll whip this doc change up right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant