Lightweight Collections in GWTWe've found that the API semantics for the JRE collections classes are not an ideal match for the constraints of running inside browsers, especially mobile browsers, for several reasons. Background and MotivationThe JRE collections framework encode assumptions about the relationship between types, objects, and their implementation making them highly coupled, even at the top of the hierarchy. For example, the Map interface has methods such as values(), entrySet(), keySet() that make very specific guarantees about how the returned objects interact with the map from which they derive. These sorts of prescribed dependencies between collection types (e.g. using a Map implies the use of Set and Collection) and the behavior of collection objects themselves prevent simple implementations. Repeated attempts to optimize GWT's emulated JRE collections show a relatively high cost for even minimal usage of JRE collections because, for example, using a Map implies using a Set and a Collection. Too few collection types forces runtime enforcement of behaviors that could be more efficiently modeled statically. A common example is a class T that wishes to return a reference an internal list. However, the List interface has mutators (e.g. add()) which T would not want to make possible "from the outside." The traditional best practice of calling Collections.unmodifiableList() on the returned list reference requires the allocation of an extra object (which generates extra GC runs that can turn pathological in IE) as well extra code that must be generated simply to implement throw new UnsupportedOperationException() when a mutator is called on the wrapper object. If instead there were an ImmutableList type that the list collections implement, class T could simply have returned an immutable reference, which would ensure attempts to modify the list are caught at compile-time (i.e. because ImmutableList has no mutators) and which would prevent extra object allocations and potentially even facilitate inlining at compile time. This Under-specified semantics in JRE collections lead to defensive programming in application code. Java developers are encouraged to use the least specific type that has the desired semantics. For example, such advice would say use List in your code rather than ArrayList. However, there is no guarantee that List#get(int i) returns in constant time, so a generic algorithm written in terms of List cannot even be assured of its expected time complexity. The JRE collections includes the tag interface RandomAccess as a way to workaround the ambiguity, but forcing a check such as myList instanceof RandomAccess is time consuming at runtime (where time is precious) and, worse, can defeat the GWT compiler's optimizer. Goals- Absolute minimum size of compiled script. It should be impossible to write more succinct JavaScript by hand.
- Absolute maximum speed. It should be impossible to faster JavaScript by hand.
- Explicit and well-publicized guarantees of time complexity for each operation. By removing fuzziness, developers can feel confident that they need not be defensive (e.g. by cloning collections unnecessarily).
- The vocabulary of types should be rich enough to avoid the temptation of creating wrapper types. In other words, a method such as unmodifiableList() should never even seem to be necessary.
- Construction and mutation semantics should minimize object creation. Ideally, the collections themselves could enforce patterns such as having only a singleton instance of an empty list.
- The same set of collection types must work on the client and the server. We must avoid burdening application programmers with the asymmetry of using JRE collections on the server and GWT collections on the client.
- Collections must be efficiently serializable over GWT RPC.
- Collections must be able to be eval()-ed into existence from JSON.
- Collections must be pleasant to use, even relative to JRE collections. If developers are still tempted to use JRE collections, we could end up with the worst case of duplicating collection code.
- When non of the other goals are compromised, individual methods on collections should be source-compatible with JRE collections. This simplifies porting existing code to use the new collections and keeps methods as familiar as possible.
Non-goals- We are knowingly sacrificing true compatibility and interop with existing JRE collections. For example, we do not intend to implement the JRE Map, List, or Set interfaces.
- There is no particular goal of using the same collections implementation for both the client and the server. If we need to use super-source or deferred binding, so be it.
- Until the GWT compiler can optimize it (which it cannot to date), the new collections will not support the Java enhanced 'for' loop syntax that uses Iterable/Iterator. We do believe such optimizations are possible and will be added, however, at which time, these collections will be retrofitted to implement Iterable.
IntroductionNote: The collections described here are similar in several respects to Scala 2.8 collections, and where possible, terminology is borrowed to give consistent names to the similar concepts. We begin by acknowledging that the design of anything as fundamental as collection types is bound to be contentious, so the plan is to start small and establish some patterns for simple, high-value collections: Array and Map. PrinciplesAssertions, not exceptions. Assertions, when turned off, can be compiled away. Code written to defensively check for and throw exceptions is very hard to optimize. Hosted mode always has assertions on, so any well-tested code ought to execute assertions during normal testing. Types and methods must be used properly to perform properly. Lightweight collections are in no way defensive. Improper use generates an assertion failure or, if assertions are disabled, undefined behavior. The corollary is that the API can be designed for maximum efficiency. Bending the type system is okay if it's sufficiently hard to detect. If there are cases in which quietly cross-casting JSOs would be the most efficient thing to do, so it shall be done. However, in such cases, it should be a goal to do "the right thing" in hosted mode (e.g. actually have checked casts) but then silently slip in the trickier cross-casting version during a web mode compile. Hosted mode collections implementations can differ, if necessary, from web mode. This follows from the previous point but is also practical since OOPHM is based on IPC, it may simply be harmfully slow to implement hosted mode collections using JSNI. Publish and guarantee time complexity for various operations. Keep implementations locked down. Because it's very tricky to get the implementations for these sorts of collections just right, the plan is to minimize ways in which there could be confusion about the implementation of a given collection type. More concretely, the key types should be classes rather than interfaces so that their implementation can be restricted to the package in which they are defined. If developers cannot trust that a given collection instance is truly optimal and maintains time complexity guarantees -- as might be the case if collections types were interfaces which could have been implemented "anywhere" -- then they may feel compelled to write defensive code. By locking the implementations down, developers can rely on the behavior of a given collection type. !CollectionsUnfinished. Other collections, as they are fleshed out, will also be constructed via this class. /**
* The single starting point for interacting with collections from scratch.
* All collection instances are originally obtained via a method in this class.
* Using a factory method rather than a constructor for, say, MutableArray<E>
* provides convenient inference of the collection's generic type and also provides
* a way to satisfy the requirement that JSOs cannot have exposed constructors.
*/
public class Collections {
public <E> MutableArray<E> createMutableArray();
}Array, MutableArray, and ImmutableArray/**
* A "root collection" that provides a read-only interface to an array
* whose contents may or may not be mutable, making it possible to create read-only
* library algorithms that work with either mutable or immutable arrays without
* making calling code paranoid about whether the library routine might mutate it,
* thus preventing the desire to do defensive wrapping or copying.
*/
public class Array<E> {
@ConstantTime public E get(int index);
@ConstantTime public int size();
}
/**
* A "mutable collection" that is very similar to ArrayList
* (and C++ vector, and many others). It has the ability to efficiently
* "freeze" itself into an ImmutableArray.
*/
public class MutableArray<E> extends Array<E> {
// Changes the element at a particular index
@ConstantTime public void set(int index, E elem);
// Appends an element to the end of the array
@ConstantTime public void add(E elem);
// Inserts an element into the specified index, lengthening the array by 1
@LinearTime public void insert(int index, E elem);
// Removes the element at the specified index, shortening the array by 1
@LinearTime public void remove(int index);
// Removes all elements of the array
@ConstantTime public void clear();
// Creates an immutable array based on the state of this instance.
// The returned array may or may not have the same identity as this.
// After this call, this instance may no longer be used as a MutableArray instance
// (assertions double-check that no subsequent modifications are made).
@ConstantTime public ImmutableArray<E> freeze();
}
/**
* An "immutable collection" that makes that iron-clad guarantee that code
* holding a reference to an instance can be certain that its
* contents will not change over time.
*/
public class ImmutableArray<E> extends Array<E> {
// Only inherited members
}Examples of Intended UsageArrayspublic <E> void printEach(Array<E> arrToPrint) {
for (int i = 0, n = arrToPrint.size(); i < n; ++i) {
Window.alert(arrToPrint.get(i).toString());
}
}
public MutableArray<String> makeSnackArray() {
MutableArray<String> a = createMutableArray();
a.add("apple");
a.add("banana");
a.add("coconut");
a.add("donut");
return a;
}
public void exampleOne() {
MutableArray<String> a = makeSnackArray();
printEach(a);
}
public void exampleTwo() {
MutableArray<String> ma = makeSnackArray();
ImmutableArray<String> ia = ma.freeze();
// We can't touch 'ma' any more, but we can
// confidently hand out aliases all over the place.
new MyView1(ia).show();
new MyView2(ia).show();
new MyView3(ia).show();
}
|
In cases where a concrete collection models a non-sparse non-immutable random-access list backed by a native array, we can potentially preserve Java for-each by using the reinterpretCast trick to return a native array E since for-each works for both arrays and Iterable. In the immutable case, you can't use it due to the ability to poke around in the array.
Another potential idea is to support internal iterator ala Scala, by having an each() method that takes a callback. This avoids the need to have for-each or Iterator objects and allows the collection to implement the iteration over its elements in the best way possible. e.g.
lightweightArray.each(new Closure() {
});Since we're talking Scala, we can also add support for map, filter, and reduce. :)
You could return the array in the immutable case if the compiler knew it wasn't going to get modified in this particular case. Taking the idea from the other thread, the default implementation could be the safe one and a method call generator could generate a faster implementation when told by the compiler the result was only used in a read-only manner.
The downside of internal iterators are that you waste space in the object for its lifetime (even if it is a number of null references), and you can't have multiple iterators in progress on the same object at once. In your example, if f called something that wanted to iterate on lightweightArray, it can't (or if it does, internally it has to be creating essentially an external iterator plus the extra overhead to keep track of them rather than letting callers manage them). Additionally, you are creating an object for the callback (though the compiler might be able to remove the object allocation if it were smarter than it is today) so it isn't clear where the win is.
I had an idea how to reduce the code required for the immutable collections mutation methods throwing UnsupportedOperationException:
If the compiler detects that the bodies of two methods are the same then it can use the same Java Script method. For example in sudo Java / Java Script code:
var throwUnsupportedOperationExceptionMethod = function() { throw new UnsupportedOperationException?(); } immutableCollection.add = throwUnsupportedOperationExceptionMethod; immutableCollection.addAll = throwUnsupportedOperationExceptionMethod; immutableCollection.clear = throwUnsupportedOperationExceptionMethod;etc...
Java Script does not care if the number of parameters the caller is using is different to what is defined. For example you can call with no issue: `
This could be generalised to reduce the size of any code although it would be time consuming to find methods with the same bodies.
Dude, this is awesome. Major kudos!
For MutableArray.freeze, essentially changing a MutableArray to an ImmutableArray (even for existing references to the MutableArray) seems awkward, and will promote manual copies. I think the builder pattern is probably better here, since it is explicit that it exists to create an ImmutableArray rather than also being useful as a MutableArray later. If MutableArray extended ImmutableArrayBuilder, then you could not pass a ImmutableArrayBuilder to code that expected a MutableArray (without an explicit cast), and therefore the latter could depend on being able to mutate it later. Otherwise it seems that any callees who want to mutate the array must still make defensive copies lest someone else freeze that same instance. Maybe it is better to not have them related by inheritance but put all the methods as static in a shared utility class.
It seems like separate annotations for complexity measures are tricky beyond the obvious ones -- would it be better to have @Complexity("n^1/3") so there is a single way to specify them, especially in the case of O(n*m) etc?
Looks great!
I really like that the base Array does not have any methods for manipulation. This stuff will make our code smaller and more safe at the same time -- awesome!
One of the nice things about immutability is if the compiler can prove it, you can avoid alias-analysis for those objects, which enables better copy-propagation/cse as well as equational-reasoning. Without it, any method call or store to a reference of the same type has to be treated as potentially mutating the value of a variable, which limits code motion.
Equational reasoning allows you to hoist out values of fields without worry, effectively allowing you to intercept constructor calls at compile time, and propagating constructing parameters around as constants. That is, code like new Integer(10) can simply replace any call to Integer.intValue() with 10.