My favorites | Sign in
Google
                
Search
for
Updated Jun 23, 2009 by br...@google.com
LightweightCollections  
Design doc for GWT simplified collection classes

Lightweight Collections in GWT

We've found that the API semantics for the JRE collections classes are not an ideal match for the constraints of running inside browsers, especially mobile browsers, for several reasons.

Background and Motivation

The JRE collections framework encode assumptions about the relationship between types, objects, and their implementation making them highly coupled, even at the top of the hierarchy. For example, the Map interface has methods such as values(), entrySet(), keySet() that make very specific guarantees about how the returned objects interact with the map from which they derive. These sorts of prescribed dependencies between collection types (e.g. using a Map implies the use of Set and Collection) and the behavior of collection objects themselves prevent simple implementations. Repeated attempts to optimize GWT's emulated JRE collections show a relatively high cost for even minimal usage of JRE collections because, for example, using a Map implies using a Set and a Collection.

Too few collection types forces runtime enforcement of behaviors that could be more efficiently modeled statically. A common example is a class T that wishes to return a reference an internal list. However, the List interface has mutators (e.g. add()) which T would not want to make possible "from the outside." The traditional best practice of calling Collections.unmodifiableList() on the returned list reference requires the allocation of an extra object (which generates extra GC runs that can turn pathological in IE) as well extra code that must be generated simply to implement throw new UnsupportedOperationException() when a mutator is called on the wrapper object. If instead there were an ImmutableList type that the list collections implement, class T could simply have returned an immutable reference, which would ensure attempts to modify the list are caught at compile-time (i.e. because ImmutableList has no mutators) and which would prevent extra object allocations and potentially even facilitate inlining at compile time. This

Under-specified semantics in JRE collections lead to defensive programming in application code. Java developers are encouraged to use the least specific type that has the desired semantics. For example, such advice would say use List in your code rather than ArrayList. However, there is no guarantee that List#get(int i) returns in constant time, so a generic algorithm written in terms of List cannot even be assured of its expected time complexity. The JRE collections includes the tag interface RandomAccess as a way to workaround the ambiguity, but forcing a check such as myList instanceof RandomAccess is time consuming at runtime (where time is precious) and, worse, can defeat the GWT compiler's optimizer.

Goals

Non-goals

Introduction

Note: The collections described here are similar in several respects to Scala 2.8 collections, and where possible, terminology is borrowed to give consistent names to the similar concepts.

We begin by acknowledging that the design of anything as fundamental as collection types is bound to be contentious, so the plan is to start small and establish some patterns for simple, high-value collections: Array and Map.

Principles

Assertions, not exceptions. Assertions, when turned off, can be compiled away. Code written to defensively check for and throw exceptions is very hard to optimize. Hosted mode always has assertions on, so any well-tested code ought to execute assertions during normal testing.

Types and methods must be used properly to perform properly. Lightweight collections are in no way defensive. Improper use generates an assertion failure or, if assertions are disabled, undefined behavior. The corollary is that the API can be designed for maximum efficiency.

Bending the type system is okay if it's sufficiently hard to detect. If there are cases in which quietly cross-casting JSOs would be the most efficient thing to do, so it shall be done. However, in such cases, it should be a goal to do "the right thing" in hosted mode (e.g. actually have checked casts) but then silently slip in the trickier cross-casting version during a web mode compile.

Hosted mode collections implementations can differ, if necessary, from web mode. This follows from the previous point but is also practical since OOPHM is based on IPC, it may simply be harmfully slow to implement hosted mode collections using JSNI.

Publish and guarantee time complexity for various operations.

Keep implementations locked down. Because it's very tricky to get the implementations for these sorts of collections just right, the plan is to minimize ways in which there could be confusion about the implementation of a given collection type. More concretely, the key types should be classes rather than interfaces so that their implementation can be restricted to the package in which they are defined. If developers cannot trust that a given collection instance is truly optimal and maintains time complexity guarantees -- as might be the case if collections types were interfaces which could have been implemented "anywhere" -- then they may feel compelled to write defensive code. By locking the implementations down, developers can rely on the behavior of a given collection type.

!Collections

Unfinished. Other collections, as they are fleshed out, will also be constructed via this class.

/**
 * The single starting point for interacting with collections from scratch.
 * All collection instances are originally obtained via a method in this class.
 * Using a factory method rather than a constructor for, say, MutableArray<E> 
 * provides convenient inference of the collection's generic type and also provides
 * a way to satisfy the requirement that JSOs cannot have exposed constructors.
 */
public class Collections {
  public <E> MutableArray<E> createMutableArray();
}

Array, MutableArray, and ImmutableArray

/**
 * A "root collection" that provides a read-only interface to an array 
 * whose contents may or may not be mutable, making it possible to create read-only
 * library algorithms that work with either mutable or immutable arrays without
 * making calling code paranoid about whether the library routine might mutate it,
 * thus preventing the desire to do defensive wrapping or copying.
 */
public class Array<E> {
  @ConstantTime public E get(int index);
  @ConstantTime public int size();
}

/**
 * A "mutable collection" that is very similar to ArrayList 
 * (and C++ vector, and many others). It has the ability to efficiently 
 * "freeze" itself into an ImmutableArray.
 */
public class MutableArray<E> extends Array<E> {
  // Changes the element at a particular index
  @ConstantTime public void set(int index, E elem);
  
  // Appends an element to the end of the array
  @ConstantTime public void add(E elem);
  
  // Inserts an element into the specified index, lengthening the array by 1
  @LinearTime public void insert(int index, E elem);
  
  // Removes the element at the specified index, shortening the array by 1
  @LinearTime public void remove(int index);
  
  // Removes all elements of the array
  @ConstantTime public void clear();
  
  // Creates an immutable array based on the state of this instance.
  // The returned array may or may not have the same identity as this.
  // After this call, this instance may no longer be used as a MutableArray instance
  // (assertions double-check that no subsequent modifications are made).
  @ConstantTime public ImmutableArray<E> freeze();
}

/**
 * An "immutable collection" that makes that iron-clad guarantee that code 
 * holding a reference to an instance can be certain that its 
 * contents will not change over time.
 */
public class ImmutableArray<E> extends Array<E> {
  // Only inherited members
}

Examples of Intended Usage

Arrays

public <E> void printEach(Array<E> arrToPrint) {
  for (int i = 0, n = arrToPrint.size(); i < n; ++i) {
    Window.alert(arrToPrint.get(i).toString());
  }
}

public MutableArray<String> makeSnackArray() {
  MutableArray<String> a = createMutableArray();
  a.add("apple");  
  a.add("banana");
  a.add("coconut");
  a.add("donut");
  return a;
}

public void exampleOne() {
  MutableArray<String> a = makeSnackArray();
  printEach(a);
}

public void exampleTwo() {
  MutableArray<String> ma = makeSnackArray();
  ImmutableArray<String> ia = ma.freeze();

  // We can't touch 'ma' any more, but we can
  // confidently hand out aliases all over the place.
  new MyView1(ia).show();
  new MyView2(ia).show();
  new MyView3(ia).show();
}

Comment by cromwellian, Jun 23, 2009

In cases where a concrete collection models a non-sparse non-immutable random-access list backed by a native array, we can potentially preserve Java for-each by using the reinterpretCast trick to return a native array E since for-each works for both arrays and Iterable. In the immutable case, you can't use it due to the ability to poke around in the array.

Another potential idea is to support internal iterator ala Scala, by having an each() method that takes a callback. This avoids the need to have for-each or Iterator objects and allows the collection to implement the iteration over its elements in the best way possible. e.g.

lightweightArray.each(new Closure() {

public void f(String element, int index) {
// do something
}
});

Since we're talking Scala, we can also add support for map, filter, and reduce. :)

Comment by tamplinjohn, Jun 23, 2009

You could return the array in the immutable case if the compiler knew it wasn't going to get modified in this particular case. Taking the idea from the other thread, the default implementation could be the safe one and a method call generator could generate a faster implementation when told by the compiler the result was only used in a read-only manner.

The downside of internal iterators are that you waste space in the object for its lifetime (even if it is a number of null references), and you can't have multiple iterators in progress on the same object at once. In your example, if f called something that wanted to iterate on lightweightArray, it can't (or if it does, internally it has to be creating essentially an external iterator plus the extra overhead to keep track of them rather than letting callers manage them). Additionally, you are creating an object for the callback (though the compiler might be able to remove the object allocation if it were smarter than it is today) so it isn't clear where the win is.

Comment by rich...@zschech.net, Jun 23, 2009

I had an idea how to reduce the code required for the immutable collections mutation methods throwing UnsupportedOperationException:

If the compiler detects that the bodies of two methods are the same then it can use the same Java Script method. For example in sudo Java / Java Script code:

var throwUnsupportedOperationExceptionMethod = function() {
    throw new UnsupportedOperationException?(); 
}
immutableCollection.add = throwUnsupportedOperationExceptionMethod; immutableCollection.addAll = throwUnsupportedOperationExceptionMethod; immutableCollection.clear = throwUnsupportedOperationExceptionMethod;

etc...

Java Script does not care if the number of parameters the caller is using is different to what is defined. For example you can call with no issue: `

immutableCollection.add(e);
immutableCollection.clear();

This could be generalised to reduce the size of any code although it would be time consuming to find methods with the same bodies.

Comment by scottb@google.com, Jun 25, 2009

Dude, this is awesome. Major kudos!

Comment by tamplinjohn, Jun 25, 2009

For MutableArray.freeze, essentially changing a MutableArray to an ImmutableArray (even for existing references to the MutableArray) seems awkward, and will promote manual copies. I think the builder pattern is probably better here, since it is explicit that it exists to create an ImmutableArray rather than also being useful as a MutableArray later. If MutableArray extended ImmutableArrayBuilder, then you could not pass a ImmutableArrayBuilder to code that expected a MutableArray (without an explicit cast), and therefore the latter could depend on being able to mutate it later. Otherwise it seems that any callees who want to mutate the array must still make defensive copies lest someone else freeze that same instance. Maybe it is better to not have them related by inheritance but put all the methods as static in a shared utility class.

It seems like separate annotations for complexity measures are tricky beyond the obvious ones -- would it be better to have @Complexity("n^1/3") so there is a single way to specify them, especially in the case of O(n*m) etc?

Comment by stefan.haustein, Jun 26, 2009

Looks great!

I really like that the base Array does not have any methods for manipulation. This stuff will make our code smaller and more safe at the same time -- awesome!

Comment by cromwellian, Jun 27, 2009

One of the nice things about immutability is if the compiler can prove it, you can avoid alias-analysis for those objects, which enables better copy-propagation/cse as well as equational-reasoning. Without it, any method call or store to a reference of the same type has to be treated as potentially mutating the value of a variable, which limits code motion.

Equational reasoning allows you to hoist out values of fields without worry, effectively allowing you to intercept constructor calls at compile time, and propagating constructing parameters around as constants. That is, code like new Integer(10) can simply replace any call to Integer.intValue() with 10.


Sign in to add a comment