Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregator Functions #546

Closed
gissuebot opened this issue Oct 31, 2014 · 9 comments
Closed

Aggregator Functions #546

gissuebot opened this issue Oct 31, 2014 · 9 comments
Labels
status=working-as-intended type=enhancement Make an existing feature better

Comments

@gissuebot
Copy link

Original issue created by SeanPFloyd on 2011-02-10 at 06:39 PM


I would like to see a way to perform aggregation on collections / iterables.

Example a: Given a List of Integers, add them all up
Example b: Given a Collection of Person Objects, calculate their average age by querying their birthDate fields

My original idea was for Example b) to add an aggregate() method to Iterables:

/**
 * Aggregate the selected values from the supplied {@link Iterable} using
 * the provided selector and aggregator functions.
 * 
 * @param <I>
 *            the element type over which to iterate
 * @param <S>
 *            type of the values to be aggregated
 * @param <A>
 *            type of the aggregated value
 * @param data
 *            elements for aggregation
 * @param selectorFunction
 *            a selector function that extracts the values to be aggregated
 *            from the elements
 * @param aggregatorFunction
 *            function that performs the aggregation on the selected values
 * @return the aggregated value
 */
public static <I, S, A> A aggregate(final Iterable<I> data,
    final Function<I, S> selectorFunction,
    final Function<Iterable<S>, A> aggregatorFunction){
    checkNotNull(aggregatorFunction);
    return aggregatorFunction.apply(
        Iterables.transform(data, selectorFunction)
    );
}

But since this is basically a one-liner (the docs are much longer than the method :-)) and calling a generic method with many parameters would can be a real pain, I would instead suggest to create a class Aggregates that contains Aggregator functions only. I have added a sample implementation with some functions I could think of (see attached file):

public static Function<Iterable<String>,Integer> averageLength()
          A Function that returns the average length of the Strings in an Iterable.

public static Function<Iterable<? extends Number>,BigDecimal> averageOfFloats()
          A Function that returns a BigDecimal that corresponds to the average of all numeric values passed from the iterable.

public static Function<Iterable<? extends Number>,BigInteger> averageOfIntegers()
          A Function that returns a BigInteger that corresponds to the average of all numeric values passed from the iterable.

public static Function<Iterable<String>,Integer> maxLength()
          A Function that returns the length of the longest String in an Iterable.

public static Function<Iterable<String>,Integer> minLength()
          A Function that returns the length of the shortest String in an Iterable.

public static Function<Iterable<? extends Number>,BigDecimal> sumOfFloats()
          A Function that returns a BigDecimal that corresponds to the sum of all numeric values passed from the iterable.

public static Function<Iterable<? extends Number>,BigInteger> sumOfIntegers()
          A Function that returns a BigInteger that corresponds to the integer sum of all numeric values passed from the iterable.

From the examples above, this solves scenario 1:

BigInteger sum = Aggregators.sumOfIntegers().apply(listOfIntegers);

For scenario 2, it would have to be something like this:

    final Date averageBirthDate =
        new Date(Aggregators
            .averageOfIntegers()
            .apply(
                Iterables.transform(people, new Function<Person, Long>(){
                    public Long apply(final Person input){
                        return Long.valueOf(input.getBirthDate().getTime());
                    }
                }))
            .longValue()
        );

(or add Aggregators.averageDate() )

@gissuebot
Copy link
Author

Original comment posted by SeanPFloyd on 2011-02-10 at 06:41 PM


Darn, forgot to delete a main method for testing. Here's a new Version of the class

@gissuebot
Copy link
Author

Original comment posted by jim.andreou on 2011-02-10 at 08:12 PM


Sigh. I agree for the need, but I find the (usage) hassle huge. Especially if you compare this code and the equivalent scala code... it's painful.

I do play around with such designs for experimental data analysis. Here is an old attempt, somewhere 4 years back: http://code.google.com/p/jbenchy/

The point is it made my life easier by various shortcuts, i.e. your "I" would be a tuple, "S" would be just the name of the component (and the selector function would be implied, instead of having to painfully spell it out as an inner class), and so on.

But the whole thing, in its full generality, while fundamental and useful, does not feel right for Java to me; it's such an uphill battle. I would aggressively take shortcuts to make client code simpler, but then it wouln't be general enough for guava.

@gissuebot
Copy link
Author

Original comment posted by attilan3i on 2011-02-25 at 03:13 PM


I don't like the idea of specifying the selector function either. There might be a way to add aggregation support to guava in some form, but I don't think this will be it. I see an aggregator more like a new interface in guava and not Functions that perform on Iterables. I think the most natural representation of an aggregator would be:

public interface Aggregator<I, O> {
  void process(I value);
  O getResult();
}

There can be several utility methods gathered into a class called Aggregators, processing all elements of an iterable would be one of them:

public class Aggregators {
  private Aggregators();
  public static <I, O> O aggregate(Iterable<I> elements, Aggregator<I, O> agg);
  public static <T> Aggregator<T, Number> count();
  public static Aggregator<Number, Number> sum();
  ...
}

I like this approach better, but I'm still not sure if this has a place in guava.

@gissuebot
Copy link
Author

Original comment posted by SeanPFloyd on 2011-02-25 at 04:35 PM


attilan3i:

I agree, your approach is very nice. I would be very happy with that as well.

@gissuebot
Copy link
Author

Original comment posted by g.e.dejong on 2011-03-13 at 12:30 PM


Hmmm, this is just an instance of a foldl or foldr method, possibly with extra state. However, the Java language is just too verbose to fully encompass it in a usable matter. Still, I see a possibility looming. Sketching it here:

public interface AggregateState<T, K> {
  public void add(T elem);
  public K state();
}

public class DoubleAverageState implements AggregateState<Double, Double> {
  private int n = 0;
  private Double sum = 0d;

  public void add(Double d) {
    sum += d;
    n ++;
  }

  public Double state() {
    return sum / n; // return division by zero when nothing added
  }
}

public class AggregateStateFunction<T, K> implements BinaryFunction<AggregateState<T, K>, AggregateState<T, K>, T> {
  AggregateStateFunction<T, K> apply(AggregateStateFunction<T, K> function, T arg) {
    return function.add(arg);
  }
}

Iterable<Double> d = Arrays.asList(1d, 2d, 3d, 4d);
Double average = foldr(d, new AggregateStateFunction<Double, Double>, new DoubleAverageState()).state();

// Untested, but the meaning should be clear.

@gissuebot
Copy link
Author

Original comment posted by gscerbak on 2011-04-14 at 08:30 PM


Folding is nice, selector function unnecessary, AggregatorState could be IMHO replaced with the name Accumulator.

I don't know what are the reasons for not having indexed Iterators, but I guess they would be similar to those for not having Aggregator/Fold. Maybe one has to put up with "for" loop in Java...

@gissuebot
Copy link
Author

Original comment posted by g.e.dejong on 2011-04-20 at 09:44 AM


Accumulator definitely sounds better. The function (or functor) is there because it allows different behaviour when using the AggregateState. Graphically, this looks like this:

s0 --> s1 --> s2 --> s3
     ^ ^ ^
     f f f
 A1 --> A2 --> A3 --> nil

A is an iterator, f is the function (or functor), and the state is seperate of that (except for type parameterisation). By the way, in our implementation, the AggregateStateFunctor is a little bit more involved:

public static class AggregateStateFunctor<T, K>
            implements
            BinaryFunctor<AggregateState< ? super T, ? extends K>, T, AggregateState< ? super T, ? extends K>>
    {
        @Override
        public CollectionUtil.UnaryFunctor<AggregateState< ? super T, ? extends K>, AggregateState< ? super T, ? extends K>> apply(
                final T t)
        {
            return new UnaryFunctor<AggregateState< ? super T, ? extends K>, AggregateState< ? super T, ? extends K>>()
            {
                @Override
                public AggregateState< ? super T, ? extends K> apply(
                        final AggregateState< ? super T, ? extends K> k)
                {
                    return k.add(t);
                }
            };
        }
    }

(Functors eq. Google Functions here). The aggregate function then becomes:

public static <T, K> K aggregate(Iterable< ? extends T> it,
            AggregateState< ? super T, ? extends K> state)
    {
        return foldl(it, state, new AggregateStateFunctor<T, K>()).state();
    }

and as an examle a static method top (returns the top elements of a list):

public static <T extends Comparable< ? super T>> Collection<T> top(Iterable< ? extends T> it)
    {
        return aggregate(it, new ComparableTopAggregateState<T>());
    }

which, to me, looks rather elegant. It allows you to use ComparableTopAggregateState in another context.

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by em...@soldal.org on 2011-07-13 at 10:49 PM


We use something similar to this quite regularly in our code.

Collator<T>(Comparator<T>,Function<Iterable<T>,Void>).collate(Iterable<T>)

We use the function basically as a simple closure which is called with sets of objects which are equivalent in the eyes of the comparator.

We've found this to be invaluable when trying to optimise away a lot of needless sql joins. By working only with objects that are similar, we can reduce the complexity required elsewhere.

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2011-07-16 at 08:21 PM


Thanks for the suggestions. Unfortunately, Guava really does not want to go in this direction (higher-order functional programming). We have things like Function and Predicate because we need them for various reasons, but we don't think FP is a good fit for Java as it currently stands in general. If you disagree, perhaps try http://functionaljava.org/ (a library which I, of course, strongly dislike, but to each his own).


Status: WorkingAsIntended
Labels: Type-Enhancement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status=working-as-intended type=enhancement Make an existing feature better
Projects
None yet
Development

No branches or pull requests

1 participant