Goldman Sachs collections – nearly everything you want from collections in Java

The Java Collections Framework is not as powerful as an experienced Java developer would expect.
For example, how do you sort a list?
The simple answer is to use the java.util.Collections.sort() method with some java.util.Comparator implementation. Alternatively, Guava's Ordering can be used.
However, neither solution is exactly what an object-oriented developer looks for.
Similarly, to find the minimum or maximum element of a collection you would probably use the java.util.Collections.min() and java.util.Collections.max() methods, respectively.
And how do you filter a collection? Or how do you extract a list of a particular property from the objects stored in the collection? It can be done in pure Java with a for loop, with Apache Commons Collections' CollectionUtils.filter() and CollectionUtils.collect(), or with Guava's Collections2.filter(). Nonetheless, none of those solutions is fully satisfying from my point of view.
Of course, there is Java 8, but it is a fairly new release that cannot be used in every project, especially legacy ones, and its collections framework is still not optimal.
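To make the complaint concrete, here is a small sketch of that utility-class style using only the JDK and no streams. The word list and the length-based comparator are illustrative choices, not taken from any particular project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Pre-Java 8 JDK style: sorting, min/max and filtering live in static
// utilities or hand-written loops rather than on the collection itself.
class UtilityStyle {

    static final Comparator<String> BY_LENGTH = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            return Integer.compare(a.length(), b.length());
        }
    };

    static List<String> sortedByLength(List<String> words) {
        List<String> copy = new ArrayList<String>(words); // Collections.sort mutates its argument
        Collections.sort(copy, BY_LENGTH);
        return copy;
    }

    static String shortest(List<String> words) {
        return Collections.min(words, BY_LENGTH);
    }

    // Without streams or a library, filtering needs an explicit loop.
    static List<String> longerThan(List<String> words, int length) {
        List<String> result = new ArrayList<String>();
        for (String word : words) {
            if (word.length() > length) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "fig", "apple", "kiwi");
        System.out.println(sortedByLength(words)); // [fig, kiwi, apple, banana]
        System.out.println(shortest(words));       // fig
        System.out.println(longerThan(words, 4));  // [banana, apple]
    }
}
```

Every operation takes the list as an argument, which is exactly the procedural shape the rest of this post argues against.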

The Goldman Sachs Collections (GS Collections) framework comes to the rescue for the above problems. It is a collections framework that Goldman Sachs open sourced in January 2012.

Here is a quick feature overview of GS Collections compared to Java 8, Guava, Trove and Scala:

source: infoq.com

 

Seeing this, even if you thought that Java 8 had everything you need from collections, you should still have a look at GS Collections.

Following this brief introduction, I am going to present a quick overview of the main features GS Collections has to offer. Some of the examples are variants of the exercises in the GS Collections Kata, a training course Goldman Sachs uses to teach its developers how to use GS Collections. The Kata is also open sourced as a separate repository.

Going back to the example from the beginning of the post, it would be perfect to have methods like sort(), min(), max(), select(), collect(), etc. on every collection. It is simple to put them in a utility class, but that does not reflect object-oriented design.

GS Collections has interfaces that accomplish this in the following way (as an example):

GS Collections classes do not extend Java Collections Framework classes. Instead, they are new implementations of both the Java Collections Framework and the GS Collections interfaces.
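One way to picture the object-oriented alternative is the hypothetical RichList class below. It is an illustration only, not the GS Collections API: it simply carries select(), reject() and collect() as instance methods, built on the Java 8 java.util.function types, so operations chain left to right instead of nesting utility calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical RichList (not part of GS Collections): the iteration patterns
// are instance methods, so calls chain instead of nesting utility calls.
class RichList<T> {
    private final List<T> items;

    RichList(List<T> items) {
        this.items = new ArrayList<T>(items); // defensive copy of the input
    }

    // collect: transform every element
    <V> RichList<V> collect(Function<? super T, ? extends V> function) {
        List<V> out = new ArrayList<V>();
        for (T item : items) {
            out.add(function.apply(item));
        }
        return new RichList<V>(out);
    }

    // select: keep the elements that satisfy the predicate
    RichList<T> select(Predicate<? super T> predicate) {
        List<T> out = new ArrayList<T>();
        for (T item : items) {
            if (predicate.test(item)) {
                out.add(item);
            }
        }
        return new RichList<T>(out);
    }

    // reject: keep the elements that do not satisfy the predicate
    RichList<T> reject(Predicate<? super T> predicate) {
        return select(item -> !predicate.test(item));
    }

    List<T> toList() {
        return new ArrayList<T>(items);
    }

    public static void main(String[] args) {
        RichList<String> words = new RichList<String>(Arrays.asList("apple", "fig", "banana"));
        List<Integer> lengths = words.select(w -> w.length() > 3).collect(String::length).toList();
        System.out.println(lengths); // [5, 6]
    }
}
```

GS Collections takes the same idea much further, with richer interfaces and its own Function and Predicate types that do not depend on Java 8.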


 Collect pattern

The collect pattern returns a new collection where each element has been transformed. An example is returning the price of each item in a shopping cart.

The collect pattern uses a Function, which takes an object and returns an object, typically of a different type. It simply transforms objects.

 

or using Java 8 lambda expressions:

or using a method reference:
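For comparison, here is the same transformation written with plain JDK 8 streams, where map() plays the role that collect() plays on a GS Collections list. The Item class and its getPrice() accessor are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain JDK 8 counterpart of the collect pattern: Item -> price.
class CollectPattern {

    // Hypothetical shopping-cart item used only for this illustration.
    static final class Item {
        private final String name;
        private final double price;

        Item(String name, double price) {
            this.name = name;
            this.price = price;
        }

        String getName() {
            return name;
        }

        double getPrice() {
            return price;
        }
    }

    static List<Double> prices(List<Item> cart) {
        return cart.stream()
                .map(Item::getPrice) // the transforming function
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> cart = Arrays.asList(new Item("book", 12.5), new Item("pen", 1.0));
        System.out.println(prices(cart)); // [12.5, 1.0]
    }
}
```

Note the naming clash: in the Streams API, collect() is the terminal operation that gathers results, while the transformation itself is map().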

 

Select pattern

The select pattern (aka filter) returns the elements of a collection that satisfy some condition, for example only those customers who live in London. The pattern uses a Predicate, which is a type that takes an object and returns a boolean.

or using Java 8 lambda expressions:
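For comparison, a plain JDK 8 sketch of the same selection, where Stream.filter() takes the predicate. The Customer class and its getCity() accessor are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain JDK 8 counterpart of the select pattern: keep customers in a city.
class SelectPattern {

    // Hypothetical customer type used only for this illustration.
    static final class Customer {
        private final String name;
        private final String city;

        Customer(String name, String city) {
            this.name = name;
            this.city = city;
        }

        String getName() { return name; }
        String getCity() { return city; }
    }

    static List<Customer> livingIn(List<Customer> customers, String city) {
        return customers.stream()
                .filter(customer -> city.equals(customer.getCity())) // the predicate
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Customer> customers = Arrays.asList(
                new Customer("Ann", "London"),
                new Customer("Bob", "Paris"),
                new Customer("Cid", "London"));
        for (Customer c : livingIn(customers, "London")) {
            System.out.println(c.getName()); // Ann, then Cid
        }
    }
}
```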

 

Reject pattern

The reject pattern returns the collection elements that do not satisfy the Predicate.

One note regarding anonymous inner classes when Java 8 is not an option: it is advisable to encapsulate them in the domain object, and then the above snippet changes into:
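A self-contained, pre-Java 8 sketch of that advice follows. It does not use the GS Collections types: the Predicate interface is a minimal stand-in declared locally, shaped like the single accept() method of the GS Collections Predicate, and the Customer class and reject() helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Pre-Java 8 style sketch. Predicate is a minimal stand-in declared here,
// shaped like the single-method accept() interface GS Collections uses.
class RejectPattern {

    interface Predicate<T> {
        boolean accept(T each);
    }

    static final class Customer {
        // The anonymous inner class is written once, inside the domain object,
        // instead of at every call site.
        static final Predicate<Customer> LIVES_IN_LONDON = new Predicate<Customer>() {
            @Override
            public boolean accept(Customer customer) {
                return "London".equals(customer.getCity());
            }
        };

        private final String name;
        private final String city;

        Customer(String name, String city) {
            this.name = name;
            this.city = city;
        }

        String getName() { return name; }
        String getCity() { return city; }
    }

    // reject: keep the elements for which the predicate is false.
    static <T> List<T> reject(List<T> items, Predicate<? super T> predicate) {
        List<T> result = new ArrayList<T>();
        for (T item : items) {
            if (!predicate.accept(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Customer> customers = Arrays.asList(
                new Customer("Ann", "London"),
                new Customer("Bob", "Paris"));
        List<Customer> notInLondon = reject(customers, Customer.LIVES_IN_LONDON);
        System.out.println(notInLondon.get(0).getName()); // Bob
    }
}
```

The call site then reads as a sentence, reject(customers, Customer.LIVES_IN_LONDON), and the predicate can be reused by select, count or detect.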

Other patterns using Predicate

  • Count pattern
    • Returns the number of elements that satisfy the Predicate.
  • Detect pattern
    • Finds the first element that satisfies the Predicate.
  • Any Satisfy
    • Returns true if any element satisfies the Predicate.
  • All Satisfy
    • Returns true if all elements satisfy the Predicate.
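As a rough JDK 8 point of comparison, the four patterns above map onto stream operations driven by a single Predicate. The IS_EVEN predicate and the method names in this sketch are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// JDK 8 stream counterparts of count, detect, any satisfy and all satisfy,
// all driven by the same Predicate.
class PredicatePatterns {

    static final Predicate<Integer> IS_EVEN = n -> n % 2 == 0;

    static long count(List<Integer> xs) {
        return xs.stream().filter(IS_EVEN).count();
    }

    // detect: first matching element; the JDK wraps "no match" in an Optional.
    static Optional<Integer> detect(List<Integer> xs) {
        return xs.stream().filter(IS_EVEN).findFirst();
    }

    static boolean anySatisfy(List<Integer> xs) {
        return xs.stream().anyMatch(IS_EVEN);
    }

    static boolean allSatisfy(List<Integer> xs) {
        return xs.stream().allMatch(IS_EVEN);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(count(numbers));        // 2
        System.out.println(detect(numbers).get()); // 2
        System.out.println(anySatisfy(numbers));   // true
        System.out.println(allSatisfy(numbers));   // false
    }
}
```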

Testing

GS Collections includes helpful, collection-specific utilities for writing unit tests. They are implemented as an extension of JUnit.
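Why a collection-specific assertion helps is easiest to see from its failure message. The hypothetical helper below (plain Java, not the GS Collections Verify class) reports the actual contents on failure, whereas comparing two sizes only reports two numbers.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical helper in the spirit of a collection-specific assertion:
// on failure the message includes the actual contents, not just two numbers.
class CollectionAsserts {

    static void assertSize(int expectedSize, Collection<?> actual) {
        if (actual.size() != expectedSize) {
            throw new AssertionError("Expected size " + expectedSize
                    + " but was " + actual.size() + ": " + actual);
        }
    }

    public static void main(String[] args) {
        assertSize(2, Arrays.asList("a", "b")); // passes silently
        try {
            assertSize(3, Arrays.asList("a", "b"));
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // Expected size 3 but was 2: [a, b]
        }
    }
}
```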
Instead of checking the collection's size:

you can use:

Some more examples:

Predicates

GS Collections provides several built-in predicates:

Immutability

I personally prefer immutable data structures to mutable ones. The advantages are that they can be passed around without making defensive copies, they can be accessed concurrently without the possibility of corruption, etc.
The methods toList(), toSortedList(), toSet(), toSortedSet() and toBag() always return new, mutable copies.

The ImmutableCollection interface does not extend java.util.Collection and therefore has no mutating methods.
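The JDK makes a useful contrast. In this sketch an unmodifiable List still declares add(), so misuse is only caught at runtime, whereas an interface without mutating methods, as ImmutableCollection is described above, turns the same mistake into a compile error. The helper names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// JDK contrast: an unmodifiable List still *declares* add(), so misuse only
// fails at runtime with UnsupportedOperationException.
class ImmutabilityContrast {

    // Copy first, then wrap: later changes to the source cannot leak in.
    static List<String> immutableCopy(List<String> source) {
        return Collections.unmodifiableList(new ArrayList<String>(source));
    }

    static boolean addFailsAtRuntime(List<String> list) {
        try {
            list.add("late");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<String>(Arrays.asList("a", "b"));
        List<String> copy = immutableCopy(source);
        source.add("c");                             // does not affect the copy
        System.out.println(copy);                    // [a, b]
        System.out.println(addFailsAtRuntime(copy)); // true
    }
}
```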

Flat collect

The flat collect pattern is a special case of the collect pattern. When the function used by collect returns a collection, the result is a collection of collections. Flat collect, in that case, returns a single “flattened” collection instead.
or in the pre-Java 8 way:
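The difference between the two shapes is easy to see with JDK 8 streams, where map() corresponds to collect and flatMap() to flat collect. Splitting sentences into words is an illustrative choice.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// collect vs. flat collect, expressed with JDK 8 map() and flatMap().
class FlatCollectPattern {

    // The function returns a list per element: the result is a list of lists.
    static List<List<String>> collectWords(List<String> sentences) {
        return sentences.stream()
                .map(s -> Arrays.asList(s.split(" ")))
                .collect(Collectors.toList());
    }

    // Same function, but the inner lists are flattened into one list.
    static List<String> flatCollectWords(List<String> sentences) {
        return sentences.stream()
                .flatMap(s -> Arrays.stream(s.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("a b", "c");
        System.out.println(collectWords(sentences));     // [[a, b], [c]]
        System.out.println(flatCollectWords(sentences)); // [a, b, c]
    }
}
```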

Static utilities

As stated at the beginning, processing collections using methods on the interfaces is the preferred, object-oriented approach. However, it is not always feasible. As a solution, GS Collections, similarly to the JDK, provides several static utility classes such as Iterate, ListIterate, etc.
Some of them can be used to interoperate with the Java Collections Framework. What is more, they allow developers to refactor an existing code base to GS Collections incrementally.

Parallel iteration

GS Collections provides a static utility for parallel iteration that can be used for data-intensive algorithms. It looks like the serial case, hiding the complexity of writing concurrent code.

Remember that parallel algorithms are not usually a solution for performance problems. 
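In GS Collections this role is played by the ParallelIterate utility class. To illustrate the "looks like the serial case" idea without that dependency, the sketch below uses the JDK 8 equivalent, parallelStream(); the evens() method is illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// JDK 8 counterpart of parallel iteration: the call looks like the serial one,
// while the framework splits the work across the common ForkJoinPool.
class ParallelSelect {

    static List<Integer> evens(List<Integer> numbers) {
        return numbers.parallelStream()
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList()); // keeps encounter order
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 1_000_000)
                .boxed()
                .collect(Collectors.toList());
        List<Integer> evens = evens(numbers);
        System.out.println(evens.size()); // 500000
        System.out.println(evens.get(0)); // 2
    }
}
```

Whether this beats the serial version depends on data size, per-element cost and available cores, so it is worth measuring before switching.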

FastList as a replacement for ArrayList

 
FastList is considered a drop-in replacement for ArrayList. It is designed to be more memory efficient than ArrayList and can be used to refactor legacy code step by step.
Let’s refactor that simple piece of code using GS Collections:

Step 1:
Step 2:
Step 3:

or if you need an unmodifiable collection:

Step 4:
Analogous refactorings can be carried out for maps and sets using UnifiedMap and UnifiedSet, respectively.

Parallel lazy evaluation

There are situations where the first optimization that comes to mind is to parallelize operations. This can be justified especially when processing large amounts of data, such as collections of millions of elements, in a multi-processor environment. GS Collections offers this functionality in a friendly way:

The asParallel() method takes two parameters:

  • executorService, the thread pool that runs the tasks
  • batchSize, which determines the number of elements from the backing collection processed by each task submitted to the thread pool; in my experience the batch size has a significant influence on performance and should be determined during performance tests
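To make batchSize concrete, here is a hand-rolled sketch using only java.util.concurrent: the list is cut into slices of at most batchSize elements, each slice becomes one task on the supplied ExecutorService, and the partial results are merged in order. The class and its select() method are hypothetical; GS Collections implements its parallel iterables differently, and they are lazy, which this eager sketch is not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hand-rolled illustration of what batchSize controls. This is NOT how
// GS Collections implements asParallel(); it only mimics the batching idea.
class BatchedParallelSelect {

    static <T> List<T> select(List<T> source, Predicate<? super T> predicate,
                              ExecutorService executor, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Submit one task per slice of at most batchSize elements.
        List<Future<List<T>>> futures = new ArrayList<>();
        for (int start = 0; start < source.size(); start += batchSize) {
            List<T> batch = source.subList(start, Math.min(start + batchSize, source.size()));
            futures.add(executor.submit(() -> {
                List<T> selected = new ArrayList<>();
                for (T item : batch) {
                    if (predicate.test(item)) {
                        selected.add(item);
                    }
                }
                return selected;
            }));
        }
        // Merge partial results in submission order, preserving source order.
        List<T> result = new ArrayList<>();
        try {
            for (Future<List<T>> future : futures) {
                result.addAll(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return result;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<Integer> numbers = IntStream.rangeClosed(1, 20).boxed().collect(Collectors.toList());
            // 20 elements with batchSize 5 -> 4 tasks of 5 elements each.
            System.out.println(select(numbers, n -> n % 2 == 0, executor, 5));
            // [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
        } finally {
            executor.shutdown();
        }
    }
}
```

Small batches spread work more evenly across threads but add scheduling overhead; large batches do the opposite, which is why the right value is best found by measurement.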

Performance

I personally ran a few performance tests comparing lazy and parallel lazy evaluation in GS Collections, but I did not compare GS Collections with other collections frameworks. Since Goldman Sachs claims that its implementation is optimized for performance and memory usage, I looked for tests that demonstrate it.
Here is an example comparison of GS Collections, Java 8 collections and Scala collections:
source: infoq.com

 


Summary

This is just the tip of the iceberg as far as GS Collections is concerned. The framework offers much more, such as a stack data structure (MutableStack), a bag data structure (MutableBag), multimaps (MutableListMultimap), grouping (groupBy, groupByEach) and lazy evaluation (asLazy()).
From my point of view, it is a quality replacement for the current Java Collections Framework.
