Is there a future for Scala Future? Or is there only ZIO?

Concurrency in Scala

How does plain Scala approach concurrency? The Future “monad” is the answer (the actor model was also part of Scala but got deprecated in Scala 2.10). Everyone has used or still uses Scala Futures. People coming to Scala from Java are thrilled by the API it offers (compared to Java’s Future). It is also quite fast and composes nicely. As a result, Future is the first choice wherever an asynchronous operation is required. It is used both for performing time-consuming computations and for calling external services. Everything that may happen in the future. It makes writing concurrent programs a lot easier.

Basic Future semantic

Scala’s Future[T], found in the scala.concurrent package, is a type that represents a computation expected to eventually produce a value of type T. The computation might also go wrong or time out, so a completed future may not have been successful after all, in which case it contains an exception instead of a value.
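
A minimal sketch of this semantic (assuming the global ExecutionContext; riskyComputation is a hypothetical function, not from the original post): a completed Future is either a Success holding the value or a Failure holding the exception.

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

def riskyComputation(n: Int): Int =
  if (n > 0) n * 2 else throw new IllegalArgumentException("n must be positive")

val ok: Future[Int]     = Future(riskyComputation(21))
val failed: Future[Int] = Future(riskyComputation(-1))

// When a Future completes it is either successful or failed with an exception.
ok.onComplete {
  case Success(value) => println(s"completed with $value")
  case Failure(ex)    => println(s"failed with ${ex.getMessage}")
}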

Future vs functional programming

Let’s look at the Scala Future from the functional programming perspective. Technically it is a monad. But is Future really a monad?

What is a monad exactly? Briefly speaking, a monad is a container defining at least two functions over a type A:

  • identity (unit): def unit[A](x: A): Future[A]
  • bind (flatMap): def bind[A, B](fa: Future[A])(f: A => Future[B]): Future[B]

Additionally these functions must satisfy three laws: left identity, right identity and associativity.
From the mathematical perspective monads only describe values.
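
As a rough sketch of what those laws mean for Future (comparing awaited results rather than the Future values themselves, and assuming the global ExecutionContext):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val x = 21
val f: Int => Future[Int] = n => Future(n * 2)
val g: Int => Future[Int] = n => Future(n + 1)

def run[A](fa: Future[A]): A = Await.result(fa, 1.second)

// left identity:  unit(x).flatMap(f) == f(x)
assert(run(Future.successful(x).flatMap(f)) == run(f(x)))
// right identity: fa.flatMap(unit) == fa
assert(run(f(x).flatMap(Future.successful)) == run(f(x)))
// associativity:  fa.flatMap(f).flatMap(g) == fa.flatMap(a => f(a).flatMap(g))
assert(run(Future.successful(x).flatMap(f).flatMap(g)) ==
       run(Future.successful(x).flatMap(a => f(a).flatMap(g))))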

Scala’s Future obviously satisfies all of the above, so it can be called a monad. However, there is another view which says that a Future used to wrap side effects (like calling an external API) cannot be treated as a monad. Why? Because such a Future is no longer a value.

What is more, Future starts executing as soon as it is constructed. This makes it hard to preserve referential transparency, which should allow substituting an expression with its evaluated value.

Example:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

def sideEffects = (
  Future {println("side effect")},
  Future {println("side effect")}
)

sideEffects
sideEffects

It produces the following output:

side effect
side effect
side effect
side effect

Now if Future were a value, we would be able to extract the common expression:

lazy val anEffect = Future{println("side effect")}
def sideEffects = (anEffect, anEffect)

Calling it like this should then produce the same output as the previous example:

sideEffects
sideEffects

But it does not, it prints:

side effect

The first call to sideEffects runs the future and caches the result. When sideEffects is called a second time, the code inside the Future is not executed at all.

This behavior clearly breaks referential transparency. Whether it also disqualifies Future as a monad is a far longer discussion, so let’s leave it for now.

Another problem with Future is the ExecutionContext. They are inseparable. Future (and its functions like map, foreach, etc.) needs to know where to execute.

def foreach[U](f: T => U)(implicit executor: ExecutionContext): Unit

Very often scala.concurrent.ExecutionContext.Implicits.global is simply imported everywhere Future is used. This is bad practice because the execution context is decided early and then fixed. It also makes it impossible for the callers of a function to decide on which ExecutionContext they want to run it. Obviously it is possible to make the ExecutionContext a parameter of the function, but then it propagates through the whole codebase: it needs to be added to the entire stack of function calls. Boilerplate code. Preferably we want to decide on the context on which we execute the functions as late as possible, generally when the program starts the actual execution.
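
A small sketch of the propagation problem (the layer names below are hypothetical, purely for illustration): every function in the call chain has to carry the implicit ExecutionContext parameter so that the final caller can still choose it.

import scala.concurrent.{ExecutionContext, Future}

// Hypothetical layers – each one has to repeat the implicit parameter.
def loadUser(id: Long)(implicit ec: ExecutionContext): Future[String] =
  Future(s"user-$id")

def enrichUser(id: Long)(implicit ec: ExecutionContext): Future[String] =
  loadUser(id).map(_ + "-enriched")

def handleRequest(id: Long)(implicit ec: ExecutionContext): Future[String] =
  enrichUser(id)

// Only here, at the edge of the program, is the context actually chosen.
implicit val ec: ExecutionContext = ExecutionContext.global
handleRequest(42L)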

Future performance

Let’s look at the performance of the Scala Future. In a later section we will compare it to the construct that I suggest as a replacement.

We will run two computations:

  • eight concurrent and sequential operations computing the trigonometric tangent of an angle and returning the sum of the values
  • three concurrent and sequential operations finding all prime numbers lower than n and producing their sum

The source code for the test can be found here: https://github.com/damianbl/scala-future-benchmark
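
As a rough idea of what the Future-based tangent case looks like (a sketch only, not the exact code from the repository):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

def tan(angle: Double): Future[Double] = Future(math.tan(angle))

// Eight tangent computations running concurrently, results summed at the end.
val sumOfTangents: Future[Double] =
  Future.sequence((1 to 8).map(i => tan(i.toDouble))).map(_.sum)

Await.result(sumOfTangents, 1.second)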

The machine used to run this benchmark is an Intel(R) Core(TM) i9 2,4 GHz 8-Core with 32 GiB of memory running on macOS Catalina 10.15.2.

Results:

[info] Result "com.dblazejewski.benchmark.FutureMathTanBenchmark.futureForComprehensionTestCase":
[info]   17049.404 ±(99.9%) 259.822 ns/op [Average]
[info]   (min, avg, max) = (16682.279, 17049.404, 18281.647), stdev = 346.855
[info]   CI (99.9%): [16789.582, 17309.226] (assumes normal distribution)

What instead of Future?

Now that we know some of the limitations of the Scala Future, let’s introduce a possible replacement.

In recent months there has been a lot of hype around the ZIO library.

At first glance ZIO looks really powerful. It provides effect data types that are meant to be highly performant (we will see this in the performance tests), functional, easily testable and resilient. They compose very well and are easy to reason about.

ZIO contains a number of data types that help to run concurrent and asynchronous programs. The most important are:

  • Fiber – a fiber models an IO that has started running; it is more lightweight than a thread
  • ZIO – a value that models an effectful program, which might fail or succeed
  • Promise – a model of a variable that may be set a single time and awaited on by many fibers
  • Schedule – a model of a recurring schedule, which can be used for repeating successful IO values or retrying failed IO values

The main building block of ZIO is the functional effect:

IO[E, A]

IO[E, A] is an immutable data type which describes an effectful program. The program may fail with an error of type E or succeed with a value of type A.
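
A minimal sketch (assuming a ZIO 1.0-era API, matching the DefaultRuntime used later in this post): an IO value only describes a program that either fails or succeeds; nothing runs until a runtime interprets it.

import zio.IO

// Descriptions only – nothing is executed here.
val succeeded: IO[String, Int] = IO.succeed(42)
val failed: IO[String, Int]    = IO.fail("something went wrong")

// Combinators build bigger descriptions out of smaller ones.
val doubled: IO[String, Int]  = succeeded.map(_ * 2)
val fallback: IO[String, Int] = failed.orElse(succeeded)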

I would like to dive a bit deeper into the Fiber data type. Fiber is the base building block of the ZIO concurrency model. It is significantly different from the thread-based concurrency model, where every thread is mapped to an OS thread. A ZIO Fiber is more like a green thread. Green threads are threads that are scheduled by a runtime library or virtual machine instead of natively by the underlying operating system. They make it possible to emulate a multithreaded environment without relying on operating system capabilities. Green threads outperform native threads in most cases, but there are cases when native threads are better.

Fibers are really powerful and nicely separated. Every fiber has its own stack and interpreter, which is how it executes the IO program.

Fiber scalability compared to Green Threads and Native Threads (John A. De Goes)

There is also one interesting thing about Fibers: they can be garbage collected. Let’s imagine a Fiber that runs infinitely. If it does not do any work at a particular time and there is no way to reach it from other components, it can actually be garbage collected. No memory leaks. Threads, in contrast, need to be shut down and managed carefully in such a case.

Let’s see how we can use Fibers. Imagine a situation when we have two functions:

  • validate – it does complex data validation
  • processData – it does complex and time consuming data processing

We would like to start the validation and processing data at the same time. If the validation is successful then the processing continues. If the validation fails then we stop processing. Implementing it with ZIO and Fibers is pretty straightforward:

    val result = for {
      processDataFiber  <- processData(data).fork
      validateDataFiber <- validateData(data).fork
      isValid <- validateDataFiber.join
      _ <- if (!isValid) processDataFiber.interrupt
      else IO.unit
      processingResult <- processDataFiber.join
    } yield processingResult

It starts processing and validating the data in lines (2) and (3). The fork function returns an effect that forks this effect into a separate fiber, and this fiber is returned immediately. In line (4) the validation fiber is joined, which suspends the joining fiber until its result has been determined. If the validation result is false we interrupt the processData fiber immediately, otherwise we continue. Finally, in line (7) we wait for the data processing to finish.

This looks like code that should run immediately, the same as similar code using Scala Futures. However, this is not the case for ZIO. In the above program we describe the functionality but we do not say how to run it. The running part is deferred as late as possible. This also makes it purely functional. We create a runtime and pass it around to run the effects:

  val runtime = new DefaultRuntime {}
  runtime.unsafeRun(DataProcessor.process())

With this basic knowledge we can rewrite the benchmarks we wrote with Scala Future before, this time using the ZIO library.

The code can be found here:

Here are the results:

[info] Result "com.dblazejewski.benchmark.zio.ZioMathTanBenchmark.zioForComprehensionTestCase":
[info]   1090.656 ±(99.9%) 79.370 ns/op [Average]
[info]   (min, avg, max) = (1007.467, 1090.656, 1164.134), stdev = 52.498
[info]   CI (99.9%): [1011.286, 1170.026] (assumes normal distribution)

The performance difference is enormous. It only confirms what John A. De Goes published on Twitter some time ago:

I am not going to dive into the details of using the ZIO library in this post. There are already great resources available:

  • Functional Scala - Modern Data Driven Applications with ZIO Streams by Itamar Ravid

Conclusion

Scala Future was a big improvement over the poor Java Future. When jumping from Java to Scala the difference in "developer-friendliness" was so huge that every return to Java was a real pain.

However, the more you dive into functional programming, the more limitations of the Scala Future you see. Lack of referential transparency, accidental sequentiality and the ExecutionContext are only a few of those limitations.

ZIO is still at an early stage but I am sure it will really shine in the future. The shift happening in the Scala ecosystem these days, from the early object-functional style to purely functional programming, will also favour purely functional solutions like ZIO.

AWS, SQS, Alpakka, Akka Streams – go reactive – Part 1

I am going to write a series of a few posts about the transition from the basic AWS ElasticBeanstalk SQS integration to the more generic, reactive model.

Transition means that we have a current state that we want to change. So let’s start with the basic AWS SQS integration.

Amazon SQS is a HTTP-based managed service responsible for handling message queues. It offers two kinds of queues: standard queues (maximum throughput, at-least-once delivery) and FIFO queues (preserving messages order, exactly-once delivery).

Just for the record – it is possible to integrate with AWS SQS using the SDK in many languages (Java, Ruby, .NET, PHP, etc.), which gives you both a synchronous and an asynchronous interface. What is more, SQS is also Java Message Service (JMS) 1.1 compatible (Amazon SQS Java Messaging Library).

Let’s assume our application performs operations that can take a bit longer to complete and can be done asynchronously in the background, but are mostly triggered as a result of user interaction with the system. These can be sending an email, generating and sending invoices to clients, processing images or generating huge reports.

The first, naive implementation could be to spawn a local, asynchronous task and handle the job there. The drawbacks of such a solution are obvious: it consumes local resources that could be used for handling more user requests, it is hard to scale, it is not manageable, etc.

The better solution is to introduce some kind of middleware layer that allows distributing the work among the workers.

Therefore we deploy the system in the Elastic Beanstalk environment with the following setup:

 

AWS Elastic Beanstalk setup

 

The web/API servers process user requests and offload the background tasks to the SQS queue using the AWS SDK SQS API. Elastic Beanstalk runs a daemon process on each worker instance that is responsible for fetching messages from the SQS queue and pushing them to the worker server. The worker servers only expose a REST POST endpoint that is used by the daemon process to push the messages. The whole setup can be easily configured using the AWS web console and the monitoring is quite decent. There is also one feature that is quite hidden but worth mentioning: the Elastic Beanstalk daemon process makes sure that the worker servers do not become overwhelmed and POSTs messages to them only if they are capable of processing them. The limitations are as follows: only one SQS queue can be used – all messages are put into one queue and are processed sequentially – and the system gets tightly coupled to the AWS environment (Elastic Beanstalk).
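
As a rough sketch of the "offloading" part (AWS SDK for Java v1; the queue URL and message payload below are made up for illustration):

import com.amazonaws.services.sqs.AmazonSQSClientBuilder
import com.amazonaws.services.sqs.model.SendMessageRequest

// Client configured from the default credential/region chain.
val sqs = AmazonSQSClientBuilder.defaultClient()

// Hypothetical queue URL and task payload.
val queueUrl = "https://sqs.eu-west-1.amazonaws.com/123456789012/background-tasks"
sqs.sendMessage(new SendMessageRequest(queueUrl, """{"task":"generate-invoice","bookingId":"42"}"""))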

What I am suggesting in this post is still sticking to the SQS queues and keeping the worker servers but changing the way the messages are put into workers. Instead of relying on Elastic Beanstalk daemon process let’s implement reactive streams on each of the servers that are connected to the SQS queues.

What we gain with such a solution is the ability to use multiple queues, and if we properly implement the integration layer we are not that tightly coupled to the AWS infrastructure either. This concerns the "reading" part. In the following posts we will see what advantages we get if we also implement the "writing" part using reactive streams.

Before we dive into the details, let’s quickly introduce the terms reactive streams and reactive programming. Both terms are strictly connected.

Generally, reactive programming is a programming model that deals with asynchronous streams of data. It is also a model in which components are loosely coupled. Actually, it is nothing new – we have had event buses for a long time already. Also, if we look at user interaction with an application or website, it is mostly a stream of clicks and other actions. The real game changer, though, is the way we handle those data streams.

Instead of repeating the theory and well-known phrases about reactive programming, I will try to quickly show the difference between reactive and proactive programming with a simple example.

Imagine we have a switch and a light bulb. How would we program switching on the bulb? The first solution is obvious. The switch knows what light bulb it controls. It sends the command to the specific light bulb. The switch is proactive and the light bulb is passive. It means that the switch sends commands and the bulb only reacts to those commands.

This is how we would program it:

import LightBulb._

case class LightBulb(state: Int, power: Int) {
  def changeState(state: Int): LightBulb = this.copy(state = state)
}

object LightBulb {
  val Off = 0
  val On = 1
}

case class Switch(state: Int, lightBulb: LightBulb = LightBulb(Off, 60)) {
  def onSwitch(state: Int): LightBulb = lightBulb.changeState(state)
}

The limitations are obvious. The code is tightly coupled. The switch needs to know about the bulb. What if we want it to control more bulbs? What if we want one switch to control both the light bulb and a wall socket? What about turning on the same light bulb using two different switches? Tightly coupled code is not open for extension.

What is the reactive solution to this problem?

The switch is responsible for changing its own state only. The light bulb listens to the state changes of any switch and modifies its own state based on them. In this model the bulb is reactive – changing its state as a reaction to the switch state change – and the switch is observable – its state is being observed by other components (bulbs, sockets, etc.).

The reactive solution:

import LightBulb._

case class Notification(state: Int)

trait Observer[T] {
  def update(notification: T): Observer[T]
}

trait Observable[T] {
  private var observers: List[Observer[T]] = Nil

  def addObserver(observer: Observer[T]): Unit =
    observers = observer :: observers

  def removeObserver(observer: Observer[T]): Unit =
    observers = observers.filterNot(_ == observer)

  def notifyObservers(notification: T): Unit =
    observers.foreach(_.update(notification))
}

case class LightBulb(state: Int, power: Int) extends Observer[Notification] {
  override def update(notification: Notification): LightBulb = this.copy(state = notification.state)
}

object LightBulb {
  val Off = 0
  val On = 1
}

case class Switch(state: Int) extends Observable[Notification] {
  def onSwitch(state: Int): Switch = {
    notifyObservers(Notification(state))
    this.copy(state = state)
  }
}

object SwitchBulb {
  val lightBulb = LightBulb(Off, 60)

  val switch = Switch(Off)
  switch.addObserver(lightBulb)
}

The end result of those two approaches is the same. What are the differences then?

In the first solution, some external entity (switch) controls the bulb. To do so it needs to have the reference to the bulb. However, in the reactive model, the bulb controls itself. Another difference is that in the proactive model, switch itself determines what it controls. In the reactive model, it does not know what it controls, it just has a list of items that are interested in its state changes. Then those items determine what to do as a reaction to the change.

The models look similar, actually, they mirror each other. However, there is a subtle but really important difference. In the proactive model, components control each other directly while in the reactive model the components control each other indirectly and are not coupled together.

The reactive approach lets us build systems that are:

  • message driven
  • scalable
  • resilient
  • responsive

In the next post of this series I will present the actual integration with AWS SQS using Akka Streams and the Alpakka AWS SQS Connector.

Stay tuned.

Scala – tagged types

Data types in a programming language are the description or classification of data that instructs the compiler how to treat it. Of course, they are not only for the compiler or interpreter but also for us, the developers, as they help us understand the code big time.

This is a valid definition of data whose type is Map[String, String]:

val bookingPaymentMapping : Map[String, String] = Map(booking1.id -> payment1.id, booking2.id -> payment2.id)

This is a valid definition for our domain because both booking and payment ids have the type String. For this trivial example the type definition also looks perfectly fine and pretty enough. However, we can imagine that in a more complex situation, in a bigger codebase, we may lack some information when we see definitions like this:

val bookingPaymentMapping : Map[String, String]

As a result, it is not that rare that we see comments like this:

val bookingPaymentMapping : Map[String, String] //maps booking id -> payment id

We also quickly notice that it is not only about the readability of our code but also about safety. This code compiles perfectly but it is not valid in our domain; it introduces a well-hidden bug:

val bookingPaymentMapping : Map[String, String] = Map(booking1.id -> payment1.id, payment2.id -> booking2.id)

What if we would like to add some extra information to the type definition? Something like “metadata” for the types? The information that not only helps the developers to comprehend the code but also introduces additional “type safety” by the compiler/interpreter.

The solution is there in the Scala ecosystem and it is called Tagged types.

There are two main implementations of tagged types – Scalaz and Shapeless – but SoftwareMill’s Scala Common implementation should also be mentioned. In this post I will briefly show the usage of Shapeless tagged types.

This is the simple model that we will convert to use tagged types:

import java.time.LocalDate

case class Booking(id: String, date: LocalDate)

case class Payment(id: String, bookingId: String, date: LocalDate)

object TaggedTypes {
  val booking1 = Booking("bookingId1", LocalDate.now)
  val booking2 = Booking("bookingId2", LocalDate.now)

  val payment1 = Payment("paymentId1", booking1.id, LocalDate.now)
  val payment2 = Payment("paymentId2", booking2.id, LocalDate.now)

  val bookingPaymentMapping: Map[String, String] = Map(booking1.id -> payment1.id, booking2.id -> payment2.id)
}

object Payments {
  def payBooking(bookingId: String) = Payment("paymentId", bookingId, LocalDate.now)
}

The final code should look close to this:

import java.time.LocalDate

case class Booking(id: BookingId, date: LocalDate)

case class Payment(id: PaymentId, bookingId: BookingId, date: LocalDate)

object TaggedTypes {
  val booking1 = Booking("bookingId1", LocalDate.now)
  val booking2 = Booking("bookingId2", LocalDate.now)

  val payment1 = Payment("paymentId1", booking1.id, LocalDate.now)
  val payment2 = Payment("paymentId2", booking2.id, LocalDate.now)

  val bookingPaymentMapping: Map[BookingId, PaymentId] = Map(booking1.id -> payment1.id, booking2.id -> payment2.id)
}

object Payments {
  def payBooking(bookingId: BookingId) = Payment("paymentId", bookingId, LocalDate.now)
}

We can already see that this code is easier to comprehend:

val bookingPaymentMapping: Map[BookingId, PaymentId]

but what we do not see yet is the fact that we also introduced the additional “type safety”.

How to get to that second implementation? Let’s do it step by step.

First, we need to create the tags:

trait BookingIdTag

trait PaymentIdTag

These are simple Scala traits, but other types can actually be used here as well; a trait is just the most convenient. The names have the suffix Tag by convention.

We can use those tags to create tagged types:

import java.time.LocalDate
import shapeless.tag.@@

trait BookingIdTag

trait PaymentIdTag

case class Booking(id: String @@ BookingIdTag, date: LocalDate)

case class Payment(id: String @@ PaymentIdTag, bookingId: String @@ BookingIdTag, date: LocalDate)

object TaggedTypes {
  val booking1 = Booking("bookingId1", LocalDate.now)
  val booking2 = Booking("bookingId2", LocalDate.now)

  val payment1 = Payment("paymentId1", booking1.id, LocalDate.now)
  val payment2 = Payment("paymentId2", booking2.id, LocalDate.now)

  val bookingPaymentMapping: Map[String @@ BookingIdTag, String @@ PaymentIdTag] = Map(booking1.id -> payment1.id, booking2.id -> payment2.id)
}

object Payments {
  def payBooking(bookingId: String @@ BookingIdTag) = Payment("paymentId", bookingId, LocalDate.now)
}

This is basically how we tag the types. We say what type (String, Int, etc.) is tagged by which tag (String @@ StringTag, Int @@ IdTag, etc.).

But with this code we are still a bit far from our desired implementation. It is clear that these parts are boilerplate:

String @@ BookingIdTag
String @@ PaymentIdTag

We can easily replace them with type aliases (also presented in the previous post):

trait BookingIdTag

trait PaymentIdTag

package object tags {
  type BookingId = String @@ BookingIdTag
  type PaymentId = String @@ PaymentIdTag
}

case class Booking(id: BookingId, date: LocalDate)

case class Payment(id: PaymentId, bookingId: BookingId, date: LocalDate)

object TaggedTypes {

  val booking1 = Booking("bookingId1", LocalDate.now)
  val booking2 = Booking("bookingId2", LocalDate.now)

  val payment1 = Payment("paymentId1", booking1.id, LocalDate.now)
  val payment2 = Payment("paymentId2", booking2.id, LocalDate.now)

  val bookingPaymentMapping: Map[BookingId, PaymentId] = Map(booking1.id -> payment1.id,
    booking2.id -> payment2.id)
}

object Payments {
  def payBooking(bookingId: BookingId) = Payment("paymentId", bookingId, LocalDate.now)
}

With this implementation, we are very close to what we would expect but this code still does not compile:

[error] /Users/Damian/local_repos/scala-tagged-types/src/main/scala/com/dblazejewski/taggedtypes/TaggedTypes.scala:23: type mismatch;
[error]  found   : String("bookingId1")
[error]  required: com.dblazejewski.taggedtypes.tags.BookingId
[error]     (which expands to)  String with shapeless.tag.Tagged[com.dblazejewski.taggedtypes.BookingIdTag]
[error]   val booking1 = Booking("bookingId1", LocalDate.now)

This says that we are passing a plain String where a tagged type is expected:

val booking1 = Booking("bookingId1", LocalDate.now)

The constructor (the apply() method) of the Booking case class expects a tagged type but we supplied it with a plain String. To fix this we need to make sure that we create an instance of the tagged type. This is how it can be done:

import java.time.LocalDate

import com.dblazejewski.taggedtypes.tags.{BookingId, PaymentId}
import shapeless.tag
import shapeless.tag.@@

trait BookingIdTag

trait PaymentIdTag

package object tags {
  type BookingId = String @@ BookingIdTag
  type PaymentId = String @@ PaymentIdTag
}

case class Booking(id: BookingId, date: LocalDate)

case class Payment(id: PaymentId, bookingId: BookingId, date: LocalDate)

object TaggedTypes {
  val bookingId1: BookingId = tag[BookingIdTag][String]("bookingId1")
  val bookingId2: BookingId = tag[BookingIdTag][String]("bookingId2")


  val paymentId: PaymentId = tag[PaymentIdTag][String]("paymentId")
  val paymentId1: PaymentId = tag[PaymentIdTag][String]("paymentId1")
  val paymentId2: PaymentId = tag[PaymentIdTag][String]("paymentId2")


  val booking1 = Booking(bookingId1, LocalDate.now)
  val booking2 = Booking(bookingId2, LocalDate.now)

  val payment1 = Payment(paymentId1, booking1.id, LocalDate.now)
  val payment2 = Payment(paymentId2, booking2.id, LocalDate.now)

  val bookingPaymentMapping: Map[BookingId, PaymentId] = Map(booking1.id -> payment1.id,
    booking2.id -> payment2.id)
}

object Payments {

  import TaggedTypes._

  def payBooking(bookingId: BookingId) = Payment(paymentId, bookingId, LocalDate.now)
}

This is how we defined instances of tagged types:

val bookingId1: BookingId = tag[BookingIdTag][String]("bookingId1")
val bookingId2: BookingId = tag[BookingIdTag][String]("bookingId2")

Now the code compiles.

The code is also on the github.

 

Let’s summarize what we achieved here:

  • the intention of the code is clearly visible:
val bookingPaymentMapping: Map[BookingId, PaymentId]

We know immediately that bookingPaymentMapping maps booking ids to payment ids.

  • we get errors at compile time when we accidentally switch the ids:
val bookingPaymentMapping: Map[BookingId, PaymentId] = Map(booking1.id -> payment1.id,
  payment2.id -> booking2.id)

 

The examples presented in this post are trivial, but even so we can see the clear benefits of using tagged types. Imagine a complex project and I think you will be fully convinced that this is a really useful technique for every Scala developer’s toolset.

Scala – a few useful tips

I’ve been programming in Scala for more than 2 years already. I think I can position myself somewhere here:

Age Mooij @ infoq.com

From time to time I look at Scala Exercises or similar pages where I usually find some things that I do not know yet.

In this blog post I would like to share some of the Scala features I found interesting recently.

Collections

Let’s start with the Scala Collections framework. Scala Collections is a framework/library that is a very good example of applying the Don’t Repeat Yourself principle. The aim was to design a collections library that avoids code duplication as much as possible. Therefore most operations are defined in collection templates, which can be easily and flexibly inherited by individual base classes and implementations.

High-level overview of Scala Collections

 

Avoid creating temporary collections

val seq = "first" :: "second" :: "last" :: Nil

val f: (String) => Option[Int] = ???
val g: (Option[Int]) => Seq[Int] = ???
val h: (Int) => Boolean = ???

seq.map(f).flatMap(g).filter(h).reduce(???)

In this sequence of operations temporary, intermediate collections are created. We do not actually need those collections. They only take heap space unnecessarily and burden the GC. The question is why those intermediate collections are created at all. The answer is that collection transformers (except on Stream) like map, flatMap, etc. are “strict”. It means that a new collection is always constructed as the result of the transformer. Obviously, there are also “non-strict” (lazy) transformers available. But how do we avoid creating those temporary collections while still using the “strict” transformers? There is a systematic way to turn every collection into a lazy one: a view. This is a special kind of collection that implements all transformers in a lazy way:

val seq = "first" :: "second" :: "last" :: Nil

val f: (String) => Option[Int] = ???
val g: (Option[Int]) => Seq[Int] = ???
val h: (Int) => Boolean = ???

seq.view.map(f).flatMap(g).filter(h).reduce(???)

Now the temporary collections are not created and elements are not stored in the memory.

It is also possible to use views when, instead of reducing to a single element, a new collection of the same type is created. In this case, however, a call to the force method is needed:

val seq = "first" :: "second" :: "last" :: Nil

val f: (String) => Option[Int] = ???
val g: (Option[Int]) => Seq[Int] = ???
val h: (Int) => Boolean = ???

seq.view.map(f).flatMap(g).filter(h).force

When the transformation creates a collection of a different type, a suitable converter can be used instead of the force method:

val seq = "first" :: "second" :: "last" :: Nil

val f: (String) => Option[Int] = ???
val g: (Option[Int]) => Seq[Int] = ???
val h: (Int) => Boolean = ???

seq.view.map(f).flatMap(g).filter(h).toList

Calling toSeq on a “non-strict” collection

When Seq(…) is used, a new “strict” collection is created. It might seem obvious that when we call toSeq on a “non-strict” collection (Stream, Iterator) we create a “strict” Seq. However, toSeq actually calls TraversableOnce.toSeq, which returns a Stream under the hood – a “lazy” collection. This may lead to hard-to-track bugs or performance issues.

val source = Source.fromFile("file.txt")
val lines = source.getLines.toSeq
source.close()
lines.foreach(println)

The code seems to look good, however when we run it, it throws an IOException complaining that the stream is already closed. Based on what we said above this makes sense, since the toSeq call does not create a new “strict” collection but rather returns a Stream. The solution is either to call toStream explicitly, or if we need a strict collection, to use toVector instead of toSeq.
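
For example, a strict copy fixes the snippet above (a small sketch of the toVector variant):

import scala.io.Source

val source = Source.fromFile("file.txt")
val lines = source.getLines().toVector // strict: all lines are read before the source is closed
source.close()
lines.foreach(println)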

Never-ending Traversable

Every collection in Scala is a Traversable. Traversables, among many useful operations (map, flatMap), also have functions to get information about the collection size: isEmpty, nonEmpty, size.

Generally, when we call size on a Traversable we expect to get the number of elements in the collection. When we do something like this:

List(1, 2, 3).size

we get 3, indeed.

Imagine that our API function accepts any Traversable and the user provides us with Stream.from(1). Stream is also a Traversable, but the difference is that it is one of the collections that are lazy by default. As a result, it does not have a definite size.

So when we call

Stream.from(1).size

this method never returns. It is definitely not what we expect.

Luckily, we have a method hasDefiniteSize which says if it is safe to call size on the Traversable.

Stream.from(1).hasDefiniteSize //false
List(1, 2).hasDefiniteSize //true

One thing to remember is that if hasDefiniteSize returns true, it means that the collection is finite for sure. However, the other way around is not always guaranteed:

Stream.from(1).take(5).hasDefiniteSize //false
Stream.from(1).take(5).size //5

Difference, intersection, union

This example is self-explanatory:

val numbers1 = Seq(1, 2, 3, 4, 5, 6)
val numbers2 = Seq(4, 5, 6, 7, 8, 9)

numbers1.diff(numbers2)
List(1, 2, 3): scala.collection.Seq

numbers1.intersect(numbers2)
List(4, 5, 6): scala.collection.Seq

numbers1.union(numbers2)
List(1, 2, 3, 4, 5, 6, 4, 5, 6, 7, 8, 9): scala.collection.Seq

The union, however, keeps duplicates. That is not what we want most of the time. In order to get rid of them we have the distinct function:

val numbers1 = Seq(1, 2, 3, 4, 5, 6)
val numbers2 = Seq(4, 5, 6, 7, 8, 9)

numbers1.union(numbers2).distinct
List(1, 2, 3, 4, 5, 6, 7, 8, 9): scala.collection.Seq

Collection types performance implications

Seq

For Seq, most of the available operations are linear, which means they take time proportional to the collection size (L ~ O(n)). For example, the append operation takes linear time on Seq, which is not what we might expect. What is more, it means that if we have an infinite collection, some of the linear operations will not terminate. On the other hand, head and tail operations are very efficient.

List

Legend:

  • C – the operation takes (fast) constant time
  • L – the operation is linear, i.e. it takes time proportional to the collection size

List performance characteristics:

  • head – C ~ O(1)
  • tail – C ~ O(1)
  • apply – L ~ O(n)
  • update – L ~ O(n)
  • prepend – C ~ O(1)
  • append – L ~ O(n)

Head, tail and prepend operations take constant time which means they are fast and do not depend on the collection size.

Vector

Vector performance is generally very good:

Vector performance characteristics (eC – effectively constant time):

  • head – eC ~ O(1)
  • tail – eC ~ O(1)
  • apply – eC ~ O(1)
  • update – eC ~ O(1)
  • prepend – eC ~ O(1)
  • append – eC ~ O(1)

head or tail operations are slower than on the List but not by much.

Conclusion

Always use the right tool for the job. Knowing the performance characteristics of the different collection types, we can choose the one that is fastest for the kind of operations we do.

Type aliases

Type aliases in Scala allow us to create an alternate name for a type and (sometimes) for its companion object. Usually, we use them to create a simple alias for a more complex type.

type Matrix = List[List[Int]]

However, type aliases can be also helpful for API usability. When our API refers to some external types:

import spray.http.ContentType

final case class ReturnValue (data: String, contentType: ContentType)

We always force users of our API to import those types:

import spray.http.ContentType

val value = ReturnValue("data", ContentType.`application/json`)

By defining the type alias in the base package we can give users the dependencies for free:

package com.dblazejewski

package object types {
    type ContentType = spray.http.ContentType
    val ContentType = spray.http.ContentType // alias the companion object too, so ContentType.<member> calls keep working
}
import com.dblazejewski.types._

val value = ReturnValue("data", ContentType.`application/json`)

Another use case that comes to my mind is simplifications of type signatures:

def authenticate[T](auth: RequestContext => Future[Either[ErrorMessage, T]]) = ???

By introducing two type aliases:

package object authentication {
    type AuthResult[T] = Either[ErrorMessage, T]
    type Authenticator[T] = RequestContext => Future[AuthResult[T]]
}

We hide the complexity:

def authenticate[T](auth: Authenticator[T]) = ???

Actually, scala.Predef is full of type aliases:

scala.Predef type aliases

Auto lifted partial functions

A partial function PartialFunction[A, B] is a function defined for some subset of the domain A. The subset is defined by the isDefinedAt method.

A partial function PartialFunction[A, B] can be lifted into a function Function[A, Option[B]]. The lifted function is defined over the whole domain but returns values of type Option[B].

Example:

val pf: PartialFunction[Int, Boolean] = { 
  case i if i > 0 => i % 2 == 0
}

val liftedF = pf.lift

liftedF(-1) 
//None: scala.Option

liftedF(1)
//Some(false): scala.Option

Thanks to this lifting, instead of doing something like this:

future.map { result => result match {
    case Foo(foo) => ???
    case Bar(bar) => ???
  }
}

Scala allows us to do it in a simpler way:

future.map {
    case Foo(foo) => ???
    case Bar(bar) => ???
}

ImplicitNotFound

From the Scala docs:

class implicitNotFound extends Annotation with StaticAnnotation

An annotation that specifies the error message that is emitted when the compiler cannot find an implicit value of the annotated type.

Let’s look at the example:

trait Serializer[T] {
  def serialize(t: T): String
}

trait Deserializer[T] {
  def deserialize(data: String): T
}

def foo[T: Serializer](x: T) = x

foo(42)

When we run this code we get a rather vague error message:

 

However, when we add a simple implicitNotFound annotation:

import annotation.implicitNotFound

@implicitNotFound("Cannot find Serializer type class for type ${T}")
trait Serializer[T] {
  def serialize(t: T): String
}

@implicitNotFound("Cannot find Deserializer type class for type ${T}")
trait Deserializer[T] {
  def deserialize(data: String): T
}

def foo[T: Serializer](x: T) = x

foo(42)
foo("text")

We get more meaningful errors:

Conclusion

Scala is a really powerful and expressive language that I like very much. On the other hand, it takes quite some time to become really proficient in it. The tips presented in this post are rather basic, but hopefully in the following posts we will dive into more advanced, functional aspects of the Scala language.

When we develop software in Scala it is always a matter of common sense to decide whether we are keeping the code simple enough and easy to understand for other developers. This is very important to remember, since with Scala we have the tools to make code really complicated and incomprehensible.

With Scala it is even possible to break the GitHub linter: ContentType.scala  🙂

ElasticMQ – the SQS power available locally

Amazon Simple Queue Service (Amazon SQS) is a distributed message queuing service. It is similar to other well-known messaging solutions like RabbitMQ, ActiveMQ, etc. but it is hosted by Amazon.
It is a really fast, configurable and relatively simple messaging solution.

In my current company we rely strongly on the AWS infrastructure. One major Amazon cloud component we use is SQS (Simple Queue Service).
It allows us to decouple components in the application. We also send lots of notifications through SQS, which makes handling them reliable.

In short, SQS works perfectly for us. The only issue we had was running the application in development mode, which means running it locally without the need to integrate with the AWS infrastructure. The same problem arises in integration tests.
The best solution would be to have SQS running locally, or even better, a service that can be embedded into the application.

And here Adam Warski and his ElasticMQ come to the rescue.
ElasticMQ is a message queue system that can be run either stand-alone or embedded.
It has an Amazon SQS-compatible interface, which means it implements some of the SQS query API
(I am not sure if the full API is implemented, but all the standard queries are there).

As already mentioned, ElasticMQ can be run in two ways:

1. Stand-alone service

It is as simple as running the command:

java -Dconfig.file=custom.conf -jar elasticmq-server-0.10.0.jar

Then the application needs to be configured with the proper SQS url, for example:

http://localhost:9324/queue/wl-test-queue
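
As a rough sketch (AWS SDK for Java v1; treat the class names and the dummy credentials as my assumptions rather than part of the original setup), pointing an SQS client at the local ElasticMQ endpoint looks roughly like this:

import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration
import com.amazonaws.services.sqs.AmazonSQSClientBuilder

// ElasticMQ does not validate credentials, so any values will do.
val sqs = AmazonSQSClientBuilder.standard()
  .withCredentials(new AWSStaticCredentialsProvider(new BasicAWSCredentials("x", "x")))
  .withEndpointConfiguration(new EndpointConfiguration("http://localhost:9324", "eu-west-1"))
  .build()

sqs.createQueue("wl-test-queue")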

 

2. Embedded mode

This mode is perfect for integrating the ElasticMQ into the application running on the developer’s station or run it during the integration tests.

The first step is to set up the ElasticMQ server. We use Play 2.4 and Guice as the dependency injection framework.
In order to start ElasticMQ when the application starts, I simply extend Guice’s AbstractModule:

import com.google.inject.AbstractModule

class StartupModule extends AbstractModule {
  override def configure() = {
    bind(classOf[ElasticMqLauncher]).to(classOf[ElasticMqLauncherImpl]).asEagerSingleton()
  }
}

The launcher itself starts the ElasticMQ server:

import com.typesafe.config.ConfigFactory
// ElasticMQ server classes – package assumed from the elasticmq-server module, double-check for your version
import org.elasticmq.server.{ElasticMQServer, ElasticMQServerConfig}

val config = ConfigFactory.load()
val server = new ElasticMQServer(new ElasticMQServerConfig(config))
val shutdown = server.start()

With the ElasticMQ server running that way, and after changing only the SQS queue URL in the application properties, we can run the SQS-dependent infrastructure locally without needing to communicate with the SQS cloud service.

How to make your Maven build fast again

When a project starts we do not think about certain things like dependency hell, modularisation or the time needed to build the project. That is quite normal: the codebase is pretty small, there are not that many dependencies, etc.
However, as time passes, the time spent building the application starts to be noticeable.
Then it is high time to think about how to cut down on the build time.
Here you can find a few tips which could help you in this field.

Build in parallel

By default Maven builds the project sequentially. Modern computers have at least a few logical cores which are not fully utilised during a Maven build. Maven is able to analyze the dependency graph and build the project in parallel where possible.

mvn -T 4 clean install

This tells Maven to use 4 threads when building the project.

How much time can be saved depends on the project, but for each tip I include example times (MacBook Pro 2,3 GHz Intel Core i5).

one module project:

mvn clean install

Total time: 01:16 min

mvn -T 4 clean install

Total time: 01:04 min

multi module project:

mvn clean install

Total time: 02:44 min

mvn -T 4 clean install

Total time: 01:44 min

Build incrementally

The way we usually build projects is:

mvn clean install

But do we really need to clean the project every time? Obviously, cleaning is one of the first things to try when we face weird caching problems or some strange bugs. Generally, however, we do not need to clean the project. From the performance point of view it is better to build it incrementally:

mvn install

or

mvn install -pl module_name -am

which builds only module_name and the modules it requires (-am stands for "also make").

multi module project:

mvn clean install

Total time: 02:32 min

mvn -T 4 install

Total time: 51.396 s

Build offline

It is not only Maven’s problem that it seems to try to download the whole Internet every time you build the application – other build tools like sbt, Gradle or npm do the same.
We can prevent Maven from trying to download dependencies from the Internet by using the offline option:

mvn --offline install

multi module project:

mvn clean install

Total time: 02:28 min

mvn --offline clean install

Total time: 02:21 min

JVM tuning

Since Maven is a normal Java program, we can try to do some JVM tuning in order to speed up the build.
One possible optimization is to reduce JIT (Just In Time) compilation. Basically, the JIT compiler runs after the Java program starts and compiles the code on the fly into a form that is usually faster to run on the particular CPU. While this is good for long-running programs, it is not necessarily helpful for a short-running program like a Maven build.
We can force the JVM to perform only basic JIT compilation:

-XX:+TieredCompilation -XX:TieredStopAtLevel=1

multi module project:

mvn clean install

Total time: 02:28 min

export MAVEN_OPTS="-XX:+TieredCompilation -XX:TieredStopAtLevel=1"; 
mvn clean install

Total time: 01:44 min

Run tests in parallel

We know how much time it takes to execute all those unit and integration tests during the build. These days test execution takes up most of the time spent in a Maven build.
One possible option to reduce that time is the infamous skipTests=true option – I do not advise this practice. However, there is another way to speed up the tests: run them in parallel.
The maven-surefire-plugin has a configuration option to run tests in parallel:

<configuration>
  <parallel>classes</parallel>
  <threadCount>10</threadCount>
</configuration>

multi module project:

mvn clean install

Total time: 02:40 min

#tests running in parallel (10 threads used);
mvn clean install

Total time: 02:25 min

Conclusion

In this post we looked at a few basic ways to speed up a Maven build. Not all of them can be used in every project, but you can pick those which fit your setup and save a few seconds on every Maven build.

Goldman Sachs collections – nearly everything you want from collections in Java

The Java collections framework is not as powerful as an experienced Java developer would expect.
For example, how do you sort a list?
A simple answer would be to use the java.util.Collections.sort() method with some kind of java.util.Comparator implementation. Additionally, Guava’s Ordering support can be used.
However, that solution is not exactly what an object-oriented developer looks for.
Similarly to sorting a collection, you would probably deal with finding the min or max element of a collection using the java.util.Collections.min() and java.util.Collections.max() methods respectively.
And how do you filter a collection? Or how do you select a list of a particular property extracted from the objects stored in the collection? It can be done in pure Java using a for loop, using Apache Commons Collections and its CollectionUtils.filter() and CollectionUtils.collect(), or with Guava’s Collections2.filter(). Nonetheless, none of those solutions is fully satisfying from my point of view.
Of course, there is Java 8 in the game, but it is a quite new release that cannot be used in every project, especially legacy ones, and its collections framework is still not optimal.

As a rescue for the above problems the Goldman Sachs Collections (GS Collections) framework comes in. It is a collection framework that Goldman Sachs open sourced in January 2012.

Here is a quick feature overview of GS Collections compared to Java 8, Guava, Trove and Scala:

source: infoq.com

 

Seeing this, even if you thought that Java 8 had everything you need from collections, you still should have a look at GS Collections.

Following this brief introduction I am going to present a quick overview of the main features GS Collections has to offer. Some of the examples are variants of the exercises in the GS Collections Kata which is a training class they use in Goldman Sachs to train developers how to use GS Collections. The training is also open sourced as a separate repository.

Going back to the example from the beginning of the post, it would be perfect if we had methods like sort(), min(), max(), select(), collect(), etc. on every collection. It is simple to put them in a util class, but that does not reflect object-oriented design.

GS Collections has interfaces accomplishing this in the following way (as an example):

public interface MutableList<T> extends List<T>{
    MutableList<T> sortThis(Comparator<? super T> comparator);
    <V> MutableList<V> collect(Function<? super T, ? extends V> function);
    MutableList<T> select(Predicate<? super T> predicate);
    ...
}

GS Collections classes do not extend Java Collection Framework classes. They are instead new implementations of both the Java Collection Framework and GS Collections interfaces.


Collect pattern

The collect pattern returns a new collection where each element has been transformed. An example is when we need to return the price of each item in a shopping cart.

The collect pattern uses a function which takes an object and returns an object of a different type. It simply transforms objects.

 

MutableList<Customer> customers = company.getCustomers();
MutableList<String> customerCities = customers.collect(new Function<Customer, String>() {
   @Override
    public String valueOf(Customer customer) {
     return customer.getCity();
    }
});

or using Java 8 lambda expressions:

MutableList<Customer> customers = company.getCustomers();
MutableList<String> customerCities = customers.collect(customer -> customer.getCity());

or using a method reference:

MutableList<String> customerCities = customers.collect(Customer::getCity);

 

Select pattern

The select pattern (aka filter) returns the elements of a collection that satisfy some condition, for example only those customers who live in London. The pattern uses a predicate, which is a type that takes an object and returns a boolean.

MutableList<Customer> customers = company.getCustomers();
MutableList<Customer> customersFromLondon = customers.select(new Predicate<Customer>() {
  @Override
  public boolean accept(Customer each) {
    return each.getCity().equalsIgnoreCase("London");
  }
});

or using Java 8 lambda expressions:

MutableList<Customer> customers = this.company.getCustomers();
MutableList<Customer> customersFromLondon = customers.select(
each -> each.getCity().equalsIgnoreCase("London"));

 

Reject pattern

The reject pattern returns the collection elements that do not satisfy the Predicate.

MutableList<Customer> customersNotFromLondon = this.company.getCustomers()
    .reject(new Predicate<Customer>() {
        @Override
        public boolean accept(Customer each) {
            return each.getCity().equalsIgnoreCase("London");
        }
    });

One note regarding anonymous inner classes when it is not possible to use Java 8: it is advisable to encapsulate them in the domain object, and then the above snippet changes into:

MutableList<Customer> customersNotFromLondon = this.company.getCustomers()
    .reject(Customer.CUSTOMERS_FROM_LONDON);

Other patterns using Predicate

  • Count pattern
    • Returns the number of elements that satisfy the Predicate.
  • Detect pattern
    • Finds the first element that satisfies the Predicate.
  • Any Satisfy
    • Returns true if any element satisfies the Predicate.
  • All Satisfy
    • Returns true if all elements satisfy the Predicate.

Testing

GS Collections includes helpful, collections-specific utilities for writing unit tests. They are implemented as an extension of JUnit.

Instead of checking the collection size:

Assert.assertEquals(2, customersFromLondon.size());

you can use:

Verify.assertSize(2, customersFromLondon);

MutableList<Integer> list = FastList.newListWith(1, 2, 0, -1);
Verify.assertAllSatisfy(list, IntegerPredicates.isPositive());

Some more examples:

Verify.assertEmpty(customersFromLondon);
Verify.assertNotEmpty(customersFromLondon);
Verify.assertContains(customer, customersFromLondon);
Verify.assertContainsAll(customersFromLondon, customer1, customer2, customer3);

Predicates

GS Collections provides several built-in predicates:

MutableList<Integer> mutableList = FastList.newListWith(25, 50, 75, 100);
MutableList<Integer> selected = mutableList.select(Predicates.greaterThan(50));

MutableList<Person> theLondoners = people.select(
    Predicates.attributeEqual(Person::getCity, "London"));

Immutability

I personally prefer immutable data structures to mutable ones. The pros are that they can be passed around without making defensive copies, they can be accessed concurrently without the possibility of corruption, etc.

The methods toList(), toSortedList(), toSet(), toSortedSet() and toBag() always return new, mutable copies:

MutableList<Integer> list = FastList.newListWith(3, 1, 2, 2, 1);
MutableList<Integer> noDuplicates = list.toSet().toSortedList();

The ImmutableCollection interface does not extend Collection and therefore has no mutating methods:

ImmutableList<Integer> immutableList = FastList.newListWith(1, 2, 3).toImmutable();
ImmutableList<Integer> immutableList2 = Lists.immutable.of(1, 2, 3);

Flat collect

The flat collect pattern is a special case of the collect pattern. When the collect pattern is used with a function that returns a collection, the result is a collection of collections. Flat collect, in this case, returns a single "flattened" collection instead of a collection of collections.

company.getCustomers().flatCollect(Customer::getOrders);

or in the pre-Java 8 way:

company.getCustomers().flatCollect(new Function<Customer, Iterable<Order>>() {
    @Override
    public Iterable<Order> valueOf(Customer customer) {
        return customer.getOrders();
    }
});

Static utilities

As stated at the beginning, processing collections using methods on the interfaces is the preferred, object-oriented approach. However, it is not always feasible. As a solution GS Collections, similarly to the JDK, introduces several static utility classes like Iterate, ListIterate, etc.

Some of them can be used to interoperate with the Java Collection Framework. What is more, they allow developers to incrementally refactor an existing code base into one that uses GS Collections.

List<Integer> list = ...;
MutableList<Integer> selected = ListIterate.select(list, Predicates.greaterThan(50));

Integer[] array = ...;
MutableList<Integer> selected = ArrayIterate.select(array, Predicates.greaterThan(50));

String result = StringIterate.select("1a2a3", CharPredicate.IS_DIGIT);
Assert.assertEquals("123", result);

Parallel iteration

GS Collections provides static utilities for parallel iteration which can be used for data-intensive algorithms. It looks like the serial case, hiding the complexity of writing concurrent code.

List<Integer> list = ...;
Collection<Integer> selected = ParallelIterate.select(list, Predicates.greaterThan(50));

Remember that parallel algorithms are not usually the solution to performance problems.

FastList as a replacement for ArrayList

 
FastList is considered a drop-in replacement for ArrayList. It is definitely more memory efficient and can be used to refactor legacy code in steps.

Let’s refactor a simple piece of code using GS Collections:

List<Integer> integers = new ArrayList<>();
integers.add(1);
integers.add(2);
integers.add(3);

Step 1:

List<Integer> integers = new FastList<Integer>();
integers.add(1);
integers.add(2);
integers.add(3);

Step 2:

List<Integer> integers = FastList.newList();
integers.add(1);
integers.add(2);
integers.add(3);

Step 3:

List<Integer> integers = FastList.newListWith(1, 2, 3);

or if you need an unmodifiable collection:

List<Integer> integers = FastList.newListWith(1, 2, 3).asUnmodifiable();

Step 4:

MutableList<Integer> integers = FastList.newListWith(1, 2, 3);

The analogous refactorings can be carried out for maps and sets using UnifiedMap and UnifiedSet respectively:

UnifiedMap<Integer, String> map =
    UnifiedMap.newWithKeysValues(1, "1", 2, "2", 3, "3");

Parallel lazy evaluation

There are situations when the first optimization that comes to mind is to parallelize operations. It can be justified, especially when processing large chunks of data, like collections of millions of elements, in a multi-processor environment. GS Collections offers functionality to implement this in a friendly way:

MutableList<Item> data = ...;
ExecutorService executorService = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
ParallelListIterable<Item> itemsLazy = FastList.newList(data).asParallel(executorService, 50000);
itemsLazy.forEach(Item::calculateNewPrice);

asParallel() method takes two parameters:

  • executorService
  • batchSize which determines the number of elements from the backing collection that get processed by each task submitted to the thread pool; from my experience the appropriate batch size has significant influence on performance and should be determined during performance tests

Performance

I personally did a few performance tests comparing lazy and parallel lazy evaluation using GS Collections, but I did not do any comparison between GS Collections and other collections frameworks. Since Goldman Sachs promises that their implementation is optimized for performance and memory usage, I tried to find tests that prove it.
Here is an example comparison of GS Collections, Java 8 Collections and Scala Collections:

source: infoq.com

 

source: infoq.com

Summary

This is just the tip of the iceberg as far as GS Collections is concerned. The framework offers much more, like support for the stack data structure (MutableStack), the bag data structure (MutableBag), multimaps (MutableListMultimap), grouping functionality (groupBy, groupByEach) and lazy evaluation (asLazy()).
From my point of view it is a quality replacement for the current Java Collections Framework.

Intellij Idea without a mouse

I decided to write this post after watching Hadi Hariri’s talk at GeekOut 2014 called “Mouseless Driven Development”.
According to Hadi, it is possible to code in IntelliJ IDEA without even touching a mouse. I can believe it.
In this post I am going to present some nice tricks which I use in my daily work with IntelliJ IDEA, plus the new ones I learned during the talk.
While writing this post I was using IntelliJ IDEA 13, so some of the shortcuts may not be available in older versions.

  • CTRL+N – brings up a class name finder. You can type camel case and it finds the appropriate class. It is also possible to put a colon after the class name followed by a number, which takes you to the class and goes to the line specified by the number entered.

i1

  • CTRL+SHIFT+N – brings a file name finder which you can use in the same way as CTRL+N.
  • CTRL+SHIFT+ALT+N – brings a symbol finder.
 i2

As you can see there are a few consumeFloat() methods, and if you want to go to a particular symbol you enter the class name before the symbol name:

i3

  • double SHIFT – brings up a “Search Everywhere” window, which searches not only classes and files but also things like settings.
i4
  • ALT+1 – goes to the Project Explorer; like in any other window or toolbar, you can simply type letters to find the appropriate item.
i5
  • CTRL+F12 – brings up a popup which allows you to navigate to different symbols/methods in the current file or class.
  • CTRL+E – brings up a popup with the recently opened files.
  • CTRL+SHIFT+E – brings up a popup with the recently edited files.
Regarding the two shortcuts above, I would like to add some more information. They allowed me to completely get rid of tabs in Intellij Idea. I realized that with these shortcuts I do not need tabs any more. They were only introducing a mess, since nearly every time I needed to change a tab I had to look for a specific one among the many currently opened. When I started using the lists of recently opened and recently edited files, going back to previous files got far easier.
  • CTRL+TAB – allows you to switch to different windows, etc.
  • CTRL+B – when the cursor is placed on an item in the class, it takes you to the declaration.
  • CTRL+ALT+LEFT/RIGHT – allows you to go back and forward to where you have been before.
  • CTRL+SHIFT+I – brings up a quick popup window with an item definition.
i6
  • CTRL+ALT+B – goes to the implementation if there is only one, or displays a popup with all available implementations.

When navigating between files in the project explorer with the UP/DOWN keys, you can press ENTER and it shows the current item in the editor without moving focus to the editor. Pressing F4, however, opens the item in the editor and moves focus to it. It is a very efficient way to navigate through a project without touching a mouse.

  • CTRL+ALT+F7 – brings a quick pop up showing usages of the current item in the project.
  • CTRL+SHIFT+F7 – highlights usages of the selected item; when used on the throw it highlights all the places in the current method where the exception is thrown; when used on the return statement it shows all the places where the method exits.
  • CTRL+SHIFT+F12 – hides all opened windows and maximizes the editor; entering the shortcut again brings the closed windows back.
  • CTRL+SHIFT+LEFT/RIGHT – when entered in the project explorer, it resizes the editor; a very nice feature – give it a try.
  • CTRL+W – expands context sensitive selection.
  • SHIFT+ALT+UP/DOWN – moves text up and down; it is context sensitive, so when you place the cursor on a for loop, for instance, it moves the whole loop up and down.
  • CTRL+SHIFT+ENTER – intelligently completes current statement (for example adds semicolon at the end of the statement).
  • CTRL+ALT+T – brings a “Surround With” popup which gives you a few options of code insertions; works not only for Java but for other technologies (HTML, etc.) as well.
i7
  • CTRL+SHIFT+SPACE – smart completion which filters the list of available methods or variables to match the expected type.
What we still do, even if there are better ways to handle such cases, is checking whether a variable is not null.
In Intellij Idea you do not have to write the same statements again and again. Having an object variable, put a dot at the end and press CTRL+SPACE, CTRL+SHIFT+SPACE or CTRL+J, and on the list there is a “notnull” option which generates that cumbersome statement for you.
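For example, with a variable called user in scope (the name is just for illustration), the notnull template expands to roughly this, with the cursor placed inside the block:

if (user != null) {
    // code handling the non-null case goes here
}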
  • CTRL+J – inserts a live template; the shortcut is context sensitive and gives you different options when pressed on a variable or, for example, in an empty space in the editor; even if you have not defined your own templates, the built-in ones are worth looking at.

i8

  • CTRL+SHIFT+ALT+T – brings a refactor popup; unfortunately it is not context sensitive.
  • SHIFT+F2/F2 – goes to the previous/next highlighted error; however, it works that way only if the following option is checked:

i9

  • ALT+F12 – opens operating system terminal inside IDE.

After the 33rd Degree 2014 – main tools/techniques to trial or adopt

Every conference I have attended so far has left me with new ideas and motivation to do things better on a daily basis. This year’s 33rd Degree has given me the same boost as the previous ones.

This blog post is a quick overview of the tools/techniques I am going to investigate more deeply after the conference; I will try to introduce at least some of them in my daily work.

  • Project Lombok
    • boilerplate code generator
    • some features (a short sketch follows after this list):
      • @NonNull annotation – generates a null check for you
      • @Cleanup annotation – ensures that a given resource is automatically cleaned up
      • @Getter / @Setter annotations – generate standard getter/setter methods for every annotated field
      • @Log annotation – generates a logger field for every annotated class
  • Logback
    • I have used it in one project but forgot about it and stuck to log4j
    • about 10 times faster than log4j
    • automatic configuration file reloading
    • automatic old log file compression and removal
    • maybe wait for Apache Log4j 2?
  • Logstash
    • a brilliant tool for managing application logs
    • provides web interface
  • vert.x
    • light, robust and high performance application platform for JVM
    • simple asynchronous API similar to the node.js one
    • project inspired by node.js so both have many similarities
    • components can be developed in different languages: Java, JavaScript, Groovy, …
    • scalability and high availability support out of the box
  • application metrics
    • start providing application metrics
    • a nice tool from codahale
    • already implemented it in a Java application – it fulfils all expectations
  • jooq
    • lets you build typesafe SQL queries
    • fluent API
    • code completion for SQL
    • bugs found at compilation time
    • paid for use with non open source databases
  • Spring Boot
    • another excellent tool from Pivotal/SpringSource
    • quick application bootstrap
    • embeds Tomcat or Jetty
    • features such as metrics or health checks provided out of the box
    • no code generation
  • Modern Java applications deployment
    • awesome quote from +Axel Fontaine: “Running servers in production should be like going backpacking. You take the bare minimum with you. Anything else is going to hurt”
    • environment promotion instead of setting up a clean environment on development/test/production – that idea is not new for me; I have already seen it and used it on Mainframe infrastructure
    • embedded containers
  • Nashorn
    • JavaScript engine for the JVM
    • amazing two-way integration with Java
  • walkmod
    • a tool to enforce consistent code conventions among project contributors
      • code licence snippets
      • code formatting
      • automatic refactoring – but I would think twice before using it
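
Here is the short Lombok sketch promised above (the Customer class and its fields are made up for illustration; it assumes Lombok on the classpath):

import lombok.Cleanup;
import lombok.Getter;
import lombok.NonNull;
import lombok.Setter;
import lombok.extern.java.Log;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

@Log // generates a static 'log' field for the class
public class Customer {

    @Getter @Setter private String name; // getName()/setName() are generated
    @Getter @Setter private int age;     // getAge()/setAge() are generated

    public void rename(@NonNull String newName) { // a null check is generated at the start of the method
        this.name = newName;
    }

    public void importFrom(String path) throws IOException {
        @Cleanup InputStream in = new FileInputStream(path); // in.close() is called automatically
        log.info("Importing customer data from " + path);
        // ... read and parse the stream ...
    }
}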

Measure and find bottlenecks before they affect your users 1/2

Inspired by a few talks during the last 33rd Degree conference, I decided to implement, in one of the applications I develop, metrics which allow developers or the operations team to monitor the running application and possibly detect potential problems early on.

After a quick investigation I decided to use the Metrics framework. What I expected was exposing at least some statistics concerning the usage of application components. First of all, I would use them during application stress tests to find particular components slowing down the application. After going into production, I imagine that such statistics would be helpful to monitor the running application.

The Metrics framework perfectly fits my expectations. It is a Java framework. What is more, there are JavaScript ports which I am going to use to monitor a Node.js server (more about it in one of the next posts).
I decided to integrate the tool with the Spring application context using metrics-spring, but of course it is possible to use it without Spring (a minimal plain-Java sketch follows below).
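
Here is that minimal plain-Java sketch (the registry, timer name and measured block are made up for illustration; the classes come from the com.codahale.metrics package):

MetricRegistry registry = new MetricRegistry();
Timer transactionTimer = registry.timer("transactions");

JmxReporter.forRegistry(registry).build().start();                       // expose metrics as JMX MBeans
Slf4jReporter.forRegistry(registry).build().start(1, TimeUnit.MINUTES);  // log metrics every minute

Timer.Context context = transactionTimer.time();
try {
    // ... the code being measured ...
} finally {
    context.stop();
}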

Here is the Spring application context with Metrics support:

<?xml version="1.0" encoding="UTF-8"?>
<beans
    xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:mvc="http://www.springframework.org/schema/mvc"
    xmlns:metrics="http://www.ryantenney.com/schema/metrics"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
        http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc.xsd
        http://www.ryantenney.com/schema/metrics http://www.ryantenney.com/schema/metrics/metrics-3.0.xsd">

    <metrics:metric-registry id="metrics"/>
    <metrics:annotation-driven metric-registry="metrics"/>
    <metrics:reporter id="metricsJmxReporter" metric-registry="metrics" type="jmx"/>
    <metrics:reporter id="metricsLogReporter" metric-registry="metrics" type="slf4j" period="1m"/>
</beans>

The configuration defines a few beans:

  • metric-registry is the bean used to register generated metrics; its explicit definition is optional – if not defined, a new MetricRegistry bean is created
  • annotation-driven element tells that annotations are used to mark methods/beans under monitoring
  • reporter element is used to report gathered statistics to the defined consumers; there are a few reporter implementations provided (jmx, console, slf4j, ganglia, graphite); I decided to use two of them:
    • jmx (JmxReporter) exposing metrics as JMX MBeans; they can be explored using standard tools like jconsole or VisualVM
    • slf4j (Slf4jReporter) logging metrics to an SLF4J logger; the period attribute defines the interval used to report statistics to a log file

When the configuration is done, it is time to annotate the bean methods which are to be monitored. To do that, there is a simple @Timed annotation provided:

@RequestMapping(value = "/transaction", method = RequestMethod.POST)
@ResponseBody
@Timed
public HttpEntity<SubmitResultDTO> transaction(@RequestBody NewTransactionDTO transaction) { ... }
Using that simple configuration you get JMX MBeans exposed, providing a nice set of metrics:

sc1

What is more, if any statistic is clicked, an online chart is presented:

sc2

Besides the JMX reporter there is also the SLF4J reporter defined, which logs the following pieces of information:

sc3

Besides JMX or SLF4J reporting, more sophisticated tools can be used to consume the statistics provided by Metrics. I would recommend trying Ganglia or Graphite, as there are reporters provided for those consumers (GangliaReporter and GraphiteReporter).
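
For example, a GraphiteReporter (from the metrics-graphite module) can be wired up roughly like this, assuming a MetricRegistry called registry; the host, port and prefix are placeholders:

Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));
GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
        .prefixedWith("myapp")
        .build(graphite);
reporter.start(1, TimeUnit.MINUTES); // push metrics to Graphite every minute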