Event Streams in Action

Between 2009 and 2013, I published ten book reviews on this blog. Since then, nothing. Reading a book is a huge commitment, not to mention writing the review.

During the lockdown, Manning approached me with a "partnership opportunity". I generally turn down such offers. However, I had already bought and read Manning books in the past, and they ranged from above average to good.

I proposed amending the deal as follows: Manning sends me a book of my choice for free, and I write an honest review. The publisher can then either approve or discard it. If it's approved, I don't change a single comma.

For my first review after a long hibernation, I set my sights on Event Streams in Action, since my current job involves Hazelcast Jet, a stream processing engine.

Facts

11 chapters, $26.99
As the name implies, the book is about event streams

Chapters

Here's a quick summary of each chapter:

1. Explores the concept behind event streams and their unbounded nature
2. Describes the properties of a unified log: unique, append-only, distributed, and ordered
3. Introduces Apache Kafka
4. Introduces Amazon Kinesis
5. Describes stateful stream processing and illustrates it with an Apache Samza use case
6. What happens when code tries to process data it was not designed for? Presents schemas and describes Apache Avro in detail
7. Archives events from Kafka to long-term storage with Pinterest Secor
8. Defines "railway-oriented programming", an approach to handling failures elegantly during processing by modeling them as events
9. Defines the difference between events and commands
10. Introduces event streams in analytics and describes analytics-on-read, i.e., dumping all events directly into a datastore and analyzing them when querying
11. Describes the other side, analytics-on-write, where events are processed before being stored

Pros and cons

On the plus side, I liked the following items:

Schemas: why you should use them and, more importantly, the different approaches to implementing them
Archiving events: once you have processed events, what if you need to archive them in long-term storage? Which tools are available?
Error handling: stream processing requires handling errors differently from traditional applications. Because a pipeline runs indefinitely, you need to differentiate between recoverable and non-recoverable errors. The section enumerates the available options.
Commands and events: some events in the pipeline describe state, while others describe actions to execute
Illustrations: the book has a lot of them, and they greatly help understanding!

On the flip side:

The book tries to cover a lot of different concepts, approaches, techniques, and tools from the world of stream processing. The goal is commendable, but it's hard to fit everything into a single book
The code samples use several languages: Java, Scala, and Python. While different tools require different languages, both Java and Scala compile to bytecode and run on the JVM. I believe using fewer of them would have made the samples easier to understand
The book uses two different fictional companies for its use cases and switches between them throughout
The use cases rely on a lot of different tools: Apache Kafka, Amazon Kinesis, Apache Samza, Apache Hadoop YARN, Apache Avro, Pinterest Secor, Apache Spark, Amazon Redshift, Amazon DynamoDB, AWS Lambda, ... Because of that, the explanations of each of them are limited

Wrap-up

Event Streams in Action offers a lot of interesting content. Beginners will gain fundamental knowledge, but they will probably struggle to grasp all the finer points of the use cases. Experienced practitioners will benefit from the code examples. Yet the book tries to cover too much; it could be improved by addressing the two personas separately and expanding the content for each of them.

Originally published at A Java Geek on December 6th 2020