Kafka Streams: Advanced skills for developing stream processing applications

Price: €1,750 excl. VAT, per attendee

For more information about this training course, please feel free to contact us:


During this instructor-led, three-day, hands-on course, you will learn how to use the advanced Kafka Streams APIs and discuss best practices for both development and production.

Course Objectives

You will learn how to use the Kafka Streams DSL API and the low-level Processor API to build stream processing topologies. In addition, you will learn how to use internal stores to implement stateful applications.


50% theory, 50% practice

Who Should Attend?

This course is designed for application developers, architects, and data engineers who need to build and deploy streaming applications that enrich, transform, join, and query data flowing through Apache Kafka in real time.

Course Duration

3 Days

Course Prerequisites

Attendees should be familiar with developing in Java. Attendees should also be familiar with the core concepts of Apache Kafka.

Course Content

1 ) Back to basics

The Components of a Kafka cluster
  • Broker, Producer, Consumer
  • Message, Topic, Partition
  • Zookeeper
  • OS Page-cache
Scalability inside consumer groups
Replication and Fault-Tolerance
  • The roles of brokers (Leader, Follower, Controller)
  • In-Sync Replicas
  • Commit
  • Producer and Message Delivery Reliability
The retention policies (delete, compact)
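The producer-reliability settings covered in this module can be sketched as a plain producer configuration. This is a minimal example, assuming the strongest delivery guarantees are wanted; the broker address is a placeholder.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;

public class ReliableProducerConfig {

    // Settings commonly combined for strong delivery guarantees:
    // acks=all waits for all in-sync replicas to acknowledge a write,
    // and idempotence prevents duplicates when the producer retries.
    public static Properties build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
        return props;
    }
}
```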

2 ) Introduction to Kafka Streams

Why Kafka Streams?
Use-Cases & Key Features
Levels of abstraction
Concepts of Streams and Tables
Building a simple topology using DSL API
The underlying Processor API
Sub-topologies and repartition topics
Basic configurations
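A simple topology of the kind built in this module can be sketched with the DSL. The topic names (`input-topic`, `output-topic`) are placeholders for illustration.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class SimpleTopology {

    // Reads string records, drops null values, upper-cases them,
    // and writes the result to an output topic.
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .filter((key, value) -> value != null)
               .mapValues(value -> value.toUpperCase())
               .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }
}
```

Calling `build().describe()` on the resulting `Topology` prints the processor graph, which is a convenient way to inspect sub-topologies.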

3 ) Architecture and Threading Model

Streams Tasks
Streams Threads
Runtime States
Consumption and processing model
Partition assignment

4 ) The Processor API

API Overview
Accessing Processor Context
Forwarding records to the downstream processors
Building a topology
Manipulating local states
Punctuate operations (Punctuator API)
Repartitioning operations
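The Processor API operations listed above can be sketched as follows, using the `processor.api` package introduced in recent Kafka versions. The processor and topic names are illustrative.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

public class UppercaseTopology {

    // A Processor that forwards each record downstream with its value upper-cased.
    static class UppercaseProcessor implements Processor<String, String, String, String> {
        private ProcessorContext<String, String> context;

        @Override
        public void init(ProcessorContext<String, String> context) {
            this.context = context; // keep the context to forward records later
        }

        @Override
        public void process(Record<String, String> record) {
            context.forward(record.withValue(record.value().toUpperCase()));
        }
    }

    // Wires source -> processor -> sink by hand, with no DSL involved.
    public static Topology build() {
        Topology topology = new Topology();
        topology.addSource("Source",
                Serdes.String().deserializer(), Serdes.String().deserializer(), "input-topic");
        topology.addProcessor("Uppercase", UppercaseProcessor::new, "Source");
        topology.addSink("Sink", "output-topic",
                Serdes.String().serializer(), Serdes.String().serializer(), "Uppercase");
        return topology;
    }
}
```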

5 ) The Streams DSL API and Stateful operations

The API Overview
The API Abstractions & Operations
  • KStream / KGroupedStream
  • KTable / KGroupedTable
  • GlobalKTable
The stateless operations
The aggregate functions
How to combine DSL and Processors (Transformer)
Streams DSL vs Processor API
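A stateful aggregation of the kind discussed in this module can be sketched as a materialized count. The topic and store names (`events`, `counts-store`, `counts`) are placeholders.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class CountTopology {

    // Counts records per key into a named state store, then streams the
    // changelog of that KTable out to a downstream topic.
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        KTable<String, Long> counts = builder
                .stream("events", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey()
                .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store")
                        .withKeySerde(Serdes.String())
                        .withValueSerde(Serdes.Long()));
        counts.toStream().to("counts", Produced.with(Serdes.String(), Serdes.Long()));
        return builder.build();
    }
}
```

Naming the store (here `counts-store`) is what makes it reachable later through interactive queries.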

6 ) The windowed operations and the join semantics

Notions of time
The Window operations
Manipulating Time (TimestampExtractor)
How to manage out-of-order records
How to suppress intermediate aggregate results
The joins - semantics and operations
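A windowed aggregation of the kind covered here can be sketched as a tumbling-window count; the topic name and window size are illustrative.

```java
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class WindowedCountTopology {

    // Counts clicks per key over non-overlapping 5-minute windows.
    // "NoGrace" means late (out-of-order) records arriving after the
    // window closes are dropped rather than merged in.
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("clicks", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
               .count();
        return builder.build();
    }
}
```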

7 ) Understanding state store mechanisms

Implementations and roles of state stores
Understanding the read path for a KeyValueStore
Understanding the write path for a KeyValueStore
Standby replicas and state recovery

8 ) RocksDB

Introduction to RocksDB
Concepts and architecture
Memory management
Metrics and available configurations

9 ) Error Management

The types of errors
How to manage deserialization exceptions
How to manage message production exceptions
Which error handling strategies to choose?
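One of the strategies discussed here can be sketched as plain configuration: Kafka Streams ships a `LogAndContinueExceptionHandler` that skips records that fail to deserialize instead of stopping the application. The application id and broker address are placeholders.

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

public class ErrorHandlingConfig {

    public static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");           // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        // Log and skip corrupt records instead of failing the whole application.
        props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
                  LogAndContinueExceptionHandler.class.getName());
        return props;
    }
}
```

The trade-off is silent data loss on corrupt records, which is why choosing a strategy is a design decision rather than a default.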

10 ) Testing Kafka Streams Applications

How to test a topology using the TopologyTestDriver API
How to write integration tests using Embedded Kafka and Docker
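A unit test with `TopologyTestDriver` (from the kafka-streams-test-utils artifact) can be sketched as follows: it drives a topology in-process, with no real Kafka cluster. The topology, topic names, and transformation here are placeholders.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class TopologyTest {

    // Pipes one record through a simple upper-casing topology and
    // returns the value that reaches the output topic.
    public static String runOnce(String value) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("in", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(v -> v.toUpperCase())
               .to("out", Produced.with(Serdes.String(), Serdes.String()));
        Topology topology = builder.build();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted
        try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
            TestInputTopic<String, String> input =
                    driver.createInputTopic("in", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> output =
                    driver.createOutputTopic("out", new StringDeserializer(), new StringDeserializer());
            input.pipeInput("key", value);
            return output.readValue();
        }
    }
}
```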

11 ) More Advanced Concepts and Features

The exactly-once semantics
The topology optimization
How to name topology operations
Data reprocessing
How to query state stores using interactive queries

12 ) Managing a Kafka Streams application in production

13 ) Infrastructure and Capacity Planning recommendations

14 ) Conclusion

About the Author

Florian has worked in consulting for more than 8 years and is the co-founder and CEO of StreamThoughts. Passionate about distributed systems, he specializes in event-streaming technologies such as Apache Kafka and Apache Pulsar. Today, he helps companies with their transition to event-streaming architectures. Florian is a certified Confluent Administrator & Developer for Apache Kafka. He was named "Confluent Community Catalyst" for two consecutive years (2019 and 2020) for his contributions to the Apache Kafka Streams project and his involvement in the open-source community. He is one of the organizers of the Paris Apache Kafka Meetup.