StreamThoughts
REMOTE

Kafka Streams

Advanced skills for developing stream processing applications

Price: €1,980 (excl. tax) per attendee

For more information about this training course, please feel free to contact:
training@streamthoughts.io

Description

During this instructor-led, three-day, hands-on course, participants will learn how to use the advanced features of the Kafka Streams API and discuss best practices for both development and production.

Course Objectives

This course enables participants to acquire the following skills:

  • Understand how the Kafka Streams DSL API and the low-level Processor API work.
  • Build stream processing topologies.
  • Create stateful streaming applications using internal state stores.
  • Optimize Kafka Streams applications for performance.

Pedagogy

50% theory, 50% practice

Who Should Attend?

This course is designed for application developers, architects, and data engineers who need to build and deploy streaming applications that enrich, transform, join, and query data flowing through Apache Kafka in real time.

Course Duration

3 Days

Course Prerequisites

Attendees should be familiar with developing in Java. Attendees should also be familiar with the core concepts of Apache Kafka.

Course Content

Module 1: Go back to basics

  • The Components of a Kafka cluster
  • Broker, Message, Topic & Partitions
  • Producer Basics (see the sketch after this outline)
  • Consumers & Consumer Groups
  • Replication & Fault-tolerance
  • Data retention and compression
  • Understanding ZooKeeper’s roles
  • Understanding Kafka’s performance
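
For orientation, here is a minimal producer sketch in Java, the course's working language (the broker address and topic name are placeholders):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerBasics {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The record key determines the target partition (hash-based by default).
                producer.send(new ProducerRecord<>("demo-topic", "user-42", "hello"));
            }
        }
    }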

Module 2: Introduction to Kafka Streams

  • Why Kafka Streams?
  • Use-Cases & Key Features
  • Levels of abstraction
  • Concepts of Streams and Tables
  • Building a simple topology using the DSL API (see the sketch after this outline)
  • The underlying Processor API
  • Sub-topologies and repartition topics
  • Basic configurations
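
As a first taste of the DSL, a minimal sketch of a complete topology that filters and transforms records between two topics (topic names, application id, and broker address are illustrative placeholders):

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;

    public class SimpleTopology {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // Read from an input topic, drop null values, normalize the rest,
            // and write the result to an output topic.
            builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
                   .filter((key, value) -> value != null)
                   .mapValues(value -> value.toUpperCase())
                   .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "simple-topology-demo");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }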

Module 3: Architecture and Threading Model

  • Streams Tasks
  • Streams Threads
  • Runtime States
  • Consumption and processing model
  • Partition assignment
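
A small configuration sketch of the model covered in this module (the thread count and application id are illustrative):

    import java.util.Properties;

    import org.apache.kafka.streams.StreamsConfig;

    public class ThreadingConfig {
        public static void main(String[] args) {
            Properties props = new Properties();
            // One task is created per input partition of each sub-topology; the
            // task is the unit of parallelism. Threads execute tasks: this
            // instance runs up to 4 stream threads.
            props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
            // Instances sharing the same application.id form one consumer group
            // and split the tasks between them.
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        }
    }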

Module 4: The Processor API

  • API Overview
  • Accessing Processor Context
  • Forwarding records to the downstream processors
  • Building a topology
  • Manipulating local state
  • Punctuate operations (Punctuator API)
  • Repartitioning operations
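
For illustration, a sketch of a custom processor built on the typed Processor API available in recent Kafka versions; the store name "counts-store" is a hypothetical store attached to this processor on the topology:

    import java.time.Duration;

    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.processor.api.Processor;
    import org.apache.kafka.streams.processor.api.ProcessorContext;
    import org.apache.kafka.streams.processor.api.Record;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.KeyValueStore;

    // Counts records per key in a local store and forwards a snapshot of the
    // counts downstream every 10 seconds of wall-clock time.
    public class CountingProcessor implements Processor<String, String, String, Long> {

        private KeyValueStore<String, Long> store;

        @Override
        public void init(ProcessorContext<String, Long> context) {
            // The store must be attached to this processor on the topology.
            this.store = context.getStateStore("counts-store");
            // Punctuator: periodically flush the current counts downstream.
            context.schedule(Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
                try (KeyValueIterator<String, Long> it = store.all()) {
                    it.forEachRemaining(kv -> context.forward(new Record<>(kv.key, kv.value, timestamp)));
                }
            });
        }

        @Override
        public void process(Record<String, String> record) {
            final Long count = store.get(record.key());
            store.put(record.key(), count == null ? 1L : count + 1);
        }
    }

Such a processor is wired into a Topology with addSource(), addProcessor(), and addSink(), and the store is registered with addStateStore().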

Module 5: The Streams DSL API and Stateful operations

  • The API Overview
  • The API Abstractions & Operations
  • The stateless operations
  • The aggregate functions (see the sketch after this outline)
  • How to combine DSL and Processors (Transformer)
  • Streams DSL vs Processor API
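
As a sketch of a stateful aggregation in the DSL (topic and store names are placeholders):

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Produced;

    public class StatefulCounts {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // Count events per key; the result is materialized in a local state
            // store that is backed by a changelog topic for fault tolerance.
            KTable<String, Long> counts = builder
                .stream("events", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey()
                .count(Materialized.as("event-counts"));

            // Each update to the table is emitted downstream as a change record.
            counts.toStream().to("event-counts-output", Produced.with(Serdes.String(), Serdes.Long()));
        }
    }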

Module 6: The windowed operations and the join semantics

  • Notions of time
  • The Window operations
  • Manipulating Time (TimestampExtractor)
  • How to manage out-of-order records
  • How to suppress intermediate aggregate results
  • The joins: semantics and operations
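
A sketch combining several of these topics: a 5-minute windowed count with a grace period for out-of-order records and suppression of intermediate results (using the Kafka 3.0+ windowing API; topic names are placeholders):

    import java.time.Duration;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.Suppressed;
    import org.apache.kafka.streams.kstream.TimeWindows;

    public class WindowedCounts {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            builder.stream("page-views", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey()
                // 5-minute tumbling windows; accept records up to 1 minute late.
                .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(5), Duration.ofMinutes(1)))
                .count()
                // Emit one final result per window instead of every update.
                .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
                .toStream((windowedKey, count) -> windowedKey.key() + "@" + windowedKey.window().start())
                .to("page-view-counts", Produced.with(Serdes.String(), Serdes.Long()));
        }
    }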

Module 7: Understanding state store mechanisms

  • Implementations and role of state stores
  • Understanding the read path for a KeyValueStore
  • Understanding the write path for a KeyValueStore
  • Standby replicas and state recovery
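
A sketch of building a persistent store explicitly and enabling standby replicas (store name and replica count are illustrative):

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.StoreBuilder;
    import org.apache.kafka.streams.state.Stores;

    public class StateStoreSetup {
        public static void main(String[] args) {
            // A persistent (RocksDB-backed) store with caching enabled and,
            // by default, a changelog topic for fault tolerance.
            StoreBuilder<KeyValueStore<String, Long>> storeBuilder =
                Stores.keyValueStoreBuilder(
                          Stores.persistentKeyValueStore("counts-store"),
                          Serdes.String(), Serdes.Long())
                      .withCachingEnabled();

            // Keep one warm replica of each store on another instance to speed
            // up recovery after a failure.
            Properties props = new Properties();
            props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        }
    }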

Module 8: RocksDB

  • Introduction to RocksDB
  • Concepts and architecture
  • Memory management
  • Metrics and available configurations
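
As a sketch, a custom RocksDBConfigSetter that bounds per-store memory; the sizes are illustrative, not recommendations:

    import java.util.Map;

    import org.apache.kafka.streams.state.RocksDBConfigSetter;
    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.Cache;
    import org.rocksdb.LRUCache;
    import org.rocksdb.Options;

    public class BoundedMemoryConfigSetter implements RocksDBConfigSetter {

        // One cache instance per store; released when the store is closed.
        private Cache cache;

        @Override
        public void setConfig(String storeName, Options options, Map<String, Object> configs) {
            BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
            // Cap the block cache used for reads.
            cache = new LRUCache(16 * 1024 * 1024L);
            tableConfig.setBlockCache(cache);
            options.setTableFormatConfig(tableConfig);
            // Cap the in-memory write buffers (memtables).
            options.setWriteBufferSize(8 * 1024 * 1024L);
            options.setMaxWriteBufferNumber(2);
        }

        @Override
        public void close(String storeName, Options options) {
            // Release the RocksDB object created in setConfig.
            cache.close();
        }
    }

It is registered through the rocksdb.config.setter property (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG).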

Module 9: Error Management

  • The types of errors
  • How to manage deserialization exceptions
  • How to manage message production exceptions
  • Which error handling strategies to choose?
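
A configuration sketch of the built-in handlers; the log-and-continue strategy suits pipelines that can tolerate skipping malformed records ("poison pills"):

    import java.util.Properties;

    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.errors.DefaultProductionExceptionHandler;
    import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

    public class ErrorHandlingConfig {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Skip and log records that cannot be deserialized instead of
            // crashing the application (the default is to fail fast).
            props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
                      LogAndContinueExceptionHandler.class);
            // Fail on errors while producing results, e.g. a record larger than
            // the broker accepts; this is the default behaviour.
            props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
                      DefaultProductionExceptionHandler.class);
        }
    }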

Module 10: Testing Kafka Streams Applications

  • How to test a topology using the TopologyTestDriver API (see the sketch below)
  • How to write integration tests using Embedded Kafka and Docker
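
For example, a unit test that drives a small topology in memory with TopologyTestDriver, with no broker involved (the topology under test and topic names are assumptions for the sake of the sketch):

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.TestInputTopic;
    import org.apache.kafka.streams.TestOutputTopic;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.TopologyTestDriver;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;
    import org.junit.jupiter.api.Assertions;
    import org.junit.jupiter.api.Test;

    class UppercaseTopologyTest {

        // The topology under test: uppercases every value.
        private static Topology buildTopology() {
            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
                   .mapValues(v -> v.toUpperCase())
                   .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));
            return builder.build();
        }

        @Test
        void shouldUppercaseValues() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");

            // The driver processes records synchronously and in memory.
            try (TopologyTestDriver driver = new TopologyTestDriver(buildTopology(), props)) {
                TestInputTopic<String, String> input = driver.createInputTopic(
                    "input-topic", new StringSerializer(), new StringSerializer());
                TestOutputTopic<String, String> output = driver.createOutputTopic(
                    "output-topic", new StringDeserializer(), new StringDeserializer());

                input.pipeInput("key", "hello");
                Assertions.assertEquals("HELLO", output.readValue());
            }
        }
    }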

Module 11: More Advanced Concepts and Features

  • The exactly-once semantics (see the sketch after this outline)
  • The topology optimization
  • How to name topology operations
  • Data reprocessing
  • How to query state stores using interactive queries
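
A sketch of the first and last items: enabling exactly-once processing (the exactly_once_v2 guarantee of Kafka 3.0+) and querying a materialized store through interactive queries (the store name "event-counts" is assumed to be materialized by the topology):

    import java.util.Properties;

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StoreQueryParameters;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    public class AdvancedFeatures {

        // Enable exactly-once processing guarantees.
        static void configure(Properties props) {
            props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        }

        // Interactive query: read a value directly from the local state store.
        static Long lookupCount(KafkaStreams streams, String key) {
            ReadOnlyKeyValueStore<String, Long> store = streams.store(
                StoreQueryParameters.fromNameAndType("event-counts",
                    QueryableStoreTypes.<String, Long>keyValueStore()));
            return store.get(key);
        }
    }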

Module 12: Managing a Kafka Streams application in production

Module 13: Infrastructure and Capacity Planning recommendations

Module 14: Conclusion

Instructor

Florian has worked in consulting for more than 8 years and is co-founder and CEO of StreamThoughts. Over the course of his career, he has worked on a variety of projects involving the implementation of data integration and data processing platforms in the Hadoop and Spark ecosystems. Passionate about distributed systems, he specializes in event-streaming technologies such as Apache Kafka and Apache Pulsar. Today, he helps companies transition to event-streaming architectures. Florian is a certified Confluent Administrator & Developer for Apache Kafka. He was named a “Confluent Community Catalyst” for two consecutive years (2019 and 2020) for his contributions to the Apache Kafka Streams project and his involvement in the open-source community. He is one of the organizers of the Paris Apache Kafka Meetup.