StreamThoughts
REMOTE

Kafka Streams

Advanced skills for developing stream processing applications

Price: €1,980 (excl. tax) per attendee

For more information about this training course, please feel free to contact:
training@streamthoughts.io

Description

During this instructor-led, three-day, hands-on course, participants will learn how to use the advanced features of the Kafka Streams API and discuss best practices for both development and production.

Course Objectives

This course enables participants to acquire the following skills:

  • Understand how the Kafka Streams DSL API and the low-level Processor API work.
  • Build stream processing topologies.
  • Create stateful streaming applications using internal state stores.
  • Optimize Kafka Streams applications for performance.

Pedagogy

50% theory, 50% practice

Who Should Attend?

This course is designed for application developers, architects, and data engineers who need to build and deploy streaming applications that enrich, transform, join, and query data flowing through Apache Kafka in real time.

Course Duration

3 Days

Course Prerequisites

Attendees should be familiar with developing in Java. Attendees should also be familiar with the core concepts of Apache Kafka.

Course Content

Module 1: Go back to basics

  • The Components of a Kafka cluster
  • Broker, Message, Topic & Partitions
  • Producer Basics (see the sketch after this outline)
  • Consumers & Consumer Groups
  • Replication & Fault-tolerance
  • Data retention and compression
  • Understanding ZooKeeper’s roles
  • Understanding Kafka’s performance
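
For orientation, here is a minimal producer sketch in Java, the course's working language (the broker address and topic name are placeholders):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerBasics {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The record key determines the target partition (hash-based by default).
                producer.send(new ProducerRecord<>("demo-topic", "user-42", "hello"));
            }
        }
    }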

Module 2: Introduction to Kafka Streams

  • Why Kafka Streams?
  • Use-Cases & Key Features
  • Levels of abstraction
  • Concepts of Streams and Tables
  • Building a simple topology using the DSL API (see the sketch after this outline)
  • The underlying Processor API
  • Sub-topologies and repartition topics
  • Basic configurations
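
As a first taste of the DSL, a minimal sketch of a complete topology that filters and transforms records between two topics (topic names, application id, and broker address are illustrative placeholders):

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;

    public class SimpleTopology {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // Read from an input topic, drop null values, normalize the rest,
            // and write the result to an output topic.
            builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
                   .filter((key, value) -> value != null)
                   .mapValues(value -> value.toUpperCase())
                   .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "simple-topology-demo");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }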

Module 3: Architecture and Threading Model

  • Streams Tasks
  • Streams Threads
  • Runtime States
  • Consumption and processing model
  • Partition assignment
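
A small configuration sketch of the model covered in this module (the thread count and application id are illustrative):

    import java.util.Properties;

    import org.apache.kafka.streams.StreamsConfig;

    public class ThreadingConfig {
        public static void main(String[] args) {
            Properties props = new Properties();
            // One task is created per input partition of each sub-topology; the
            // task is the unit of parallelism. Threads execute tasks: this
            // instance runs up to 4 stream threads.
            props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
            // Instances sharing the same application.id form one consumer group
            // and split the tasks between them.
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        }
    }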

Module 4: The Processor API

  • API Overview
  • Accessing Processor Context
  • Forwarding records to the downstream processors
  • Building a topology
  • Manipulating local state
  • Punctuate operations (Punctuator API)
  • Repartitioning operations
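
For illustration, a sketch of a custom processor built on the typed Processor API available in recent Kafka versions; the store name "counts-store" is a hypothetical store attached to this processor on the topology:

    import java.time.Duration;

    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.processor.api.Processor;
    import org.apache.kafka.streams.processor.api.ProcessorContext;
    import org.apache.kafka.streams.processor.api.Record;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.KeyValueStore;

    // Counts records per key in a local store and forwards a snapshot of the
    // counts downstream every 10 seconds of wall-clock time.
    public class CountingProcessor implements Processor<String, String, String, Long> {

        private KeyValueStore<String, Long> store;

        @Override
        public void init(ProcessorContext<String, Long> context) {
            // The store must be attached to this processor on the topology.
            this.store = context.getStateStore("counts-store");
            // Punctuator: periodically flush the current counts downstream.
            context.schedule(Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
                try (KeyValueIterator<String, Long> it = store.all()) {
                    it.forEachRemaining(kv -> context.forward(new Record<>(kv.key, kv.value, timestamp)));
                }
            });
        }

        @Override
        public void process(Record<String, String> record) {
            final Long count = store.get(record.key());
            store.put(record.key(), count == null ? 1L : count + 1);
        }
    }

Such a processor is wired into a Topology with addSource(), addProcessor(), and addSink(), and the store is registered with addStateStore().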

Module 5: The Streams DSL API and Stateful operations

  • The API Overview
  • The API Abstractions & Operations
  • The stateless operations
  • The aggregate functions (see the sketch after this outline)
  • How to combine DSL and Processors (Transformer)
  • Streams DSL vs Processor API
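
As a sketch of a stateful aggregation in the DSL (topic and store names are placeholders):

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Produced;

    public class StatefulCounts {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // Count events per key; the result is materialized in a local state
            // store that is backed by a changelog topic for fault tolerance.
            KTable<String, Long> counts = builder
                .stream("events", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey()
                .count(Materialized.as("event-counts"));

            // Each update to the table is emitted downstream as a change record.
            counts.toStream().to("event-counts-output", Produced.with(Serdes.String(), Serdes.Long()));
        }
    }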

Module 6: The windowed operations and the join semantics

  • Notions of time
  • The Window operations
  • Manipulating Time (TimestampExtractor)
  • How to manage out-of-order records
  • How to suppress intermediate aggregate results
  • The joins: semantics and operations
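
A sketch combining several of these topics: a 5-minute windowed count with a grace period for out-of-order records and suppression of intermediate results (using the Kafka 3.0+ windowing API; topic names are placeholders):

    import java.time.Duration;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.Suppressed;
    import org.apache.kafka.streams.kstream.TimeWindows;

    public class WindowedCounts {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            builder.stream("page-views", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey()
                // 5-minute tumbling windows; accept records up to 1 minute late.
                .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(5), Duration.ofMinutes(1)))
                .count()
                // Emit one final result per window instead of every update.
                .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
                .toStream((windowedKey, count) -> windowedKey.key() + "@" + windowedKey.window().start())
                .to("page-view-counts", Produced.with(Serdes.String(), Serdes.Long()));
        }
    }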

Module 7: Understanding state store mechanisms

  • Implementations and role of state stores
  • Understanding the read path for a KeyValueStore
  • Understanding the write path for a KeyValueStore
  • Standby replicas and state recovery
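
A sketch of building a persistent store explicitly and enabling standby replicas (store name and replica count are illustrative):

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.StoreBuilder;
    import org.apache.kafka.streams.state.Stores;

    public class StateStoreSetup {
        public static void main(String[] args) {
            // A persistent (RocksDB-backed) store with caching enabled and,
            // by default, a changelog topic for fault tolerance.
            StoreBuilder<KeyValueStore<String, Long>> storeBuilder =
                Stores.keyValueStoreBuilder(
                          Stores.persistentKeyValueStore("counts-store"),
                          Serdes.String(), Serdes.Long())
                      .withCachingEnabled();

            // Keep one warm replica of each store on another instance to speed
            // up recovery after a failure.
            Properties props = new Properties();
            props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        }
    }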

Module 8: RocksDB

  • Introduction to RocksDB
  • Concepts and architecture
  • Memory management
  • Metrics and available configurations
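
As a sketch, a custom RocksDBConfigSetter that bounds per-store memory; the sizes are illustrative, not recommendations:

    import java.util.Map;

    import org.apache.kafka.streams.state.RocksDBConfigSetter;
    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.Cache;
    import org.rocksdb.LRUCache;
    import org.rocksdb.Options;

    public class BoundedMemoryConfigSetter implements RocksDBConfigSetter {

        // One cache instance per store; released when the store is closed.
        private Cache cache;

        @Override
        public void setConfig(String storeName, Options options, Map<String, Object> configs) {
            BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
            // Cap the block cache used for reads.
            cache = new LRUCache(16 * 1024 * 1024L);
            tableConfig.setBlockCache(cache);
            options.setTableFormatConfig(tableConfig);
            // Cap the in-memory write buffers (memtables).
            options.setWriteBufferSize(8 * 1024 * 1024L);
            options.setMaxWriteBufferNumber(2);
        }

        @Override
        public void close(String storeName, Options options) {
            // Release the RocksDB object created in setConfig.
            cache.close();
        }
    }

It is registered through the rocksdb.config.setter property (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG).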

Module 9: Error Management

  • The types of errors
  • How to manage deserialization exceptions
  • How to manage message production exceptions
  • Which error handling strategies to choose?
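
A configuration sketch of the built-in handlers; the log-and-continue strategy suits pipelines that can tolerate skipping malformed records ("poison pills"):

    import java.util.Properties;

    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.errors.DefaultProductionExceptionHandler;
    import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

    public class ErrorHandlingConfig {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Skip and log records that cannot be deserialized instead of
            // crashing the application (the default is to fail fast).
            props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
                      LogAndContinueExceptionHandler.class);
            // Fail on errors while producing results, e.g. a record larger than
            // the broker accepts; this is the default behaviour.
            props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
                      DefaultProductionExceptionHandler.class);
        }
    }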

Module 10: Testing Kafka Streams Applications

  • How to test a topology using the TopologyTestDriver API (see the sketch below)
  • How to write integration tests using Embedded Kafka and Docker
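
For example, a unit test that drives a small topology in memory with TopologyTestDriver, with no broker involved (the topology under test and topic names are assumptions for the sake of the sketch):

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.TestInputTopic;
    import org.apache.kafka.streams.TestOutputTopic;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.TopologyTestDriver;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;
    import org.junit.jupiter.api.Assertions;
    import org.junit.jupiter.api.Test;

    class UppercaseTopologyTest {

        // The topology under test: uppercases every value.
        private static Topology buildTopology() {
            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
                   .mapValues(v -> v.toUpperCase())
                   .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));
            return builder.build();
        }

        @Test
        void shouldUppercaseValues() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");

            // The driver processes records synchronously and in memory.
            try (TopologyTestDriver driver = new TopologyTestDriver(buildTopology(), props)) {
                TestInputTopic<String, String> input = driver.createInputTopic(
                    "input-topic", new StringSerializer(), new StringSerializer());
                TestOutputTopic<String, String> output = driver.createOutputTopic(
                    "output-topic", new StringDeserializer(), new StringDeserializer());

                input.pipeInput("key", "hello");
                Assertions.assertEquals("HELLO", output.readValue());
            }
        }
    }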

Module 11: More Advanced Concepts and Features

  • The exactly-once semantics (see the sketch after this outline)
  • The topology optimization
  • How to name topology operations
  • Data reprocessing
  • How to query state stores using interactive queries
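
A sketch of the first and last items: enabling exactly-once processing (the exactly_once_v2 guarantee of Kafka 3.0+) and querying a materialized store through interactive queries (the store name "event-counts" is assumed to be materialized by the topology):

    import java.util.Properties;

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StoreQueryParameters;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    public class AdvancedFeatures {

        // Enable exactly-once processing guarantees.
        static void configure(Properties props) {
            props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        }

        // Interactive query: read a value directly from the local state store.
        static Long lookupCount(KafkaStreams streams, String key) {
            ReadOnlyKeyValueStore<String, Long> store = streams.store(
                StoreQueryParameters.fromNameAndType("event-counts",
                    QueryableStoreTypes.<String, Long>keyValueStore()));
            return store.get(key);
        }
    }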

Module 12: Managing a Kafka Streams application in production

Module 13: Infrastructure and Capacity Planning recommendations

Module 14: Conclusion

Instructor

Florian has worked in consulting for more than 8 years and is co-founder and CEO of StreamThoughts. Over the course of his career, he has worked on a variety of projects involving the implementation of data integration and data processing platforms in the Hadoop and Spark ecosystems. Passionate about distributed systems, he specializes in event-streaming technologies such as Apache Kafka and Apache Pulsar. Today, he helps companies transition to event-streaming architectures. Florian is a certified Confluent Administrator & Developer for Apache Kafka. He was named a “Confluent Community Catalyst” for two consecutive years (2019 and 2020) for his contributions to the Apache Kafka Streams project and his involvement in the open-source community. He is one of the organizers of the Paris Apache Kafka Meetup.