Spring Kafka with Avro Schema and Schema Registry Integration
Apache Kafka is widely recognized for its scalability and reliability in building real-time data pipelines and streaming applications. When it comes to data serialization in distributed systems, Apache Avro frequently stands out for its compact, schema-based storage format and its ability to support schema evolution. Integrating Apache Kafka with Avro and a Schema Registry leads to a powerful system capable of efficiently handling dynamic data schemas while ensuring interoperability.
This guide walks you through integrating Avro with Spring Kafka, from setting up the Confluent Schema Registry to defining Avro schemas and handling schema evolution gracefully. By the end, you’ll have a clear understanding and a practical working example of this integration.
Table of Contents
- Introduction to Apache Avro
- Why Use Avro with Kafka?
- Setup Confluent Schema Registry Locally
- Add Avro and Schema Registry Dependencies
- Define Avro Schema and Generate Classes
- Configure Producer/Consumer with Avro Serializers
- Publish and Consume Avro Messages
- Schema Evolution Strategies
- Handling Deserialization Exceptions
- Final Working Example and GitHub Link
Introduction to Apache Avro
Apache Avro is a data serialization framework that facilitates efficient serialization and deserialization of structured data. It is schema-based: data is always written and read with a schema, which can either travel with the data (as in Avro data files) or be looked up from a central store such as a Schema Registry, so producers and consumers can always resolve the schema they need.
Key Features of Avro:
- Compact and Fast: Avro data is stored in a compact binary format, making it highly performant for both storage and transfer.
- Schema Evolution: Avro supports changes in schemas (e.g., adding fields), enabling backward and forward compatibility.
- Language Agnosticism: Avro schemas can be processed by multiple languages such as Java, Python, and more.
Learn more in the Avro Documentation.
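To make this concrete, here is a minimal standalone sketch (not tied to Kafka; the class and field names are only illustrative) that parses a schema, builds a record, and serializes it to Avro's compact binary format:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class AvroQuickExample {
    public static void main(String[] args) throws IOException {
        // The schema is plain JSON and drives both writing and reading
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"string\"},"
                + "{\"name\":\"name\",\"type\":\"string\"}]}");

        // Build a record that conforms to the schema
        GenericRecord user = new GenericData.Record(schema);
        user.put("id", "1");
        user.put("name", "Alice");

        // Serialize the record to Avro's binary encoding
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(user, encoder);
        encoder.flush();

        System.out.println("Serialized size: " + out.size() + " bytes");
    }
}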
Why Use Avro with Kafka?
When integrating Kafka with a serialization framework, Avro provides clear advantages, especially when combined with a Schema Registry.
Benefits of Using Avro with Kafka:
- Compact Messages: Avro's binary encoding keeps messages as small as possible, reducing network and storage costs.
- Schema Enforcement: the schema ensures that producers and consumers follow a consistent data structure.
- Dynamic Schema Updates: the Schema Registry allows schemas to be updated without breaking existing consumers.
- Interoperability: Avro's schema-based format works across multiple programming languages and tools.
Use Cases:
- Event-driven microservices where schemas evolve over time.
- Domain-specific event formats in data pipelines.
- Systems requiring interoperability between different applications and languages.
Setup Confluent Schema Registry Locally
The Confluent Schema Registry is a central service that stores and validates Avro schemas. It enables producers and consumers to agree on schemas without embedding them in application logic.
Step 1. Install Schema Registry via Docker:
Create a docker-compose.yml file for Kafka, Zookeeper, and Schema Registry:
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_BROKER_ID: 1
      KAFKA_LISTENERS: PLAINTEXT://kafka:9092
    ports:
      - "9092:9092"
  schema-registry:
    image: confluentinc/cp-schema-registry
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry   # required by the cp-schema-registry image
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://kafka:9092
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
    ports:
      - "8081:8081"
Start the services:
docker-compose up -d
The Schema Registry will be available at http://localhost:8081.
Step 2. Schema Registry REST API:
Verify it’s running by accessing:
curl http://localhost:8081/subjects
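The Avro serializer registers schemas automatically on first use, but a schema can also be registered manually through the REST API. A quick sketch (the subject name assumes the default <topic>-value naming strategy, and the inline schema is purely illustrative):

curl -X POST http://localhost:8081/subjects/users-topic-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}"}'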
Add Avro and Schema Registry Dependencies
Add the following required dependencies for Avro and Schema Registry integration in the pom.xml or build.gradle file.
Maven:
<dependency>
    <groupId>io.confluent</groupId>
    <artifactId>kafka-avro-serializer</artifactId>
    <version>7.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.11.1</version>
</dependency>
Gradle:
implementation 'io.confluent:kafka-avro-serializer:7.0.1'
implementation 'org.apache.avro:avro:1.11.1'
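Note that kafka-avro-serializer is published to Confluent's Maven repository rather than Maven Central, so the build also needs that repository declared.

Maven:

<repositories>
    <repository>
        <id>confluent</id>
        <url>https://packages.confluent.io/maven/</url>
    </repository>
</repositories>

Gradle:

repositories {
    mavenCentral()
    maven { url 'https://packages.confluent.io/maven/' }
}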
Define Avro Schema and Generate Classes
Avro schemas define the structure of your data. You can store them in .avsc files.
Sample Avro Schema (User.avsc):
{ "type": "record", "name": "User", "fields": [ {"name": "id", "type": "string"}, {"name": "name", "type": "string"}, {"name": "email", "type": ["null", "string"], "default": null} ] }
Generate Java Classes:
Run the Avro Maven Plugin to generate Java classes:
<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>1.11.1</version>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals><goal>schema</goal></goals>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/avro</sourceDirectory>
                <outputDirectory>${project.build.directory}/generated-sources</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>
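After running mvn generate-sources (or a full build), the plugin generates a User class from the schema above. As a quick sketch of how the generated API is typically used (field values are illustrative), the class exposes both constructors and a builder:

User user = User.newBuilder()
        .setId("1")
        .setName("Alice")
        .setEmail("alice@example.com")
        .build();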
Configure Producer/Consumer with Avro Serializers
Update producer and consumer configurations to use Avro serializers/deserializers provided by the Confluent library.
Producer with AvroSerializer:
props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer"); props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer"); props.put("schema.registry.url", "http://localhost:8081");
Consumer with AvroDeserializer:
props.put("key.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer"); props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer"); props.put("specific.avro.reader", "true"); props.put("schema.registry.url", "http://localhost:8081");
Publish and Consume Avro Messages
You can now publish and consume Avro messages using the configured serializers.
Producer Example:
User user = new User("1", "Alice", "alice@example.com");
kafkaTemplate.send("users-topic", "key", user);
Consumer Example:
@KafkaListener(topics = "users-topic")
public void consume(User user) {
    System.out.println("Received User: " + user.getName());
}
Schema Evolution Strategies
To manage changes in schemas:
- Add New Fields (Backward Compatibility): add new fields with default values.
- Remove Fields (Forward Compatibility): ensure consumers can handle missing fields.
- Rename Fields: avoid renaming, as it breaks compatibility.
Learn more about schema compatibility.
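For example, a backward-compatible second version of the User schema could add an optional field with a default value (the age field here is purely illustrative); consumers on the old schema simply ignore it, while consumers on the new schema reading old records fall back to the default:

{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null},
    {"name": "age", "type": ["null", "int"], "default": null}
  ]
}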
Handling Deserialization Exceptions
Use a custom error handler to manage exceptions caused by schema mismatches. The errorHandler attribute on @KafkaListener refers to a KafkaListenerErrorHandler bean:
@KafkaListener(topics = "users-topic", errorHandler = "kafkaErrorHandler")
Alternatively, register an error handler on the listener container factory:
factory.setErrorHandler((thrownException, data) -> {
    System.err.println("Failed to process message " + data);
});
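Note that deserialization failures happen before the listener method is ever invoked, so they are often intercepted at the deserializer level instead. One common option (shown here as a sketch, not part of the original example) is Spring Kafka's ErrorHandlingDeserializer, which wraps the Avro deserializer and passes failures to the container's error handling rather than retrying the same broken record indefinitely:

// Wrap the Avro deserializer so poison-pill records do not block the consumer
props.put("key.deserializer", "org.springframework.kafka.support.serializer.ErrorHandlingDeserializer");
props.put("value.deserializer", "org.springframework.kafka.support.serializer.ErrorHandlingDeserializer");
// Delegate the actual deserialization to the Confluent Avro deserializer
props.put("spring.deserializer.key.delegate.class", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put("spring.deserializer.value.delegate.class", "io.confluent.kafka.serializers.KafkaAvroDeserializer");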
Final Working Example and GitHub Link
For the complete code, check the GitHub Repository.
This tutorial demonstrates a practical way to incorporate Avro with Spring Kafka, ensuring scalable, schema-based messaging. Start integrating Avro into your systems to achieve efficient message serialization and schema management!
FAQs:
1. What is Avro?
Avro is a schema-based serialization framework, ideal for compact and fast storage and exchange of structured data.
2. What is Confluent Schema Registry used for?
It manages schemas for Avro-serialized messages, ensuring compatibility between producers and consumers.
3. Is Avro mandatory for Kafka?
No, but it is strongly recommended for maintaining schema consistency in distributed systems.