Kafka in Indirect Communication Paradigm
Indirect communication is a widely used communication paradigm in distributed systems. It decouples distributed entities — message senders and receivers from each other. Every indirect communication technique employs some intermediary service or component to achieve the indirection between distributed entities. Among other indirect communication techniques, Message Broker provides the indirection for publish-subscribe and message queue systems. Kafka is a modern message broker system with some unconventional design choices that make it efficient and scalable in processing intensive message load.
This article provides a theoretical and technical background of message broker systems and a performance comparison of Kafka with other message broker systems like ActiveMQ, RabbitMQ, and AWS SQS.
The Architectural Elements in Distributed Systems
George Coulouris, et al. describes system models for distributed systems in their book, Distributed Systems: Concepts and Design.
The distributed systems compose of two main architectural elements: distributed entities and communication paradigms. Distributed entities represent any computation processes that communicate with each other over the network using some communication paradigms.
There are three main communication paradigms in distributed systems: interprocess communication, remote invocation, and indirect communication. Both interprocess communication and remote invocation have two aspects in common.
Firstly, the identities of distributed entities are known to each other. Secondly, the entities exist at the same time. That is, distributed entities in interprocess communication and remote invocation paradigms communicate directly with each other; hence, identities of distributed entities are known to each and should exist at the same time. But most of the modern distributed applications do not follow these aspects. The indirect communication paradigm addresses these problems allowing space and time uncoupling of distributed entities.
Space uncoupling: The distributed/communication entities do not know identities of each other. This property provides great flexibility in designing distributed systems. It allows replication of distributed entities to improve scalability and fault tolerance. Also, it allows easy upgrade and migration of system components.
Time uncoupling: The distributed/communication entities do not need to exist at the same time. This property allows distributed entities to have their own lifetime.
Indirect Communication Paradigm
There are four main techniques in indirect communication paradigm. All these techniques provide space and time uncoupling for distributed entities via some intermediary service component.
As shown in the following figure, the indirect communication techniques can be categorized into two groups based on their technical design. In state-based models, the distributed entities are decoupled via an intermediary service that manages shared states for distributed entities. In the message-based approach, distributed entities communicate with each other by publishing or sending messages to an intermediary service, which ensure reliable message delivery on relevant entities.
Distributed/communication entities in both publish-subscribe and message queue models have two primary roles in common: message sender/publisher and message receiver/subscriber. Also, both has a similar challenge of reliable message delivery. The similar requirements for a message oriented intermediary service has resulted emerge of message broker systems as a separate service component.
Message Broker Systems
The message broker systems provide a message-oriented intermediary for publish-subscribe and message queue communication models. In most cases, the same message broker supports both communication models. The simplicity and message-oriented nature in message brokers have made broad adoption in many distributed application domains, specifically in the area of enterprise applications with standard specifications and protocols for message brokers.
JMS: Java Message Service is part of Java EE platform. It provides a standard API for Java EE applications to create, send, and receive messages. Any applications that follow JMS API are deployable on Java EE compliance applications servers.
AMQP: Advanced Message Queuing Protocol mainly designed for supporting interoperability between different vendor products. It supports reliable message service in both Publish-subscribe and Message queue model with additional features such as routing, transaction, and security.
MQTT: Message Queue Telemetry Transport supports only publish-subscribe messaging model, and specifically designed for resource constrained devices such as mobile and Internet of Things applications.
STOMP: Simple/Streaming Text Oriented Messaging is a simple and light-weight text-based protocol similar to HTTP.
Most of the traditional message brokers such as ActiveMQ, RabbitMQ, and WebSpehereMQ adhere to one or more messaging protocols/specifications to improve their interoperability. But Kafka follows an unconventional design approach that more focus on efficiency and scalability of the system with basic messaging features.
Kafka
Kafka avoids the overhead of maintaining external message id for message locations. Instead, messages are addressed with its logical offset in the message log. The request from message consumers always contains the offest of the message that consumption should start.
It is a distributed message broker system. A cluster of message brokers hold partitions of message logs. A single consumer is always assigned to one partition. In a typical distributed broker setup there can be many partitions than the number of consumers to balance the load.
Kafka uses Zookeeper for the coordination of the distributed setup; brokers/consumers, message log partitions, and offsets, etc.
Comparison with ActiveMQ, RabbitMQ, and AWS SQS
The comparison takes the main measurement the message latency, the time taken for a message to get from producer to consumer in different message broker systems. The message latency is compared against message sizes and batch sizes.
Testbed for the Comparison
A simple testbed was designed and implemented for collecting the latency measurements for different test scenarios. The source code for the testbed can be found here.
To collect fair measurements, the testbed is set up on AWS cloud environment, which has greater control and flexibility in arranging resources. Each consumer, producer, and message broker components are setup on its own AWS EC2 instances, but in different availability zones. EC2 instances are created from Amazon Linux AMI with instance type t2.micro.
AWS Linux AMI ensures Network Time Protocol (NTP) are configured for all EC2 instances and clocks are in sync. This is an important factor for the message latency measurement. The following figure shows the execution environment setup for test scenarios:
Test Scenarios
Three main test cases were considered based on message size and message load. In the first test case, 250 messages in 25 batches are generated at the producer and send to the message broker. This was carried out for messages with size, 32 Bytes, 128 Bytes, 1KB, 4KB, 16KB, 64KB, and 256KB. Each message was piggyback with the timestamp when it was sent. The consumer continuously listens on the message broker and calculated the time difference (latency) when the message was received at the consumer. The latency was calculated for each message and recorded for the data analysis.
In the second test case, total 2400 messages of size 32 bytes was generated and sent to the Message broker in batches, 25, 50, 100, 200, 400, and 800. As in first test case, the latency was calculated for individual messages and recorded for the analysis.
The third test case considered CPU usage of the Message broker. The messages size of 1 KB was
continuously generated in 100 batches by the producer. The consumer continuously listened and read the messages. Meanwhile, CPU usage of each Message broker was monitored and recorded for the analysis.
Results and Analysis
Latency Over Message Size
As it depicts in the following figure, ActiveMQ, RabbitMQ and Kafka show a clear trend that latency gets increased over message size. AWS SQS does not show such a clear trend. The latency range in SQS is high compared to other systems for each message size. RabbitMQ shows a lower latency compare to Kafka.
Latency Over Batch Size
In this test scenario, 32 bytes fixed size messages were generated while increasing the batch size sending a total of 2,400 messages. (Note: AWS SQS was not considered in this test scenario as it showed clear high latency from previous test case). The following figure shows the result of the test scenario. It is clearly seen, Kafka tends to perform well compare to others on high message load.
The figure below shows the CPU usage of each Message broker system, while producers continuously generating 1KB messages in 100 batches, and consumer continuously retrieving messages from the broker. The test case carried out simultaneously on all message brokers. The result shows Kafka still uses low CPU usage while providing better message latency compare to ActiveMQ and RabbitMQ.
CPU usage of message brokers.
Summary
The message brokers provide an architecturally simple and yet scalable solution among other indirect communication approaches. The performance comparison of these message broker systems show that traditional approaches like RabbitMQ outperform modern systems like Kafka and SQS but in low message load. Kafka tends to perform well when increasing the message load. The CPU usage result shows that Kafka handles a high throughput of messages still with low CPU usage.