20 Best Practices for Working With Apache Kafka at Scale

Apache Kafka is a widely popular distributed streaming platform that thousands of companies like New Relic, Uber, and Square use to build scalable, high-throughput, and reliable real-time streaming systems. For example, the production Kafka cluster at New Relic processes more than 15 million messages per second for an aggregate data rate approaching 1 Tbps.

Kafka has gained popularity with application developers and data management experts because it greatly simplifies working with data streams. But Kafka can get complex at scale. A high-throughput publish-subscribe (pub/sub) pattern with automated data retention limits doesn't do you much good if your consumers are unable to keep up with your data stream and messages disappear before they're ever seen. Likewise, you won't get much sleep if the systems hosting the data stream can't scale to meet demand or are otherwise unreliable.

In hopes of reducing that complexity, I'd like to share 20 of New Relic's best practices for operating scalable, high-throughput Kafka clusters. I've divided these tips into four categories for working with:

1. Partitions

2. Consumers

3. Producers

4. Brokers

But First, a Quick Rundown of Kafka and Its Architecture

Kafka is an efficient distributed messaging system that provides built-in data redundancy and resiliency while retaining both high throughput and scalability. It includes automatic data retention limits, making it well suited for applications that treat data as a stream, and it also supports "compacted" streams that model a map of key-value pairs.
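To make the "map of key-value pairs" idea concrete, here is a minimal sketch in plain Python (no Kafka client involved) of what a compacted stream represents: replaying the keyed messages in order leaves only the latest value for each key. The `compact` function, the sample keys, and the sample values are hypothetical names chosen for illustration, and the treatment of a `None` value as a delete marker is an assumption of this sketch.

```python
from typing import Dict, Iterable, Optional, Tuple


def compact(log: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Replay a keyed log in order and keep only the latest value per key."""
    state: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            # Assumption of this sketch: a None value marks the key as deleted.
            state.pop(key, None)
        else:
            # A newer message for the same key replaces the older one.
            state[key] = value
    return state


# Hypothetical keyed messages, oldest first.
log = [
    ("user-1", "a@example.com"),
    ("user-2", "b@example.com"),
    ("user-1", "c@example.com"),
    ("user-2", None),
]

snapshot = compact(log)
print(snapshot)  # {'user-1': 'c@example.com'}
```

The point of the sketch is only the semantics: a consumer that reads a compacted stream from the beginning ends up with the current state of the map rather than its full history.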

To understand these best practices, you'll need to be familiar with some key terms:

time = messages / (consume rate per second - produce rate per second)
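As a rough worked example of the formula above, the sketch below assumes that "messages" means the number of messages a lagging consumer is behind and that "time" is the number of seconds it needs to catch up. The function name and the sample rates are hypothetical, chosen only for illustration.

```python
def catch_up_seconds(lag_messages: float, consume_rate: float, produce_rate: float) -> float:
    """Seconds needed to work off a backlog, given per-second consume and produce rates."""
    if consume_rate <= produce_rate:
        # The consumer is not gaining on the producer, so the backlog never shrinks.
        return float("inf")
    return lag_messages / (consume_rate - produce_rate)


# Hypothetical numbers: 1,000,000 messages behind, consuming 15,000/s
# while producers keep writing 10,000/s.
print(catch_up_seconds(1_000_000, 15_000, 10_000))  # 200.0
```

The guard for `consume_rate <= produce_rate` reflects what the denominator already implies: unless the consumer reads faster than messages arrive, there is no finite catch-up time.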

Best Practices for Working With Partitions

For a closer look at working with topic partitions, see Effective Strategies for Kafka Topic Partitioning.

Best Practices for Working With Consumers

Best Practices for Working With Producers

Best Practices for Working With Brokers

Additional Resources

Hopefully, these tips will get you thinking about how to use Kafka more effectively. If you're looking to increase your Kafka expertise, review the operations section of the Kafka documentation, which contains useful information about manipulating a cluster, and draws on experience from LinkedIn, where Kafka was developed. Additionally, Confluent regularly conducts and publishes online talks that can be quite helpful in learning more about Kafka.

This article was originally posted on the New Relic blog.
