Apache Kafka 4.3: Broker Cordoning, CIDR ACLs, Share Groups


Apache Kafka 4.3.0 dropped on May 22 with 25 KIPs and 600+ commits. The release notes are long. Most of it is maintenance — the kind of work that keeps a mature platform trustworthy. Three changes are not maintenance. Broker cordoning finally lets you decommission a disk without racing Kafka’s own partition scheduler. CIDR ACLs end the absurdity of one ACL entry per dynamic Kubernetes IP. Share groups grow up and get real tuning knobs. If your team runs Kafka in production, at least one of these changes belongs in your runbooks today.

Broker Cordoning: Stop Kafka From Making Your Bad Disk Worse

Here is the operational problem Kafka teams have dealt with for years: your disk is degrading and you want to drain it, but Kafka has no native concept of “don’t assign anything new to this directory.” It keeps placing new partition replicas there, which turns the evacuation into a race you are always losing. The workarounds ranged from ugly (manually fencing the broker, using Confluent-specific tooling) to risky (stopping the broker entirely during maintenance).

KIP-1066 closes this gap with one new configuration: cordoned.log.dirs. Mark a log directory as cordoned, and Kafka stops placing new partition replicas on it. The broker stays up. Existing data remains accessible. Writes and reads continue. You evacuate at your pace, then decommission cleanly.

# Dynamically cordon a failing disk — no broker restart required
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-name 1 \
  --alter --add-config "cordoned.log.dirs=/data/disk2"

# Cordon the entire broker (mark all log dirs):
# cordoned.log.dirs=*

If you try to assign a partition to a broker where every log directory is cordoned, Kafka returns INELIGIBLE_REPLICA. Uncordon by removing the config — dynamically, no restart. This is the operational workflow Kafka teams have been requesting for years and the most practical reason to upgrade to 4.3 now if you manage disk capacity.
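The verify and uncordon steps can be wrapped in a couple of shell helpers. This is a minimal sketch, assuming cordoned.log.dirs behaves as described above; the function names and the BOOTSTRAP and KAFKA_CONFIGS variables are hypothetical, introduced only for illustration.

```shell
# Hypothetical helpers; the names and variables are illustrative, not part of Kafka.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"

# Show the dynamic configs on broker $1, e.g. to confirm the cordon is in place.
describe_broker() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type brokers --entity-name "$1" --describe
}

# Remove the cordon from broker $1 once the disk is drained or replaced.
uncordon_broker() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type brokers --entity-name "$1" \
    --alter --delete-config cordoned.log.dirs
}
```

Call describe_broker 1 after cordoning and uncordon_broker 1 when maintenance is done. Cordoning only blocks new placements; moving existing replicas off the directory is still a separate step, typically a reassignment with kafka-reassign-partitions.sh.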

CIDR ACLs: One Subnet Rule Instead of 500 IP Entries

Kafka’s ACL model has historically matched hosts either by exact IP address or by the wildcard *. In a static data center, that is fine. In a Kubernetes cluster where pod IPs cycle constantly, it is an unmanageable maintenance burden. Teams were either opening ACLs to * (any host) and losing precision, or maintaining hundreds of individual IP entries that went stale with every deployment.

KIP-1276 adds CIDR range support to ACL host patterns. Write one subnet entry and cover every IP in that range — present and future.

# Before: one entry per pod IP, repeated N times
kafka-acls.sh --bootstrap-server localhost:9092 --add \
  --allow-principal User:orders-service \
  --operation Read --topic orders --allow-host 10.0.4.7

# After: cover the whole pod subnet
kafka-acls.sh --bootstrap-server localhost:9092 --add \
  --allow-principal User:orders-service \
  --operation Read --topic orders --allow-host 10.0.4.0/24

For teams running Kafka on EKS, GKE, or AKS, this removes one of the most persistent operational pain points. Your ACL configuration stays accurate without anyone manually tracking pod IPs across deployments.
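To make concrete what a rule such as 10.0.4.0/24 covers, the sketch below checks whether an IPv4 address falls inside a CIDR block. It illustrates ordinary CIDR matching, not Kafka's internal implementation, and the function names are hypothetical.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit 0) if address $1 lies inside CIDR block $2, e.g. 10.0.4.0/24.
ip_in_cidr() {
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")   # network part, before the slash
  bits=${2#*/}                 # prefix length, after the slash
  # Netmask: the top $bits bits set, the rest clear.
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}
```

With these definitions, ip_in_cidr 10.0.4.7 10.0.4.0/24 succeeds and ip_in_cidr 10.0.5.1 10.0.4.0/24 fails: a /24 covers the 256 addresses from 10.0.4.0 through 10.0.4.255. Size the range to your actual pod CIDR so the rule is no broader than it needs to be.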

Share Groups: First Real Tuning Levers After Three Releases

Share groups (KIP-932, introduced in Kafka 4.0) are Kafka’s answer to queue semantics. Unlike consumer groups — where each partition is exclusively owned by one consumer — share groups distribute individual records across all available consumers with per-record acknowledgement and configurable retry counting. No ordering guarantees, but elastic scaling beyond partition count. Think RabbitMQ or SQS semantics, without leaving Kafka.

The problem was that 4.0 shipped share groups without enough operational controls. Teams running them in production had one set of defaults and no levers. KIP-1240 in 4.3 changes that: new broker-level and group-level configurations now cover delivery attempt limits, acknowledgement timeout behavior, and coordinator-side resource allocation per share group.

If your team has been evaluating whether to migrate work queue patterns from RabbitMQ, SQS, or Azure Service Bus to Kafka, the missing tuning surface was the last real objection. It is no longer missing.

Watch Your Logs After Upgrading

KIP-1274 begins the formal deprecation of the classic consumer rebalance protocol. Kafka 4.3 logs a warning if any of your applications are still using it. The new KIP-848 protocol has been generally available since 4.0 and brokers support it out of the box, but consumer clients still have to opt in. Its incremental, broker-driven rebalances are faster for large groups. If you see the warning, the fix is one line in your consumer configuration:

group.protocol=consumer
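Before flipping the switch, a quick pre-flight check on your consumer properties files can help. The sketch below is hypothetical: the function name is ours, and the list of classic-only keys reflects our understanding that the new protocol does not accept client-side partition.assignment.strategy, session.timeout.ms, or heartbeat.interval.ms. Verify that list against the consumer configuration reference for your client version.

```shell
# Hypothetical pre-flight check for a consumer .properties file.
# Prints one line per finding and returns 0 only if the file looks ready.
check_consumer_props() {
  file=$1
  status=0
  if ! grep -Eq '^[[:space:]]*group\.protocol[[:space:]]*=[[:space:]]*consumer[[:space:]]*$' "$file"; then
    echo "missing: group.protocol=consumer"
    status=1
  fi
  # Assumption: these classic-only settings are rejected under the new protocol.
  for key in partition.assignment.strategy session.timeout.ms heartbeat.interval.ms; do
    # Escape the dots so they match literally in the regular expression.
    pattern="^[[:space:]]*$(printf '%s' "$key" | sed 's/\./\\./g')[[:space:]]*="
    if grep -Eq "$pattern" "$file"; then
      echo "remove: $key"
      status=1
    fi
  done
  return $status
}
```

Run it as check_consumer_props consumer.properties in CI or before a rollout. It only inspects properties files, so settings applied in application code need a separate review.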

Also worth noting: the kafka-streams-scala module is deprecated in 4.3 and slated for removal in Kafka 5.0 (KIP-1244). If your team uses Kafka Streams from Scala, plan the migration to the Java API now. 5.0 will not give you a second warning.

Should You Upgrade?

For teams already on Kafka 4.x: yes, upgrade. This is a standard minor version — a rolling upgrade from 4.2 takes a single pass of the cluster. No ZooKeeper concerns, no metadata format breaks. After binaries are updated, run the feature upgrade command and you are done. If broker cordoning or CIDR ACLs apply to your environment, the upgrade pays for itself in the first maintenance window.
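The feature upgrade step uses the kafka-features.sh tool that ships with Kafka. A minimal sketch, assuming every broker already runs the 4.3 binaries; the function names and the BOOTSTRAP and KAFKA_FEATURES variables are hypothetical wrappers added for illustration.

```shell
# Hypothetical wrappers around kafka-features.sh; names are illustrative.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
KAFKA_FEATURES="${KAFKA_FEATURES:-kafka-features.sh}"

# Show the currently finalized feature levels.
describe_features() {
  "$KAFKA_FEATURES" --bootstrap-server "$BOOTSTRAP" describe
}

# Finalize the features for release $1 (for example 4.3).
# Run this only after the rolling restart has reached every broker.
finalize_release() {
  "$KAFKA_FEATURES" --bootstrap-server "$BOOTSTRAP" upgrade --release-version "$1"
}
```

A typical sequence is describe_features, then finalize_release 4.3, then describe_features again to confirm the new levels. Finalizing is the step that is hard to undo, so hold it back until you are satisfied with how the new binaries behave.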

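For teams still on Kafka 3.x: 4.3 is the cleanest version of the 4.x line so far. The hard part of moving to 4.x is dropping ZooKeeper: a cluster still running in ZooKeeper mode has to migrate to KRaft on a 3.x release before it can move to any 4.x version. Once you are on KRaft, you can skip the earlier 4.x releases and go straight to 4.3, which means landing on a mature version rather than an early one. Check the upgrade documentation for the source versions it supports.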

The full release announcement and upgrade documentation are on the Apache Kafka blog. Factor House has a detailed platform engineer breakdown worth reading before your upgrade window. The KIP-1066 specification covers all the edge cases for cordoning. And if share groups are new to you, Instaclustr’s share groups guide provides the necessary background before you configure KIP-1240.

ByteBot
