I have the following structure:
zookeeper: 3.4.12
kafka: kafka_2.11-1.1.0
server1: zookeeper + kafka
server2: zookeeper + kafka
server3: zookeeper + kafka
Created topic with replication factor 3 and partitions 3 by kafka-topics shell script.
./kafka-topics.sh --create --zookeeper localhost:2181 --topic test-flow --partitions 3 --replication-factor 3
The consumers use the group localConsumers. Everything works fine while the leader is up.
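For reference, the setup can be reproduced with the console consumer shipped with Kafka (broker host and port are assumptions based on the logs below, which show port 9092):

```shell
# Hypothetical reproduction of the consumer; host/port assumed from the logs
./kafka-console-consumer.sh \
  --bootstrap-server myserver1:9092 \
  --topic test-flow \
  --group localConsumers \
  --from-beginning
```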
./kafka-topics.sh --describe --zookeeper localhost:2181 --topic test-flow
Topic:test-flow PartitionCount:3 ReplicationFactor:3 Configs:
    Topic: test-flow Partition: 0 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
    Topic: test-flow Partition: 1 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
    Topic: test-flow Partition: 2 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Consumers’ log
Received FindCoordinator response ClientResponse(receivedTimeMs=1529508772673, latencyMs=217, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, clientId=consumer-1, correlationId=0), responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=NONE, node=myserver3:9092 (id: 3 rack: null)))
But if the leader is brought down (systemctl stop kafka on server3), the consumer gets an error. Node 3 is unavailable, as expected:
./kafka-topics.sh --describe --zookeeper localhost:2181 --topic test-flow
Topic:test-flow PartitionCount:3 ReplicationFactor:3 Configs:
    Topic: test-flow Partition: 0 Leader: 2 Replicas: 3,2,1 Isr: 2,1
    Topic: test-flow Partition: 1 Leader: 1 Replicas: 1,3,2 Isr: 1,2
    Topic: test-flow Partition: 2 Leader: 2 Replicas: 2,1,3 Isr: 2,1
Consumers’ log
Received FindCoordinator response ClientResponse(receivedTimeMs=1529507314193, latencyMs=36, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, clientId=consumer-1, correlationId=149), responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null)))
Group coordinator lookup failed: The coordinator is not available.
Coordinator discovery failed, refreshing metadata
The consumer is unable to connect until the stopped broker comes back up, or until it reconnects with a different consumer group. I can't understand why this happens. The consumer should fail over to another broker, but it doesn't.
Answer
Try adding these properties to server.properties on each broker and cleaning the ZooKeeper cache. It should help:
offsets.topic.replication.factor=3
default.replication.factor=3
The root cause of this issue is that the consumer offsets cannot be distributed between the nodes: the internal topic Kafka auto-generates for them, __consumer_offsets, is under-replicated. You can check it with:

$ ./kafka-topics.sh --describe --zookeeper localhost:2181 --topic __consumer_offsets
Pay attention to the production configuration recommendations: https://kafka.apache.org/documentation/#prodconfig
By default it creates __consumer_offsets with a replication factor of 1. The important thing is to configure the replication factor before the Kafka cluster first starts. Otherwise, reconfiguring running instances can bring issues like the one in your case.
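If the cluster has already been started and __consumer_offsets exists with a replication factor of 1, one way to raise it without rebuilding the cluster is a partition reassignment. A sketch, assuming broker ids 1-3 as in your setup (only the first few partitions are shown; the real topic has 50 partitions by default, and the JSON must list them all):

```shell
# increase-rf.json: assign 3 replicas to each partition of __consumer_offsets
# (shown for the first three partitions only; extend to all partitions)
cat > increase-rf.json <<'EOF'
{
  "version": 1,
  "partitions": [
    {"topic": "__consumer_offsets", "partition": 0, "replicas": [1, 2, 3]},
    {"topic": "__consumer_offsets", "partition": 1, "replicas": [2, 3, 1]},
    {"topic": "__consumer_offsets", "partition": 2, "replicas": [3, 1, 2]}
  ]
}
EOF

# apply the reassignment, then check its progress
./kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file increase-rf.json --execute
./kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file increase-rf.json --verify
```

Afterwards, re-run the --describe command above to confirm each partition shows three replicas in the ISR.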