
mesos-master cannot find mesos-slave, and a new leader is elected at short intervals

I followed this doc to set up a Mesos cluster.

There are three VMs (Ubuntu 12, CentOS 6.5, CentOS 7.2).

JavaScript

Config on each machine:

JavaScript

After starting ZooKeeper, mesos-master, and mesos-slave on the three VMs, I can view the Mesos web UI (10.142.55.190:5050), but the agent count is 0.

After a short time, the Mesos page shows an error: Failed to connect to 10.142.55.190:5050! Retrying in 16 seconds… (I have since found that ZooKeeper elects a new leader at short intervals.)

master info log:

JavaScript

All later log entries repeat in a loop:

JavaScript

slave info log:

JavaScript


Answer

Thanks to Joseph Wu for helping me solve the problem. The details:

There are two repeating log messages that tell you (indirectly) that something is wrong:

I0919 15:55:08.178272 13280 replica.cpp:673] Replica in VOTING status received a broadcasted recover request from (14)@10.142.55.202:5050

This message means that you’ve started this master before, with the same work directory. It has some sort of persistent state in its work directory.

This log message tells you that there are two masters you have not started before:

I0919 15:55:16.018023 13282 consensus.cpp:360] Aborting implicit promise request because 2 ignores received

The masters will refuse to start because there is less than a quorum of masters with the persistent state. If the masters were to start, you would have potential data loss. This is the expected behavior, as Mesos errs on the side of caution.
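The reasoning above can be sketched as a small log check. This is only an illustration, not part of Mesos: the function name `summarize_master_log` is hypothetical, and the defaults of three masters with a quorum of 2 are assumptions based on the three-VM setup in the question. It looks for the two messages quoted above and, following the explanation that each "ignore" comes from a master without persistent state, estimates whether the masters holding state fall below the quorum.

```python
import re

# Matches the second repeating message quoted above and captures the count.
IGNORES_RE = re.compile(
    r"Aborting implicit promise request because (\d+) ignores received"
)
# Substring of the first repeating message quoted above.
RECOVER_MARKER = "received a broadcasted recover request"


def summarize_master_log(lines, total_masters=3, quorum=2):
    """Summarize the two warning signs in a mesos-master INFO log.

    total_masters and quorum are assumptions (three masters, majority
    quorum); pass the values your cluster actually uses.
    """
    has_prior_state = False
    max_ignores = 0
    for line in lines:
        # A replica in VOTING status already has state in its work directory.
        if RECOVER_MARKER in line and "VOTING" in line:
            has_prior_state = True
        match = IGNORES_RE.search(line)
        if match:
            max_ignores = max(max_ignores, int(match.group(1)))
    # Each "ignore" is treated as one master with no persistent state.
    masters_with_state = total_masters - max_ignores
    return {
        "has_prior_state": has_prior_state,
        "ignores": max_ignores,
        "masters_with_state": masters_with_state,
        "below_quorum": masters_with_state < quorum,
    }
```

Fed the two log lines quoted above, the sketch reports 2 ignores, 1 master with state, and `below_quorum` set to true, which matches the explanation of why the masters refuse to start.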


If I want a fresh Mesos cluster, I need to clean the master's work directory. But the problem was not on 10.142.55.202, as Joseph Wu suggested. I cleared every work_dir, and that resolved the problem.

How to clean the work dir:

  1. Find the mesos-master work dir

    JavaScript
  2. Remove it

    JavaScript
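
The two steps above could also be scripted. The following is a minimal sketch, not the commands from the original answer: the helper name `clear_master_work_dir` is hypothetical, and the path you pass in is assumed to be whatever directory your mesos-master was started with via its `--work_dir` option. It deletes everything inside that directory, so it only makes sense when a fresh cluster is acceptable, and presumably with mesos-master stopped first. It defaults to a dry run that only lists what would be removed.

```python
import shutil
from pathlib import Path


def clear_master_work_dir(work_dir, dry_run=True):
    """Delete everything inside a mesos-master work directory.

    Returns the list of entries that were (or, in a dry run, would be)
    removed. The directory itself is kept.
    """
    root = Path(work_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"work dir not found: {root}")
    removed = []
    for entry in sorted(root.iterdir()):
        removed.append(str(entry))
        if dry_run:
            continue  # only report, do not delete
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a real subdirectory recursively
        else:
            entry.unlink()  # remove a file or a symlink
    return removed
```

Calling `clear_master_work_dir(path)` first and reviewing the returned list before repeating the call with `dry_run=False` keeps the destructive step explicit. The same would need to be done on each master whose work directory should be reset.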
User contributions licensed under: CC BY-SA