
Transport Endpoint Not Connected – Mesos Slave / Master

I’m trying to connect a Mesos slave to its master. Whenever the slave tries to connect to the master, I get the following message:

JavaScript

The error seems to be:

E0806 16:39:59.091384 940 socket.hpp:107] Shutdown failed on fd=25: Transport endpoint is not connected [107]

The master was started using:

JavaScript

And the slave using:

JavaScript

If I run the slave on the same VM as the master, it works fine.

I couldn’t find much information on the internet. I’m running two virtual machines (Debian 8.1) on VirtualBox 5. The host runs Windows 7.

Edit 1:

The master and the slave each run on a dedicated VM.

Both VMs’ networks are configured as bridged networks.

ifconfig from master:

JavaScript

ifconfig from slave:

JavaScript

Edit 2:

The slave logs can be found at http://pastebin.com/CXZUBHKr

The master logs can be found at http://pastebin.com/thYR1par


Answer

I had a similar problem. My slave logs were filled with:

JavaScript

My master logs had:

JavaScript

The master would then die and a new election would occur. The killed master would be restarted by upstart (I am on a CentOS 6 box) and added back into the pool of potential masters, so the elected master kept daisy-chaining around my master nodes. Restarting the masters and slaves many times did not help: the problem consistently returned within 1 minute of a master election.

The solution for me came from a Stack Overflow question (thanks) and a hint in a GitHub gist note.

The gist of it is that /etc/default/mesos-master must specify a quorum number, and it needs to be correct for the number of Mesos masters (in my case 3):

JavaScript

This seems odd to me, as I have the same information in the file /etc/mesos-master/quorum.

But I added it to /etc/default/mesos-master, restarted the mesos-masters and slaves, and the problem has not returned.
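
For reference, the Mesos documentation asks for the quorum to be a strict majority of the masters (quorum > number of masters / 2). Below is a minimal sketch of that arithmetic. The function name `quorum_size` is my own for illustration and is not part of Mesos, and I am assuming you want the smallest value that satisfies the rule.

```python
def quorum_size(num_masters: int) -> int:
    """Smallest strict majority of num_masters (hypothetical helper, not a Mesos API)."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    # Strict majority: quorum > num_masters / 2
    return num_masters // 2 + 1


# Print the quorum for a few common cluster sizes.
for n in (1, 3, 5):
    print(f"{n} masters -> quorum {quorum_size(n)}")
```

With three masters this gives a quorum of 2, so it is worth checking that the value in /etc/default/mesos-master and the one in /etc/mesos-master/quorum agree with each other and with the number of masters you actually run.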

I hope this helps you.
