Concurrency node creation issue


#1

Hi,

I have in my infrastructure a topology where there’s one master node and
many slave nodes.
What I want to do is to automatically detect if a node is a master or a
slave. The rule is:

  1. If there is no master node yet, the next node will be the master
  2. If there is a master, the next node will be slave and its master will be
    the already existing master.

The problem occurs when I have an empty infrastructure (zero nodes) and I
try to create the first 2 nodes simultaneously.
When both nodes get provisioned, both check that there is no master and both
are set to be master, which is a wrong configuration.
This is a very common synchronization problem, but I don’t know how to deal
with it in the Chef environment.

Here’s the recipe to configure a node:

# Assume this node is the master until the search finds another one.
master = node

search(:node, 'role:myrole') do |n|
  if n['myrole']['container_type'] == "master"
    master = n
    if n.name != node.name
      node.set['myrole']['container_type'] = "slave"
    end
  end
end

# No other master was found, so this node claims the role.
if master == node
  node.set['myrole']['container_type'] = "master"
end

template "#{node['myrole']['install_dir']}/#{ZIP_FILE.gsub('.zip', '')}/conf/topology.xml" do
  source "topology.xml.erb"
  owner "root"
  group "root"
  mode "0644"
  variables({:master => master})
end

How can I avoid this problem?

Thanks a lot

Daniel Cukier


#2

You have yourself a bona fide race condition.

You might consider a simpler formulation that assigns the master explicitly.

If you insist on the discovery over specificity route, then you need some
way to deal with distributed coordination, and the options range from
hacked-up happy-path implementations to Paxos.

Noah or Zookeeper might be useful to you.
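
To make the coordination requirement concrete, here is a minimal single-process sketch of the one property any such service has to provide: "check for a master" and "become the master" must happen as a single atomic step. The `MasterRegistry` class and its `claim` method are hypothetical names made up for illustration, and a Mutex stands in for the external service; this is not the Noah or Zookeeper API.

```ruby
# Hypothetical in-process stand-in for an external coordination service.
class MasterRegistry
  def initialize
    @lock = Mutex.new
    @master = nil
  end

  # Atomically record `name` as master if no master exists yet.
  # Returns the name of whoever holds the master slot after the call.
  def claim(name)
    @lock.synchronize do
      @master ||= name
    end
  end
end

registry = MasterRegistry.new
node_names = %w[node-a node-b node-c]

# Every node tries to claim the master slot at the same time.
results = node_names.map do |name|
  Thread.new { [name, registry.claim(name)] }
end.map(&:value)

# A node is master only if the registry handed back its own name.
roles = results.map do |name, master|
  [name, name == master ? "master" : "slave"]
end.to_h

puts roles.inspect
```

Which node wins is not deterministic, but exactly one of them does, and every other node learns the winner's name from the same call.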

Someone might have examples of solutions to problems like this lying
around.

Please share whatever you decide to do.

In reply to Daniel Cukier <danicuki@gmail.com>, Mon, Sep 5, 2011 at 6:57 AM.


#3

Here’s how I do it –

I have a role for “initial master”, which I manually assign to a single
machine. If we don’t find anyone with that set via search, and don’t hold
the role ourselves, we defer configuration until one is available.
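
A minimal sketch of that decision, written as plain Ruby so it can run outside Chef. The role name `initial_master` and the method name `topology_action` are assumptions made for illustration; in a recipe, `master_names` would come from a search for nodes holding that role.

```ruby
# Decide what this node should do on the current run.
#
# own_roles    - roles assigned to this node (array of strings)
# master_names - names of nodes found holding the initial-master role
#
# "initial_master" is a hypothetical role name.
def topology_action(own_roles, master_names)
  return :configure_as_master if own_roles.include?("initial_master")
  return :defer if master_names.empty?
  :configure_as_slave
end

puts topology_action(["myrole", "initial_master"], [])  # prints "configure_as_master"
puts topology_action(["myrole"], [])                    # prints "defer"
puts topology_action(["myrole"], ["node-a"])            # prints "configure_as_slave"
```

Here `:defer` would mean skipping the topology configuration on this run and letting a later chef-client run pick up the master once search returns it.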



#4

Hello,

As Andrew points out, this is a race condition.

What we typically do is take the specificity route, where the single system that is to be master is designated as such by a role. For example, the “database master” for an application will have a role, “appname_database_master.” Any other nodes that are slaves would have “appname_database_slave.” Our “database” cookbook and its companion “application” cookbook follow this pattern and behave accordingly.
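
As a rough illustration of the role-based pattern (not the actual code of the “database” or “application” cookbooks), a slave can locate its master by role and fail loudly if the assignment is ambiguous. The `find_master` helper and the node names are hypothetical; node records are plain hashes here, where a recipe would get them from search.

```ruby
# Return the single node holding `master_role`, or raise if there is
# not exactly one such node.
def find_master(nodes, master_role)
  masters = nodes.select { |n| n["roles"].include?(master_role) }
  unless masters.size == 1
    raise ArgumentError, "expected exactly one #{master_role}, found #{masters.size}"
  end
  masters.first
end

nodes = [
  { "name" => "db1", "roles" => ["appname_database_master"] },
  { "name" => "db2", "roles" => ["appname_database_slave"] },
  { "name" => "db3", "roles" => ["appname_database_slave"] },
]

master = find_master(nodes, "appname_database_master")
puts master["name"]  # prints "db1"
```

Because the master is assigned by hand, two nodes provisioned at the same time cannot both decide to be master; the worst case is a slave that finds no master yet.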

However I’d love to see a solution that utilizes Noah or Zookeeper to solve the problem more dynamically.


Opscode, Inc
Joshua Timberman, Director of Training and Services
IRC, Skype, Twitter, Github: jtimberman


#5

I suppose I should record that screencast based on the demo I did in Mt.
View eh?

Since Josh and Andrew mentioned it, this is one of the primary use cases for
something like Noah (or even Zookeeper proper).

The way it works in the Noah LWRP is that you define blocking in the recipes
where appropriate.

So in the demo I used the django quickstart and defined dependencies as
'noah_block' resources. Haproxy nodes blocked for django servers. Django
servers blocked waiting on MySQL nodes to come up. The secondary django node
blocked on the primary until it ran the initial database setup.

At the end of each node’s run, it registered itself with Noah which is what
each dependency was waiting for.
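
The ordering this produces can be sketched with a small simulation. This is not the Noah LWRP's interface: the `DEPENDENCIES` map, the `ready?` helper, and the in-memory `registered` list are assumptions standing in for the dependency map and for Noah's registry.

```ruby
# Hypothetical dependency map: each role lists the roles it waits for.
DEPENDENCIES = {
  "haproxy" => ["django"],
  "django"  => ["mysql"],
  "mysql"   => [],
}.freeze

# A role may proceed once every role it depends on has registered.
def ready?(role, registered)
  DEPENDENCIES.fetch(role).all? { |dep| registered.include?(dep) }
end

# Roles that have finished a run and registered themselves.
registered = []
pending = DEPENDENCIES.keys

# Simulate repeated passes until every role has completed its run.
until pending.empty?
  runnable = pending.select { |role| ready?(role, registered) }
  raise "dependency cycle among #{pending.inspect}" if runnable.empty?
  registered.concat(runnable) # registration happens at the end of a run
  pending -= runnable
end

puts registered.inspect  # prints ["mysql", "django", "haproxy"]
```

Each role completes only after everything it depends on has registered, whatever order the nodes were started in.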

I need to move that dependency map into a data bag, but it does work.

I’ll hopefully have the screencast recorded this week since I need it for
something else as well.

There’s also no reason that you couldn’t simply use a custom LWRP that
blocked until something came back from search results.

In reply to Joshua Timberman <joshua@opscode.com>, Sep 6, 2011 11:24 PM.


#6

Yo,

I’ve got some secret syrpy sizzauce coordination LWRP cookbook that
can kick this master election scenario (and a few others), it has
providers for Noah and an in-house (Cloudscaling [0]) tuple space
solution similar to the Linda coordination language [1] called
Thlayli, written by Zed Shaw (not yet FOSS, TBA)

I’ve used it for hanging the distributed components of OpenStack
together for the day job, but I’m considering making the Noah portions
of it available for the cookbook contest / world domination.

Is anyone else doing any work in this field, specifically distributed
systems coordination? MPI? =)

–AJ

[0] http://cloudscaling.com/
[1] http://en.wikipedia.org/wiki/Linda_(coordination_language)



#7

That would be awesome ;)

And AJ brings up a pretty good point: the engine that stores the
information is fairly irrelevant. The trick is simply some logic in
the cookbook that holds up the run until some external source says,
“Hey, this node over here says it’s done.” You could block by
continually polling redis, noah, thlayli, mysql or whatever until some
record is there.
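
A minimal sketch of such a polling loop, with a timeout so a run cannot hang forever. The `wait_for` helper is a hypothetical name, and a plain hash filled in by another thread stands in for whichever store is being polled.

```ruby
# Poll until the block returns a truthy value, or give up after
# `timeout` seconds. Returns whatever the block returned.
def wait_for(timeout:, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "timed out after #{timeout}s waiting for record"
    end
    sleep interval
  end
end

# Stand-in for redis, noah, mysql, or a search: a hash that another
# thread writes to shortly after we start waiting.
store = {}
writer = Thread.new do
  sleep 0.2
  store["master"] = "node-a"
end

master = wait_for(timeout: 5, interval: 0.05) { store["master"] }
writer.join
puts master  # prints "node-a"
```

Swapping the block for a real lookup is the only part that depends on the backing store, which is the point being made above.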

In reply to AJ Christensen <aj@junglist.gen.nz>, Tue, Sep 6, 2011 at 11:36 PM.