Scaling erchef horizontally


#1

Hi guys,
I'm having a bit of a problem trying to scale erchef across several nodes.

First, let me give you an overview of my environment:
-2 (there will be more) servers behind a load balancer, running the
following services:
-bookshelf
-chef-expander
-chef-server-webui
-erchef
-nginx

-2 servers behind a load balancer, running these services:
-chef-solr
-rabbitmq

-a PostgreSQL cluster (using pgpool) for the chefdb

Now, the problem:

I can't seem to have erchef listening on port 8000 on both servers at the
same time. When erchef starts on one of the servers, it starts crashing on
the other one:

=CRASH REPORT==== 24-Apr-2014::12:35:15 ===
crasher:
initial call: sqerl_client:init/1
pid: <0.131.0>
registered_name: []
exception exit: {stop,timeout}
in function gen_server:init_it/6 (gen_server.erl, line 320)
ancestors: [<0.112.0>,pooler_pool_sup,pooler_sup,sqerl_sup,<0.107.0>]
messages: []
links: [<0.112.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 4181
stack_size: 24
reductions: 22425
neighbours:

=SUPERVISOR REPORT==== 24-Apr-2014::12:35:15 ===
Supervisor: {<0.112.0>,pooler_pooled_worker_sup}
Context: child_terminated
Reason: {stop,timeout}
Offender: [{pid,<0.131.0>},
{name,sqerl_client},
{mfargs,{sqerl_client,start_link,undefined}},
{restart_type,temporary},
{shutdown,brutal_kill},
{child_type,worker}]

-If I stop erchef on node 1, the crash reports stop, and erchef starts
listening on node2:8000
-Then, if I try to start erchef on node1, it won't work unless I stop it
on node2

Is there a way to avoid this, so that I can scale to as many erchef
instances as needed?

Thanks in advance!

Dario Nievas (Snowie)
MercadoLibre Cloud Services
Arias 3751, Piso 7 (C1430CRG)
Ciudad de Buenos Aires - Argentina
Cel: +549(11) 11-6370-6406
Tel : +54(11) 4640-8443


#2

Dario,
It would be useful to have a gist of your erchef config. I would suspect the
Erlang cookie, but I'm not sure; also, starting erchef in live mode will give
you more info about the failure.


Jorge Espada



#3

There should be some more crash logs from the console telling you what’s
going on with erchef, but you’re also going to have some other issues with
the setup you’ve described. If you’re running enough erchef servers, you
might want to check that you’re not exceeding the available connections of
the PostgreSQL server.

Multiple Bookshelfs:
Bookshelf was not designed to be run on multiple nodes. It has local
disk-based storage for the contents of your cookbooks.

Multiple Chef Expanders / RabbitMQ / Solr:
You also don't want to run multiple search stacks. When indexable objects
are stored on the chef server, their contents are shuffled off to a
RabbitMQ queue, where a chef-expander listener is ready to consume that
data, "expand" it, and send it to Solr for indexing. First, if you have
multiple expanders as consumers of the rabbit queue, you're introducing the
chance that the data is indexed out of order. This problem is exacerbated
when you start to add multiple RabbitMQs (which erchefs talk to which
queues?) and multiple Solrs (which erchefs and expanders talk to which
Solr?).
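To make the ordering hazard concrete, here's a toy Ruby sketch (not Chef code; the object name and versions are made up): two expander workers each pick up one update for the same node, and the worker holding the older version happens to commit last, so the stale write wins in the index.

```ruby
# Toy illustration of out-of-order indexing with multiple queue consumers.
index = {}

update_v1 = { id: "node-1", version: 1 }  # consumed by slow worker A
update_v2 = { id: "node-1", version: 2 }  # consumed by fast worker B

# Commit order differs from publish order: B finishes before A does.
[update_v2, update_v1].each do |update|
  # A naive indexer overwrites whatever is there with the latest arrival.
  index[update[:id]] = update[:version]
end

index["node-1"]  # => 1 -- the search index now holds the stale version
```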



Stephen Delano
Software Development Engineer
Opscode, Inc.
1008 Western Avenue
Suite 601
Seattle, WA 98104


#4

Take a look at the docs for scaling Enterprise Chef: http://docs.opscode.com/server_deploy_fe.html. Enterprise Chef has some extra services that support the paid features, but aside from that, it should give you a good idea of how to design your cluster.

HTH,


Daniel DeLeo


#5

We used this approach and are happy with the results:

http://chef-docs.readthedocs.org/en/latest/private_chef_1x_install_tiered.html

It's not quite as desirable as the high-availability approach, but it has
made it easier to scale horizontally, and it currently fits our needs. Our
frontend workers just run erchef while the backend runs the rest of the
suite.

-Dennis



#6

FWIW, the link that Dan added to this thread is the current documentation
(for Enterprise Chef) and the link that Dennis added is the old
documentation (for what was previously called Private Chef). Both describe
roughly the same approach, but note that one cannot actually follow the
steps/requirements for Private Chef with Enterprise Chef.

Please use this link to find all the current docs:
http://docs.opscode.com/enterprise/#the-server

James



#7

I am currently running a Chef 10 cluster in EC2 with the bookshelf stored
on a Gluster volume mounted on all the servers, but only one node at a time
is receiving traffic for rabbitmq, solr, or couchdb. I’ve been holding off
on upgrading to Chef 11 server because I haven’t had time to figure out HA,
so I would be very interested in seeing your cluster configuration for Chef
11.

From what I understand, it is possible to run rabbitmq in an HA mode where
the queues are mirrored to slaves, but during failure there can be some
consistency loss.
http://www.rabbitmq.com/ha.html

I think solr can be made HA as well, but it looks complicated:
https://cwiki.apache.org/confluence/display/solr/SolrCloud

When I last checked into pgpool, I got scared off by the SQL restrictions,
but if it works and is stable that would be awesome!
http://www.pgpool.net/docs/latest/pgpool-en.html#restriction



Cameron Cope | Systems Engineer
Brightcove, Inc
290 Congress Street, 4th Floor, Boston, MA 02110
ccope@brightcove.com


#8

BTW, in case it wasn’t mentioned before: Since Bookshelf speaks the S3
protocol, it is possible to have erchef use S3 directly and turn off
Bookshelf entirely.

  • Julian



[ Julian C. Dunn jdunn@aquezada.com * Sorry, I’m ]
[ WWW: http://www.aquezada.com/staff/julian * only Web 1.0 ]
[ gopher://sdf.org/1/users/keymaker/ * compliant! ]
[ PGP: 91B3 7A9D 683C 7C16 715F 442C 6065 D533 FDC2 05B9 ]


#9

Julian,
Is there any documentation on this, i.e. how to use S3 as the cookbook
store? What are the config changes required, etc.?
Regards,
Ranjib



#10

We haven’t documented it because it’s not officially supported, but
basically the directions are:

  1. Make a bucket and an IAM user
  2. Grant "s3:PutObject", "s3:GetObject", "s3:DeleteObject", and
     "s3:ListBucket" for said bucket to the IAM user
  3. Modify config settings in /etc/opscode/private-chef.rb (or the
     corresponding variant for the OSS Chef Server):
     a) bookshelf['vip'] = 's3.amazonaws.com' # or whatever
        region-specific endpoint you want
     b) bookshelf['access_key_id'] = 'your IAM user access key'
     c) bookshelf['secret_access_key'] = 'your IAM user secret key'
     d) opscode_erchef['s3_bucket'] = 'bucket_name'
  4. private-chef-ctl reconfigure
  5. Optionally disable bookshelf locally, as you won't need it.
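Put together, the settings from step 3 would look something like this in /etc/opscode/private-chef.rb (the bucket name and keys below are placeholders, not real values):

```ruby
# /etc/opscode/private-chef.rb -- point erchef at S3 instead of local Bookshelf
bookshelf['vip'] = 's3.amazonaws.com'              # or a region-specific endpoint
bookshelf['access_key_id'] = 'YOUR_IAM_ACCESS_KEY' # IAM user from step 1
bookshelf['secret_access_key'] = 'YOUR_IAM_SECRET_KEY'
opscode_erchef['s3_bucket'] = 'your-bucket-name'   # bucket from step 1
```

Then run `private-chef-ctl reconfigure` to apply.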

regards,
Julian



[ Julian C. Dunn jdunn@aquezada.com * Sorry, I’m ]
[ WWW: http://www.aquezada.com/staff/julian * only Web 1.0 ]
[ gopher://sdf.org/1/users/keymaker/ * compliant! ]
[ PGP: 91B3 7A9D 683C 7C16 715F 442C 6065 D533 FDC2 05B9 ]


#11

This is great!

I am going to try this against my Eucalyptus install as well.



#12

this is awesome!



#13

Hmm, Stephen, when you say "indexable objects are stored on the chef
server", do you mean there's a call that comes in to the API (say, create a
new node) that goes to erchef, which then goes to chef-expander and
rabbitmq? In that case, one chef-expander and rabbitmq per erchef seems
appropriate, as long as each erchef talks to its own chef-expander and
rabbitmq.

Here’s how we’ve set up Chef 11 at my company:

The Web UI / API hosts run [chef-expander chef-server-webui erchef
rabbitmq nginx], while the Postgres/Bookshelf/Solr hosts are dedicated to
their roles. Everything is set up with the chef-server cookbook and custom
roles, except Postgres (since the chef-server cookbook doesn't allow a
master/slave config). Bookshelf is replicated at the filesystem level (the
slave is read-only until replication is broken).

We've run this for a few months and haven't seen any issues yet.



Best regards, Dmitriy V.


#14

Hi Dmitriy,

Let me explain the “stored on the chef server” part in a bit more detail.
When a CREATE or UPDATE request arrives at the chef server, the following
happens:

  • the data is stored in PostgreSQL
  • the data is sent to RabbitMQ to be asynchronously indexed
  • a successful response is sent to the client
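
The three steps can be sketched as a toy save path, with in-memory stand-ins for PostgreSQL and RabbitMQ (the names here are illustrative, not the real erchef internals):

```python
# Toy sketch of the save path described above. A dict stands in for
# PostgreSQL and a deque for the RabbitMQ queue; all names hypothetical.
from collections import deque

database = {}          # stands in for PostgreSQL
index_queue = deque()  # stands in for the RabbitMQ indexing queue

def save_object(obj_id, data):
    # 1. the data is stored in PostgreSQL
    database[obj_id] = data
    # 2. the data is sent to RabbitMQ to be asynchronously indexed
    index_queue.append((obj_id, data))
    # 3. a successful response is sent to the client -- note the client
    #    gets its response before the object has been indexed in Solr
    return {"status": 200, "id": obj_id}

resp = save_object("node-web01", {"run_list": ["role[web]"]})
print(resp["status"], len(index_queue))  # -> 200 1
```

The point of the sketch is that success is reported at step 3, while indexing only happens later when something drains the queue.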

Asynchronously, chef-expander is pulling objects off the RabbitMQ queue IN
THE ORDER IN WHICH THEY WERE SAVED. We’ve taken great care to make
chef-expander run in parallel and preserve the save order of items that
arrive on the queue. Note: parallel is not necessarily “scaled across
multiple nodes”.

If you have multiple RabbitMQ / chef-expander stacks, you introduce the
potential for the following type of data race:

t0. update request for databag_item DBI1 on erchef0
t1. update request for databag_item DBI1 on erchef1 (slightly different
data, last save “wins”)
t2. erchef0 commits DBI1 to PostgreSQL
t3. erchef1 commits DBI1 to PostgreSQL
t4. erchef0 sends DBI1 to RabbitMQ0
t5. erchef1 sends DBI1 to RabbitMQ1
t6. chef-expander1 indexes DBI1 async
t7. chef-expander0 indexes DBI1 async

Note the ordering of t6 and t7. Once things are sent off into different
async queues, there is no guarantee that they will be indexed in the order
in which they were sent to the RabbitMQ queue.
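
The t0..t7 race can be simulated in a few lines: two erchefs share one database ("last save wins") but each feeds its own queue and expander, so the index can end up serving the losing write. Names are illustrative stand-ins, not real components:

```python
# Minimal simulation of the race above: the database ends up with the
# newer value while the search index ends up with the older one.
from collections import deque

database = {}
queue0, queue1 = deque(), deque()   # RabbitMQ0 / RabbitMQ1
solr_index = {}                     # what Solr ends up serving

def save(queue, obj_id, data):
    database[obj_id] = data         # t2/t3: commit to PostgreSQL
    queue.append((obj_id, data))    # t4/t5: enqueue for indexing

save(queue0, "DBI1", {"v": "old"})  # t0/t2/t4 via erchef0
save(queue1, "DBI1", {"v": "new"})  # t1/t3/t5 via erchef1 -- DB now "new"

# t6/t7: the expanders drain their queues in the unlucky order --
# expander1 (holding the newer write) runs before expander0 (the older)
for q in (queue1, queue0):
    while q:
        obj_id, data = q.popleft()
        solr_index[obj_id] = data

print(database["DBI1"], solr_index["DBI1"])  # {'v': 'new'} {'v': 'old'}
```

With a single queue the drain order matches the enqueue order and the mismatch cannot arise this way; with two independent queues nothing enforces it.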

It’s also worth noting that even with a single RabbitMQ / chef-expander
stack you can still theorize a data race where the ordering of commits to
the database differs from the ordering of commits to RabbitMQ. It’s worth
keeping in mind that the chances of this contention occurring in the
RabbitMQ / chef-expander side of the transaction are much greater because
of the relative speed of the indexing operations. In a large Chef
infrastructure, Solr will spend a lot of time blocking update requests to
the index while it commits. The more time an item spends on the queue, the
higher the chance there is that something can get indexed out of order if
you’re not careful.

You mentioned that you’ve been running fine with this setup for some time,
and I believe it. I’ve concocted quite a scenario above, and even conceded
that a single RabbitMQ / chef-expander is not guaranteed to prevent
out-of-order indexing. Historically, this was a REAL problem with the Chef
Server. Older versions of the chef-client saved the node at the beginning
of the run without all of the attributes fully realized, and once again
saved the node at the end of the run. If the initial save of the client
data was the one that “won” the data race, in many cases you’d end up
having searches that failed because of attributes that you depended on not
being indexed. It was even worse when a chef-client run issued 3 saves, or
a recipe triggered a save.

If you find that a single RabbitMQ / chef-expander stack is not meeting
your needs for some reason, we should work together to improve the
horizontal scalability of the chef-expander.

On Tue, Apr 29, 2014 at 11:11 PM, DV vindimy@gmail.com wrote:



#15

Thanks a lot for taking the time to explain how the message routing works.
In case we do see an inconsistency in Solr index, would running
"chef-server-ctl reindex" help? If so, would running this command on a
daily or weekly basis be a “better than nothing” measure to guard against
this?

Our setup is probably not active enough for this issue to have a chance at
affecting anything (or else we haven’t noticed yet). While we have just
about everything managed with Chef, we’re at the point where we don’t see a
high volume of changes being made.

As far as why we’ve gone with multiple Web UI / API hosts - it’s a habit of
splitting critical applications across independent hardware. We could most
definitely keep just one of the Web UI / API hosts running while the rest
are shut down (this was our Chef 10 setup).

We plan on further improving HA by provisioning a set of hosts in AWS and
perhaps in a different datacenter. After all, when there’s a large outage,
Chef is one of the first things we’d want to be available.

On Wed, Apr 30, 2014 at 7:29 AM, Stephen Delano stephen@opscode.com wrote:



Best regards, Dmitriy V.