Hypertable cookbook

Jordi_Llonch · July 14, 2013, 11:40pm

Hi,

I am creating a cookbook for Hypertable, an HBase-like distributed database
with better performance.

I am trying to work out how to start the cluster in a way that is compatible
with chef-client.

Hypertable has 4 main components (hyperspace, master, slave, thriftbroker)
that need to be started sequentially.

Do cluster roles have to be Chef roles, or can they just be items in the run
list?
How can I start the same service on different nodes at the same time?

What are the best practices for performing a cluster operation in a
cookbook/Chef? I would appreciate your ideas or suggestions.

Thanks,
Jordi

Ranjib · July 15, 2013, 12:08am

Hi,
You can't do it easily with Chef, at least not with standard Chef components
and within a single Chef run. Chef cannot query or notify remote nodes in
real time. So far I have used two approaches, with different levels of
success, for setting up multi-node clusters/systems that require certain
steps to happen in a certain order and that cannot be converged completely
without the presence of certain other nodes.

1. Keep all the installation and setup logic separate from the core cluster
   config resource, i.e. everything except the bare minimum of configuration
   required to start an individual service. Install everything in one phase,
   in parallel. This first-phase run list should leave some footprint via
   attributes. In the second phase, alter the run lists of the nodes and add
   the config recipe (which should start the main service). The order in
   which you invoke the Chef runs will be exactly the same as when you do it
   manually. The config recipes should use the attributes (and searches
   based on them) to figure things out. Once both phases are working, you
   can shorten the Chef run interval to set things up faster (I prefer a
   Chef run every 5 minutes, at least for the first couple of runs).

2. You can also use something like flock of chefs (note: it's highly
   experimental and requires Ruby 1.9, Celluloid, etc.) to do remote
   notifications. This is far more convenient, but also complex, and errors
   can be difficult to debug. But you'll be able to do everything within the
   Chef recipes, i.e. you can set up the nodes and keep all the services
   stopped until the first dependency is resolved; the first dependency can
   then remotely notify the second, the second can remotely notify the
   third, and so on.
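
The two-phase approach above can be sketched in plain Ruby, outside of Chef.
In this hypothetical sketch, a list of hashes stands in for the results of a
Chef search, the `installed` and `running` flags stand in for the attribute
footprint left by the first phase and by the config recipe, and the start
order is assumed to be the order in which the components were listed in the
question. None of these names come from Chef or Hypertable.

```ruby
# Assumed start order: the order the components were listed in the question.
START_ORDER = %w[hyperspace master slave thriftbroker].freeze

# Returns true when `component` may be started: every node of the component
# itself has finished phase one (installed), and every node of each earlier
# component in START_ORDER is already running.
def ready_to_start?(component, nodes)
  index = START_ORDER.index(component)
  raise ArgumentError, "unknown component: #{component}" if index.nil?

  own = nodes.select { |n| n[:component] == component }
  return false if own.empty? || !own.all? { |n| n[:installed] }

  START_ORDER.first(index).all? do |dep|
    dep_nodes = nodes.select { |n| n[:component] == dep }
    !dep_nodes.empty? && dep_nodes.all? { |n| n[:running] }
  end
end

# Hypothetical stand-in for search results after the first phase.
NODES = [
  { name: "hs1", component: "hyperspace",   installed: true, running: true  },
  { name: "m1",  component: "master",       installed: true, running: false },
  { name: "s1",  component: "slave",        installed: true, running: false },
  { name: "s2",  component: "slave",        installed: true, running: false },
  { name: "tb1", component: "thriftbroker", installed: true, running: false }
].freeze

START_ORDER.each do |component|
  decision = ready_to_start?(component, NODES) ? "start" : "wait"
  puts "#{component}: #{decision}"
end
```

A config recipe could evaluate a check along these lines on each Chef run and
start its service only when the check passes, so the cluster would come up
over several runs rather than within one.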

Optionally, you can also combine the first setup with an external real-time
dispatching system such as Ansible or mco (MCollective).

My workflows are derived from build server setups (like setting up Jenkins,
Go or TeamCity farms), which have similar characteristics to
persistence-layer cluster solutions (like MySQL replication, Mongo replica
sets, Cassandra clusters, etc.). I am now working more on failover, which
has similar challenges but requires much faster reconfiguration, and I have
learned that the workflow above does not work for it. So if you need to
reconfigure your system within a few seconds (say, in case of failover),
this won't work. Otherwise, if you can bear some delay, this is the simplest
solution I could think of.

Also, take a look at spiceweasel (https://github.com/mattray/spiceweasel),
which generates Chef knife commands from a simple JSON or YAML file.

best
ranjib

Jordi_Llonch · July 15, 2013, 12:26am

Thanks for your reply, Ranjib,

Hypertable provides a Capistrano file to install/start/stop the cluster.

What about using Chef to install and configure the nodes, and leaving the
start/stop tasks to the Capistrano script?
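
As a rough illustration of that split, an external start step could walk the
components in order and start every host of one component in parallel before
moving on to the next. The sketch below is plain Ruby, not Capistrano or
Hypertable code: the host names, the `start_cluster` helper and the start
order (taken from the order in which the components were listed) are all
assumptions, and the block passed in stands in for whatever actually starts
the service on a host.

```ruby
# Assumed start order, taken from the order the components were listed in.
COMPONENT_ORDER = %w[hyperspace master slave thriftbroker].freeze

# Starts components one after another. Hosts that share a component are
# started concurrently, and the next component waits until all are done.
# `hosts_by_component` maps a component name to an array of host names.
# The block receives (component, host) and does the real start work.
def start_cluster(hosts_by_component, &start)
  COMPONENT_ORDER.each do |component|
    hosts = hosts_by_component.fetch(component, [])
    threads = hosts.map do |host|
      Thread.new { start.call(component, host) }
    end
    threads.each(&:join) # barrier before the next component
  end
end

# Hypothetical inventory, e.g. as it might be derived from Chef roles.
HOSTS = {
  "hyperspace"   => ["node1"],
  "master"       => ["node1"],
  "slave"        => ["node2", "node3", "node4"],
  "thriftbroker" => ["node2", "node3", "node4"]
}.freeze

# Record each start in a thread-safe queue instead of touching real hosts.
started = Queue.new
start_cluster(HOSTS) { |component, host| started << [component, host] }

order = []
order << started.pop until started.empty?
puts order.map(&:first).uniq.inspect
```

The join after each component is what keeps the sequence intact while still
letting, for example, all slave hosts start at the same time.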

Regards,

Ranjib · July 15, 2013, 12:33am

Perfect. Though I hate cap, I still think it's better to solve it there if
you can.

Jordi_Llonch · July 15, 2013, 12:50am

I found capistrano-chef (https://github.com/cramerdev/capistrano-chef), a
package of Capistrano extensions for Chef integration that can glue this
together.

Thanks for your help and ideas.

John_Dewey · July 15, 2013, 3:15am

$ "capistrano".insert 1, "r"

Nathan_Smith · July 15, 2013, 3:49am

I helped write capistrano-chef, and while I’m not actively using it any more, I would be happy to help with any questions or improvements and merge any pull requests that come along.

Thanks,

Nathan L Smith
smith@opscode.com
(319) 339-0466

Jordi_Llonch · July 15, 2013, 4:34am

Thanks Nathan, capistrano-chef is amazing!

I have a working Capfile that gets the roles from Chef.

A mixed solution combining Chef and Capistrano looks great for the cookbook
and deals well with the whole cluster. I will soon publish the Hypertable
cookbook to the Opscode cookbook repository.

Thanks for your help.
