Hypertable cookbook


#1

Hi,

I am creating a cookbook for Hypertable, an HBase-like distributed database
with better performance.

I am trying to work out how to start the cluster in a way that is compatible
with chef-client.

Hypertable has 4 main components (hyperspace, master, slave, thriftbroker)
that need to be started sequentially.

  • Do cluster roles have to be Chef roles, or can they just be items in the run list?
  • How do I start the same service on different nodes at the same time?

What are the best practices for performing a cluster operation in a
cookbook/Chef? I would appreciate your ideas and suggestions.

Thanks,
Jordi


#2

Hi,
you can't do this easily with Chef, at least not with standard Chef components
and within a single Chef run. Chef cannot query or notify remote nodes in
real time. So far I have used two approaches, with different levels of
success, for setting up multi-node clusters/systems that require certain
steps to happen in a certain order and that cannot be converged completely
without the presence of certain other nodes.

  1. Keep all the installation and setup logic separate from the core cluster
    config resource, i.e. everything except the bare minimum config required
    to start each individual service. Install it in one phase, in parallel. The
    first-phase run lists should leave some footprint via attributes.
    In the second phase, alter the run lists of the nodes and add the config
    recipe (which should start the main service). The Chef run invocation order
    will be exactly the same as if you did it manually. The config recipes should
    exploit the attributes (and search based on them) to figure things out.
    Once both phases are working, you can shorten the Chef run interval to
    set things up faster (I prefer a Chef run every 5 minutes for the
    first couple of Chef runs at least).

  2. You can also use something like flock of chefs (note: it's highly
    experimental and requires Ruby 1.9, Celluloid, etc.) to do remote
    notifications. This is far more convenient, but also complex, and errors can
    be difficult to debug. But you’ll be able to do things while staying within
    the Chef recipes, i.e. you can set up the nodes and keep all the services
    stopped until the first dependency is resolved; the first dependency can then
    remotely notify the second, the second can remotely notify the third, etc.
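
A minimal plain-Ruby sketch of the two-phase idea in approach 1. This is not Chef recipe code: the node hashes, the `hypertable_*` attribute names, the recipe names, and the `search_nodes` helper are all hypothetical stand-ins for node attributes and Chef search. The assumption that the master waits for hyperspace simply follows the start order given in the original question.

```ruby
# Hypothetical in-memory stand-in for a node record on the Chef server.
def new_node(name)
  { name: name, run_list: [], attributes: {} }
end

# Mimics a search for installed nodes that carry a given cluster role.
def search_nodes(nodes, role)
  nodes.select do |n|
    n[:attributes][:hypertable_installed] && n[:attributes][:hypertable_role] == role
  end
end

# Phase 1: install only, then leave a footprint in the node attributes.
def install_phase(node, role)
  node[:run_list] |= ["hypertable::install"]
  node[:attributes][:hypertable_installed] = true
  node[:attributes][:hypertable_role] = role
end

# Phase 2: add the config recipe. The service is marked as running only
# once the role it depends on is found running through the search.
def config_phase(node, nodes, depends_on)
  node[:run_list] |= ["hypertable::config"]
  ready = depends_on.nil? ||
          search_nodes(nodes, depends_on).any? { |n| n[:attributes][:hypertable_running] }
  node[:attributes][:hypertable_running] = true if ready
  ready
end

hyperspace = new_node("hs1")
master = new_node("m1")
nodes = [hyperspace, master]

# Phase 1 can run on every node in parallel; order does not matter here.
install_phase(hyperspace, "hyperspace")
install_phase(master, "master")

# Phase 2: a master run before hyperspace is up does nothing useful,
# and a later run picks the dependency up through the search.
master_first = config_phase(master, nodes, "hyperspace")
config_phase(hyperspace, nodes, nil)
master_retry = config_phase(master, nodes, "hyperspace")
```

The point of the sketch is that nothing is pushed between nodes: each run only reads what earlier runs left behind, which is why short run intervals help during bootstrap.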

Optionally, you can also use an external real-time dispatching system,
something like Ansible or mco, with the first setup.

My workflows are derived from build server setups (like setting up Jenkins,
Go, or TeamCity farms), which have similar characteristics to persistence
layer cluster solutions (like MySQL replication, Mongo replica sets,
Cassandra clusters, etc.). But I am now working more on failover, which has
similar challenges but requires much faster reconfiguration. I have learned
that the workflow above does not work for this. So if you need to
reconfigure your system within a few seconds (say, in case of failover),
this is not going to work. Otherwise, if you can bear some delay, this is
the simplest solution I could think of.

Also, take a look at spiceweasel (https://github.com/mattray/spiceweasel).

best
ranjib



#3

Thanks for your reply, Ranjib,

Hypertable provides a capistrano file to install/start/stop the cluster.

What about using Chef to install & configure the nodes and leaving the
start/stop tasks to the Capistrano script?
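
A rough plain-Ruby sketch of what the start side of that split could look like: roles are started one after another in the order given in the first post, while hosts within a role are started in parallel. The host names and the block that "starts" a service are made up, and this is not the Capistrano file that Hypertable ships.

```ruby
# Start order taken from the first post in this thread.
def start_order
  %w[hyperspace master slave thriftbroker]
end

# Calls the block once per host, role by role. Hosts within one role run
# in parallel threads; the next role waits until all of them have finished.
# Returns the [role, host] pairs in the order they completed.
def start_cluster(hosts_by_role)
  log = Queue.new
  start_order.each do |role|
    threads = hosts_by_role.fetch(role, []).map do |host|
      Thread.new do
        yield role, host
        log << [role, host]
      end
    end
    threads.each(&:join)
  end
  Array.new(log.size) { log.pop }
end

# Hypothetical cluster layout.
hosts = {
  "slave"        => %w[node2 node3],
  "hyperspace"   => %w[node1],
  "master"       => %w[node1],
  "thriftbroker" => %w[node2 node3]
}

# A real task would run a remote start command here instead.
started = start_cluster(hosts) { |role, host| "would start #{role} on #{host}" }

# Collapse consecutive entries of the same role to see the role order.
roles_in_order = started.map(&:first).chunk_while { |a, b| a == b }.map(&:first)
```

Joining all threads of a role before moving on is what gives the sequential guarantee across roles while still starting the same service on several nodes at once.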

Regards,



#4

Perfect. Though I hate cap, I still think it's better to solve it there if
you can. :slight_smile:


#5

Found a capistrano-chef (https://github.com/cramerdev/capistrano-chef)
package that can glue the stuff together…

Thanks for your help and ideas.



#6

$ "capistrano".insert 1, "r"

:slight_smile:



#7

I helped write capistrano-chef, and while I’m not actively using it any more, I would be happy to help with any questions or improvements and merge any pull requests that come along.

Thanks,

Nathan L Smith
smith@opscode.com
(319) 339-0466



#8

Thanks Nathan, capistrano-chef is amazing!

I have a working Capfile that gets the roles from chef.
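
For readers wondering what "gets the roles from chef" boils down to, here is a small plain-Ruby sketch of the mapping. It does not use the capistrano-chef API; the node records, the `hypertable_*` Chef role names, and the host names are all assumptions made for illustration.

```ruby
# Hypothetical node records, shaped loosely like Chef node search results.
chef_nodes = [
  { "fqdn" => "node1.example.com", "roles" => %w[hypertable_hyperspace hypertable_master] },
  { "fqdn" => "node2.example.com", "roles" => %w[hypertable_slave hypertable_thriftbroker] },
  { "fqdn" => "node3.example.com", "roles" => %w[hypertable_slave hypertable_thriftbroker] }
]

# Returns the sorted host names of all nodes that carry the given Chef role.
def hosts_for(nodes, chef_role)
  nodes.select { |n| n["roles"].include?(chef_role) }.map { |n| n["fqdn"] }.sort
end

# Capistrano-side role name => Chef role name (both naming schemes are assumed).
role_map = {
  hyperspace:   "hypertable_hyperspace",
  master:       "hypertable_master",
  slave:        "hypertable_slave",
  thriftbroker: "hypertable_thriftbroker"
}

# The host list per Capistrano role, derived entirely from the Chef data.
cap_roles = role_map.transform_values { |chef_role| hosts_for(chef_nodes, chef_role) }
```

With a mapping like this, the list of hosts lives in one place (the Chef server) and the Capfile only decides the order in which roles are acted on.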

A mixed solution combining Chef and Capistrano looks great for the cookbook
and deals well with the whole cluster. I will soon publish the Hypertable
cookbook to the Opscode cookbook repository.

Thanks for your help.
