Ideas on testing a clustered application with Test-Kitchen

I have some ideas on extending test-kitchen to test clustered
applications and I would love some feedback before I go coding off in
a particular direction.

Problem: I deal primarily with distributed applications, and testing
the related cookbooks can be a pain. I also have to make sure these
cookbooks work across different Linux distros. Test-Kitchen was not
originally created with this use case in mind, though I don't see any
reason it couldn't support it.

Vagabond[1], written by Chris Roberts, extends .kitchen.yml to include
a clusters component, among other things. I would love to see
test-kitchen absorb some of that functionality or at least provide
extension points to make this more easily pluggable.

Testing a cluster works differently than testing an individual node. I
want to interrogate the state of the cluster as a whole, not look
inside each individual server. To do this I need to wait until all
servers in the cluster converge, or at least a quorum of them do. Once
a quorum of nodes has converged, I run a series of tests against the
cluster. These tests execute on the machine running kitchen rather
than inside the nodes of the cluster.

Here are the steps in brief:

  1. Converge all nodes in a cluster
  2. Wait for quorum of nodes to converge
  3. Execute tests against the cluster
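The three steps above could be sketched as a small driver loop. This is
a hypothetical sketch, not real test-kitchen API: each "instance" here is
modeled as any callable that converges one node and raises on failure.

```ruby
# Hypothetical sketch of the three steps; none of these names are real
# test-kitchen APIs.
def converge_cluster(instances, quorum)
  succeeded = Queue.new
  threads = instances.map do |instance|
    Thread.new do
      begin
        instance.call                # step 1: converge this member
        succeeded << instance
      rescue StandardError
        # a failed member is tolerated as long as quorum is still reached
      end
    end
  end
  threads.each(&:join)               # step 2: wait for members to finish
  converged = succeeded.size
  raise "quorum not reached: #{converged}/#{quorum}" if converged < quorum
  converged                          # step 3: caller now runs cluster tests
end
```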

I would put the tests for a cluster in my_cluster/test/cluster/cluster_name

Applications like Elasticsearch, Zookeeper, or Cassandra don't have a
master node, so each node has an identical run_list and attribute set:

clusters:
  default:
    - member: zk1
    - member: zk2
    - member: zk3

platforms:
  - name: ubuntu-12.04
  - name: centos-6.3

suites:
  - name: default
    run_list: [ "recipe[zookeeper]" ]

To test the default cluster on CentOS: kitchen test --cluster default --platform centos-6.3

Let’s make this even DRYer

clusters:
  default:
    node_count: 3
    quorum: 2

platforms:
  - name: ubuntu-12.04
  - name: centos-6.3

suites:
  - name: default
    run_list: [ "recipe[zookeeper]" ]
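The terse node_count form could be expanded back into an explicit member
list before converging. A hypothetical sketch, where naming members
"<cluster-name><N>" is my assumption rather than anything existing:

```ruby
# Hypothetical: expand a terse cluster definition into explicit members.
# The "#{cluster_name}#{i}" naming scheme is an assumption.
def expand_cluster(cluster_name, definition)
  count = definition['node_count']
  return definition.fetch('members', []) unless count
  (1..count).map { |i| { 'member' => "#{cluster_name}#{i}" } }
end
```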

The test for this example zookeeper cluster would connect to one
zookeeper node and make sure it sees the other zookeeper nodes. I am
paraphrasing the zookeeper calls because I am not certain what the
actual API would be.

my_cluster/test/cluster/default/check_members.rb

require 'zk'

describe "zookeeper cluster" do
  before(:all) do
    @zk = ZK.new(some_ip)
  end

  it "sees the other members" do
    peers = @zk.get("system/peers")
    peers.should == ACTUAL_PEERS
  end
end

The primary challenge here is resolving the names of the members of
the cluster. One way to do this is to access the statefiles for the
nodes in .kitchen/*.yml. Another would be to somehow access the
@instances array for the current Kitchen config. Yet another option
would be to stand up a chef-zero server.
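A minimal sketch of the statefile approach. The 'hostname' key is an
assumption on my part; the exact keys in .kitchen/*.yml depend on which
driver wrote the statefile.

```ruby
require 'yaml'

# Hypothetical helper: read each instance's statefile under .kitchen/
# and collect member addresses for the cluster tests to use.
def cluster_member_addresses(kitchen_dir = '.kitchen')
  Dir[File.join(kitchen_dir, '*.yml')].sort.map do |statefile|
    YAML.load_file(statefile)['hostname']   # key name is an assumption
  end.compact
end
```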

What about a distributed app where each node does not have an
identical run_list? Here is how I would handle that for something like
HBase that has a “hmaster” that stores metadata for the whole cluster.

clusters:
  default:
    - member: head1
      run_list: [ "recipe[hbase::hmaster]" ]
    - member: store1
      run_list: [ "recipe[hbase::data_node]" ]
    - member: store2
      run_list: [ "recipe[hbase::data_node]" ]

platforms:
  - name: ubuntu-12.04
  - name: centos-6.3

suites:
  - name: default
    run_list: [ "recipe[zookeeper]" ]

I know that test-kitchen is an unopinionated tool and attempts to be
workflow-agnostic, but I feel that what I have shown here is a fairly
simple workflow. The cluster definition I have presented is not
suitable for modeling failover in a cluster or specific states. For
that, one should use a custom workflow tool like chef-workflow or
simply custom rake tasks.

Depending on the feedback I get from this I plan to extend
test-kitchen or Vagabond to handle the workflow I have described here.
Thanks for reading!

  1. https://github.com/chrisroberts/vagabond/tree/develop

It would also be nice if the runner didn't use chef-solo. I like where
Vagabond is going. The ability to do true integration testing by spinning
up dependent systems/services, and to test the cookbook in isolation, is
needed.

I have abandoned test-kitchen altogether in favor of chefspec, since I can
stub/mock dependencies that chef-solo cannot resolve. I'm looking forward
to your work, so we can add test-kitchen to the OpenStack cookbooks and get
those much-needed integration tests.

For example, I have a use case where a cookbook depends on rabbit, mysql,
and memcached being running, with those roles registered in the chef server.
At that point the prerequisites are met and the cookbook can converge its
recipes and perform assertions. This is where I like the Vagabond approach.

John

On Tuesday, May 7, 2013 at 1:10 AM, Bryan Berry wrote:


Have y'all looked into Erik Hollensbe's chef-workflow? I haven't used it yet myself but it shows a lot of promise for this kind of use case.

https://github.com/chef-workflow

--
Joshua Timberman

On Tuesday, May 7, 2013 at 2:31, John Dewey wrote:


I had this discussion with @fnichol during the Chef meetup. Ideally we
would like to see chef-client support first, followed by the cluster-wide
testing. To me this again creates an orchestration problem, and we might
reuse the same components that we'll be using for actual provisioning.
On May 7, 2013 1:31 AM, "John Dewey" john@dewey.ws wrote:


josh,

I think chef-workflow is neat, but it isn't able to reuse
test-kitchen's YAML format or its drivers. I also think the workflow
I have described isn't complicated enough to need custom rake tasks.

On Wed, May 8, 2013 at 7:35 AM, Joshua Timberman joshua@opscode.com wrote:
