How do I know if my application has really been "provisioned"? A suggestion


#1

I have been struggling lately with the problem of how to know whether
or not my servers have been fully provisioned. This problem affects
both cookbook testing and orchestration.

I have numerous Java applications that have long startup times, and I
believe Rails 3 apps do too. For example, elasticsearch takes more
than 5 seconds to start up fully. However, Chef has no native way to
determine when elasticsearch has fully started. All Chef can know is
the exit code returned by service elasticsearch start and its various
platform equivalents. Why is this an issue? Well, I don’t want to run
my minitest-handler integration tests until elasticsearch is actually
functional. Also, if I am testing a multi-VM setup where another VM
depends on elasticsearch, I need to know when elasticsearch has
completed startup.

Erik Hollensbe is doing some freaking awesome work on workflow
orchestration with chef-workflow, and I think it illustrates the
problem here:

require 'chef-workflow/helper'

class MyTest < MiniTest::Unit::VagrantTestCase
  def before
    @json_msg = '{ "id": "dumb message json msg" }'
  end

  def setup
    provision('elasticsearch')
    provision('logstash')
    wait_for('elasticsearch')
    wait_for('logstash')
    inject_logstash_message(@json_msg)
  end

  def test_message_indexed_elasticsearch
    assert es_has_message?(@json_msg)
  end
end

If I understand Erik’s code correctly, wait_for('elasticsearch') only
waits for the vagrant provisioner to return. The vagrant provisioner
in turn only waits for service elasticsearch start to return its exit
code, which says nothing about whether elasticsearch is ready to
serve requests.

We need an optional way to determine whether a server has been
completely provisioned, that is, whether all of its resources have
entered a "done" state. The only way I know that elasticsearch has
started successfully is if I see “Elasticsearch has started” in the
log with a timestamp more recent than when I started the service.

Here is my idea; please tell me if it is crazy.

I think that the service resource should have an additional attribute
"done" that takes two Ruby blocks, named before and after. The before
block would return a hash of results that the after block can read.

Warning: much of the Ruby code here may be horribly incorrect.

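def get_current_offset(file_name)
  # The current size of the log is the offset at which new entries will begin
  { :offset => ::File.size(file_name) }
end

def es_started?(params)
  # Only scan the part of the log written after the recorded offset
  ::File.open(params[:log_file], 'r') do |f|
    f.seek(params[:offset])
    f.any? { |line| line.include?(params[:started_text]) }
  end
end

log_file = '/usr/local/var/log/elasticsearch.log'
started_text = 'elasticsearch has started successfully'
timeout = 60

service 'elasticsearch' do
  # "done" and @before_results are the proposed additions; neither exists in Chef today
  done(
    :before => lambda { get_current_offset(log_file) },
    :after => lambda {
      es_started?(:log_file => log_file,
                  :offset => @before_results[:offset],
                  :started_text => started_text)
    },
    :timeout => timeout
  )
end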

The before block would run before the service is actually actioned.
Chef would then need some additional machinery to collect all the
done :after blocks and their related @before_results. This could be
done by chef_handler but may be better as part of Chef itself. Let’s
call it the done_handler for now. The done_handler would note the time
before it starts handling any :after blocks, then loop through the
collected :after blocks until they all succeed or the specified
timeout expires. Once all blocks are complete it would continue on to
the other handlers, such as the minitest_handler.
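
The loop described above could be sketched roughly as follows. This is only an illustration: wait_until_done and its arguments are hypothetical names, not part of Chef or chef_handler, and each :after check is modeled as a plain lambda that returns true once its service is ready.

```ruby
# Hypothetical sketch of the proposed done_handler loop (not a Chef API).
# checks:   hash of name => lambda returning true when that service is ready
# timeout:  total seconds to keep polling
# interval: seconds to sleep between polls
# Returns the names of the checks that never passed (empty when all are done).
def wait_until_done(checks, timeout, interval = 1)
  deadline = Time.now + timeout
  pending = checks.dup
  loop do
    # Drop every check whose block now reports success
    pending = pending.reject { |_name, check| check.call }
    return [] if pending.empty?
    break if Time.now >= deadline
    sleep interval
  end
  pending.keys
end

# Usage: one check passes immediately, the other on its third poll
attempts = 0
checks = {
  'elasticsearch' => lambda { true },
  'logstash'      => lambda { (attempts += 1) >= 3 }
}
not_done = wait_until_done(checks, 5, 0.01)
puts "still waiting on: #{not_done.inspect}"
```

Returning the names that are still pending, rather than just true or false, would let a handler report which service held up the run.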

I think this done_handler could be part of something called “wait_mode”


#2

I had to deal with a similar problem in some capistrano recipes for deploying JBoss applications.

The approach I used was to monitor the “deployments” directory in JBoss and check for an “artifact.deployed” file with a timestamp greater than that of “artifact.ear”. It’s been working beautifully, but your solution of checking the timestamp in the logs might be more robust.
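
A minimal sketch of that marker-file comparison, for illustration only. The helper name deployed? is made up here, and both paths are passed in explicitly so that nothing is assumed about how JBoss names its files.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: true when the marker file exists and was
# modified more recently than the deployed archive.
def deployed?(archive_path, marker_path)
  File.exist?(archive_path) &&
    File.exist?(marker_path) &&
    File.mtime(marker_path) > File.mtime(archive_path)
end

# Usage against a throwaway directory
Dir.mktmpdir do |dir|
  archive = File.join(dir, 'artifact.ear')
  marker  = File.join(dir, 'artifact.deployed')

  # Archive copied a minute ago, marker not written yet
  FileUtils.touch(archive, :mtime => Time.now - 60)
  puts "before marker: #{deployed?(archive, marker)}"

  # The marker appears once the deployment finishes
  FileUtils.touch(marker)
  puts "after marker:  #{deployed?(archive, marker)}"
end
```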


Cassiano Leal

In reply to Bryan Berry’s post (#1) of Sunday, December 9, 2012 at 10:22.


#3

Hey Cassiano,

I think the mechanism used to verify whether the application has been
fully provisioned will vary by application. For that reason I think
using an optional :before block to gather info and an :after block
with arbitrary Ruby code to check the state of the system will give us
the flexibility we need.

In reply to Cassiano Leal’s post (#2) of Sun, Dec 9, 2012 at 4:09 PM.


#4

Yes, I agree with that, and I find your example code readable.

I’d like to see some default implementations for the :before and :after blocks as well, maybe referenced through some code along these lines:

service "elasticsearch" do
  done(
    :type => :jboss_log
  )
end

Then one could probably override some attributes like so:

service "elasticsearch" do
  done(
    :type => :jboss_log,
    :timeout => 120
  )
end

Hope that makes sense! :)


Cassiano Leal

In reply to Bryan Berry’s post (#3) of Sunday, December 9, 2012 at 13:23.


#5

You could use a ruby_block that polls a URL of the app until it returns an expected response code.

For example, this starts the god monitor only after a successful request:

ruby_block "god-enable" do
  block do
    require 'curb' # provides Curl::Easy

    timeout = 600
    retries = 20
    host = "localhost:8080"
    Chef::Log.info "call get on #{host}, maximal request time: #{timeout} seconds"
    # app is assumed to be defined elsewhere in the recipe
    c = Curl::Easy.new("http://#{host}#{app['monitoring_path']}") do |curl|
      # curl.verbose = true
      curl.timeout = timeout
    end

    running = false
    while retries > 0
      retries -= 1
      c.perform
      if [200, 210].include?(c.response_code)
        running = true
        Chef::Log.info "service running, starting god watch"
        break
      else
        Chef::Log.info "service not running (#{c.response_code}), retry in 3 seconds"
        sleep 3
      end
    end
    # Fail the Chef run if the service never answered with an expected code
    raise "service did not come up" unless running
  end
  action :nothing
  notifies :restart, "service[god]"
end
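
The same polling idea can be sketched with only the Ruby standard library. This is not the code above rewritten; it is a separate illustration in which http_ready? and wait_for_http are hypothetical names. It treats a refused or timed-out connection as "not ready yet", since a service that is still starting usually is not listening on its port.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: true when the URL answers with one of the accepted codes.
# Connection failures count as "not ready" instead of raising.
def http_ready?(url, ok_codes = [200])
  response = Net::HTTP.get_response(URI(url))
  ok_codes.include?(response.code.to_i)
rescue SystemCallError, SocketError, Timeout::Error, EOFError
  false
end

# Poll the URL up to `retries` times, sleeping `delay` seconds after each miss.
def wait_for_http(url, retries, delay)
  retries.times do
    return true if http_ready?(url)
    sleep delay
  end
  false
end
```

Inside a ruby_block this could be called as wait_for_http("http://localhost:8080/", 20, 3), raising an error when it returns false so that the Chef run fails.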

A better approach would be to use something like https://github.com/lusis/Noah or ZooKeeper.

In reply to Bryan Berry’s post (#3) of 09.12.2012 at 16:23.


DI Edmund Haselwanter, edmund@haselwanter.com, http://edmund.haselwanter.com/
http://www.iteh.at | http://facebook.com/iTeh.solutions | http://at.linkedin.com/in/haselwanteredmund