Convergence and Execution Problems


#1

Hi Guys,
I have a question regarding some Convergence and Execution Problems that I am seeing in a recipe I am testing.
I have created a library cookbook to handle some aspects of our application (this is running against only Windows 2008R2 Servers).
The issue is that after calling an LWRP from the library cookbook that stops particular Windows services associated with our app, the application cookbook then tries to copy a file used to configure the services. This always seems to fail, even though I’m capturing the exception and retrying 3 to 20 times. A similar copy for some website configuration has no such problem, and it runs before the service file copy.
I’ve tried different things to get this to work and from what I’m seeing I suspect it’s something to do with the way Chef is executing the steps in the recipe.
I think the LWRP for the services fires and starts the subprocess to stop all the services (written in a ruby class), but once this starts it then carries on to the next step straight away without waiting for the LWRP subprocess to complete, which is why I think the copy of the service files fails (as the services are still shutting down and have the file locked) but the application files are free to be copied over.
Another reason I think this is because at the end of the application recipe, the same LWRP is called to start the services again, and in the logs I can see it firing off the process then the Chef run finishes and returns successfully, but looking at the Windows servers the services are still starting.
I will admit to not completely understanding the execution order (beyond it firing them off in the order they are written, top to bottom), but I think I am missing something fundamental about how to structure this.

Any pointers would be gratefully received. I find I struggle more because the Windows side of things is less well documented than the Linux side (which is understandable, as more people are running Chef for managing Linux servers).

The recipe section is below; the drop is the part that performs a backup of existing files and then copies new ones. Should the Ruby class I use to stop services be returning something to make sure the recipe waits?

# stop services on node
our_services "stop" do
  action :stop
end

# initiate dlldrop
our_dlldrop fname do
  action :undo
end

# start services post backup
our_services "start" do
  action :start
end

Thanks


#2

On Thursday, November 6, 2014 at 4:32 AM, ChristopherHall@air-watch.com wrote:

Hi Guys,
I have a question regarding some Convergence and Execution Problems that I am seeing in a recipe I am testing.
I have created a library cookbook to handle some aspects of our application (this is running against only Windows 2008R2 Servers).
The issue is that after calling an LWRP from the library cookbook that stops particular Windows services associated with our app, the application cookbook then tries to copy a file used to configure the services. This always seems to fail, even though I’m capturing the exception and retrying 3 to 20 times. A similar copy for some website configuration has no such problem, and it runs before the service file copy.
I’ve tried different things to get this to work and from what I’m seeing I suspect it’s something to do with the way Chef is executing the steps in the recipe.
I think the LWRP for the services fires and starts the subprocess to stop all the services (written in a ruby class), but once this starts it then carries on to the next step straight away without waiting for the LWRP subprocess to complete, which is why I think the copy of the service files fails (as the services are still shutting down and have the file locked) but the application files are free to be copied over.

From what you’ve described, it sounds like this part of the process needs to block Chef from continuing until it’s finished.
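
As a rough illustration of the difference (a sketch in plain Ruby, not the actual LWRP code, which isn't shown in this thread): `Process.spawn` returns as soon as the child process has been launched, whereas following it with `Process.wait` blocks until the child exits. The `slow_command`, `run_detached`, and `run_blocking` names are made up for the example, and the sleeping child process merely stands in for a slow service-stop command.

```ruby
require 'rbconfig'

# Hypothetical stand-in for a slow "stop services" command: a child Ruby
# process that just sleeps for the given number of seconds.
def slow_command(seconds)
  [RbConfig.ruby, '-e', "sleep #{seconds}"]
end

# Fire-and-forget: returns immediately while the child is still running.
def run_detached(cmd)
  pid = Process.spawn(*cmd)
  Process.detach(pid) # reap the child in the background once it exits
  pid
end

# Blocking: does not return until the child has exited.
def run_blocking(cmd)
  pid = Process.spawn(*cmd)
  Process.wait(pid)
  $?.success?
end
```

If the code behind the services LWRP behaves like `run_detached`, the next resource in the recipe would start while the services are still shutting down, which would match the locked-file symptom described above.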

Another reason I think this is because at the end of the application recipe, the same LWRP is called to start the services again, and in the logs I can see it firing off the process then the Chef run finishes and returns successfully, but looking at the Windows servers the services are still starting.
I will admit to not completely understanding the execution order (beyond it firing them off in the order they are written, top to bottom), but I think I am missing something fundamental about how to structure this.

Any pointers would be gratefully received. I find I struggle more because the Windows side of things is less well documented than the Linux side (which is understandable, as more people are running Chef for managing Linux servers).

The recipe section is below; the drop is the part that performs a backup of existing files and then copies new ones. Should the Ruby class I use to stop services be returning something to make sure the recipe waits?

# stop services on node
our_services "stop" do
  action :stop
end

# initiate dlldrop
our_dlldrop fname do
  action :undo
end

# start services post backup
our_services "start" do
  action :start
end

Thanks


Daniel DeLeo


#3

Hi Daniel,
What would be the best way to achieve that? Would I need some sort of return? My resource is not idempotent, as it is only ever run when a particular file needs to be copied; once done, it will not be used again (although the code in the library cookbook will obviously be used repeatedly by different cookbooks).

Thanks
Chris



#4

On Friday, November 7, 2014 at 6:00 AM, ChristopherHall@air-watch.com wrote:

Hi Daniel,
What would be the best way to achieve that? Would I need some sort of return? My resource is not idempotent, as it is only ever run when a particular file needs to be copied; once done, it will not be used again (although the code in the library cookbook will obviously be used repeatedly by different cookbooks).

Thanks
Chris

Depends on how this LWRP works. If there’s some external script that gets run, you could modify it to wait for everything to be fully stopped. Alternatively, you could poll for some condition (in ruby code) that tells you it’s safe to proceed.
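
A minimal sketch of the polling approach in plain Ruby, under stated assumptions: `wait_until` is a hypothetical helper name, and the condition to check is supplied by the caller as a block, since the thread doesn't show how the LWRP queries service state.

```ruby
# Polls the given block until it returns a truthy value or the timeout
# expires. Returns true if the condition was met, false on timeout.
def wait_until(timeout: 60, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Hypothetical usage inside the provider's :stop action. `service_names`
# and `service_stopped?` are assumed helpers, not part of Chef; on Windows
# the latter might wrap the win32-service gem or parse `sc query` output.
#
#   stopped = wait_until(timeout: 120, interval: 2) do
#     service_names.all? { |name| service_stopped?(name) }
#   end
#   raise "Services did not stop in time" unless stopped
```

Raising on timeout, as in the commented usage, would fail the Chef run at the stop step rather than letting the file copy run against services that still hold locks.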


Daniel DeLeo


#5

Ok, thanks Daniel, I’ll refine the service stopping script to check for service stopped status before it completes.

Thanks for the help!
