I have a question about some convergence and execution problems I am seeing in a recipe I am testing.
I have created a library cookbook to handle some aspects of our application (this is running against only Windows 2008R2 Servers).
The issue is that, after calling an LWRP from the library cookbook that stops particular Windows services associated with our app, the application cookbook then tries to copy the files used to configure the services. This always seems to fail, even though I'm capturing the exception and retrying 3 to 20 times. A similar copy for some website configuration has no such problem, and it runs before the service file copy.
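To show the kind of retry I mean, here is a minimal sketch in plain Ruby. It is not the actual cookbook code: `copy_with_retry`, its parameters, and the choice of rescued exceptions are illustrative assumptions (on Windows a locked file usually surfaces as `Errno::EACCES`).

```ruby
require "fileutils"

# Hypothetical helper (not the real cookbook code): copy a file and retry
# when the destination appears to be locked by another process.
# Returns the number of attempts it took.
def copy_with_retry(src, dest, attempts: 5, delay: 2)
  tries = 0
  begin
    tries += 1
    FileUtils.cp(src, dest)
    tries
  rescue Errno::EACCES, Errno::EBUSY => e
    # Give up once the attempt budget is spent; otherwise wait and try again.
    raise e if tries >= attempts
    sleep delay
    retry
  end
end
```

A retry like this only helps if the lock is released within `attempts * delay` seconds, which may be why a fixed number of quick retries is not enough while the services are still shutting down.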
I’ve tried different things to get this to work and from what I’m seeing I suspect it’s something to do with the way Chef is executing the steps in the recipe.
I think the LWRP for the services fires and starts the subprocess that stops all the services (written in a Ruby class), but then carries straight on to the next step without waiting for that subprocess to complete. That would explain why the copy of the service files fails (the services are still shutting down and have the files locked) while the application files are free to be copied over.
Another reason I think this is that, at the end of the application recipe, the same LWRP is called to start the services again. In the logs I can see it firing off the process, then the Chef run finishes and returns successfully, but on the Windows servers the services are still starting.
I will admit to not completely understanding the execution order (beyond it firing them off in the order they are written, top to bottom), but I think I am missing something fundamental about how to structure this.
Any pointers would be gratefully received. I find I struggle more because the Windows side of things is less well documented than the Linux side (which is understandable, as more people use Chef to manage Linux servers).
The recipe section is below; the drop is the part that backs up the existing files and then copies the new ones. Should the Ruby class I use to stop services be returning something to make sure the recipe waits?
# stop services on node
our_services "stop" do
our_dlldrop fname do
# start services post backup
our_services "start" do
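For illustration, the kind of waiting I have in mind inside the Ruby class would look something like the sketch below. This is a rough sketch under assumptions, not the real cookbook code: `wait_until`, `service_stopped?`, the timeout values, and the service name in the usage comment are all hypothetical. The status check assumes the output of the Windows `sc query` command contains `STOPPED` once the service has fully stopped.

```ruby
# Hypothetical helper: poll the given block until it returns a truthy value,
# raising if the timeout (in seconds) passes first.
def wait_until(timeout: 60, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until yield
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "Timed out after #{timeout}s waiting for condition"
    end
    sleep interval
  end
  true
end

# Hypothetical status check: shells out to `sc query` and looks for the
# STOPPED state in its output. Only meaningful on a Windows node.
def service_stopped?(name)
  `sc query "#{name}"`.include?("STOPPED")
end

# Usage inside the stop action (service name is made up):
#   wait_until(timeout: 120, interval: 2) { service_stopped?("OurAppService") }
```

The idea is that the stop action would not return until every service reports as stopped, so the next resource in the recipe only runs once the file locks should have been released.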