Issues with chef-client service

We’re currently using the chef-client cookbook to set up a service (init by default) so chef-client runs on a specified interval. We’re also using the default value of 1800 (seconds) for the interval.

Apparently some of our runs take more than 1800 seconds, which is OK; however, the service then stops running and logs a FATAL error: “FATAL: Chef is already running pid 21697”.

I’m going to up the interval to just once a day, but I’m wondering whether there is a way to skip a converge if one is already running. I’d like to avoid future scenarios where the service does not start back up; at this point, we have to start the service manually whenever it fails.
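One generic way to skip a converge when another one is already in progress is to take an exclusive, non-blocking file lock before starting. The Ruby sketch below assumes a hypothetical lock path and wraps an arbitrary block; it illustrates the technique only and is not how chef-client or the chef-client cookbook implement it.

```ruby
# Minimal sketch of "skip if already running" using an advisory file lock.
# EXAMPLE_LOCK_PATH is a hypothetical path chosen for illustration only.
EXAMPLE_LOCK_PATH = '/tmp/example-converge.lock'

# Yields only if no other open file currently holds the lock.
# Returns :ran when the block executed, :skipped otherwise.
def run_unless_locked(lock_path = EXAMPLE_LOCK_PATH)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    # LOCK_NB makes flock return false immediately instead of waiting.
    return :skipped unless f.flock(File::LOCK_EX | File::LOCK_NB)

    yield
    :ran
  end # closing the file at the end of File.open releases the lock
end

result = run_unless_locked { puts 'converging...' }
puts "run result: #{result}"
```

Because the lock belongs to the open file, the operating system releases it if the process dies, so a crashed run does not leave a stale lock behind the way a leftover PID file can.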

Thanks,
Curtis

On Tuesday, April 1, 2014 at 1:41 PM, Stewart, Curtis wrote:

If you run a single daemonized instance of chef-client, the length of your run should not matter, because the interval is the time that chef-client sleeps between runs, not how often it tries to start a run. This behavior sounds like you have chef-client running via cron and init at the same time. Did you ever check whether pid 21697 was alive and was a running instance of chef-client?
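To make the distinction concrete, here is a hypothetical Ruby sketch (not chef-client's actual code) of a daemon loop in which the interval is a sleep that begins only after a run returns. `run` and `sleeper` are injected lambdas so the ordering can be shown without waiting; in real use `sleeper` would be something like `->(s) { sleep s }`.

```ruby
# Hypothetical daemon loop: sleep `interval` seconds AFTER each run ends.
# A run that takes longer than `interval` only delays the next run; the
# loop never starts a second run while the first is still going.
def daemon_loop(interval, iterations, run:, sleeper:)
  iterations.times do
    run.call               # may take any amount of time
    sleeper.call(interval) # the countdown starts only after the run returned
  end
end

events = []
daemon_loop(1800, 2,
            run: -> { events << :start << :finish },
            sleeper: ->(secs) { events << [:sleep, secs] })
p events
# → [:start, :finish, [:sleep, 1800], :start, :finish, [:sleep, 1800]]
```

A cron entry, by contrast, fires on the clock whether or not the previous run has finished, which is why running chef-client from both cron and init can produce overlapping starts.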

--
Daniel DeLeo
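For the question of whether pid 21697 was alive, a small Ruby sketch like the following can answer it. It sends signal 0, which checks that the process exists without delivering anything, and on Linux reads `/proc/<pid>/cmdline` to show what the process actually is. The helper names `pid_alive?` and `pid_cmdline` are made up for this example.

```ruby
# Returns true if a process with this PID exists.
# Signal 0 only performs the existence/permission check.
def pid_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # exists, but belongs to another user
end

# Linux-only: returns the process command line, or nil if unavailable.
def pid_cmdline(pid)
  File.read("/proc/#{pid}/cmdline").tr("\0", ' ').strip
rescue Errno::ENOENT, Errno::EACCES
  nil
end

pid = 21697 # the PID from the FATAL message
puts "alive: #{pid_alive?(pid)}, cmdline: #{pid_cmdline(pid).inspect}"
```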

Maybe I just need to take some more time to better understand the chef-client cookbook….

We’re currently adding the following recipes to our baseos:

chef-client::config
chef-client
chef-client::delete_validation

We set the ['chef-client']['interval'] attribute to 86400 (1 day), on the assumption that this means chef-client will converge the node once a day.
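If "baseos" is a role (an assumption; it could equally be a wrapper cookbook), the setup described here might look like the following Ruby role file. The file name and description are placeholders; the run list and interval value come from this message.

```ruby
# roles/baseos.rb (hypothetical; assumes "baseos" is a Chef role)
name 'baseos'
description 'Base OS configuration (illustrative sketch)'
run_list(
  'recipe[chef-client::config]',
  'recipe[chef-client]',
  'recipe[chef-client::delete_validation]'
)
default_attributes(
  'chef-client' => {
    # Seconds chef-client sleeps between runs (86400 = 1 day)
    'interval' => 86400
  }
)
```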

Our tests have been on ubuntu-12.04 boxes, which use the init_service by default.

I’m curious whether we may be misunderstanding the usage, but it seems fairly straightforward. I’ll continue troubleshooting to see what the issue is.

Thanks!

Curtis

On Apr 3, 2014, at 4:29 PM, Daniel DeLeo <dan@kallistec.com> wrote:

On Apr 4, 2014, at 8:26 AM, Stewart, Curtis <cstewart@momentumsi.com> wrote:

We set the ['chef-client']['interval'] attribute to 86400 (1 day), under the assumption that means chef-client will converge the node once a day.

That's my understanding, yes. I think the general consensus is that this is typically done more frequently, at least once an hour. But with those settings, this is what I would expect to happen.

If the chef-client is getting killed or dying, then it would be possible to add a daily cron job that would send a "start" notification to the chef-client service, and if the service is already started then it should have no effect.

That would be a "belt-and-suspenders" approach to the problem.
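A minimal sketch of that belt-and-suspenders job, written as a Chef `cron` resource in a recipe. The resource name, the schedule, and the `/usr/sbin/service` path (the usual location on Ubuntu 12.04) are assumptions. Also check that your init script really treats `start` as a no-op when the daemon is already running, since a non-idempotent `start` could itself log the "already running" error quoted earlier.

```ruby
# Hypothetical watchdog recipe: once a day, ask the init system to start
# chef-client. Relies on "start" being harmless if it is already running.
cron 'chef-client-watchdog' do
  minute '0'
  hour '4'
  user 'root'
  command '/usr/sbin/service chef-client start >/dev/null 2>&1'
end
```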

--
Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu