Issues with chef-client service

Stewart_Curtis · April 1, 2014, 8:41pm

We’re currently using the chef-client cookbook to set up a service (init by default) so that chef-client runs on a specified interval. We’re also using the default interval of 1800 seconds.

Apparently some of our runs take more than 1800 seconds, which is OK. However, the service then stops running and logs a FATAL error: "FATAL: Chef is already running pid 21697".

I’m going to raise the interval to once a day, but I’m wondering whether there is a way to skip a converge when one is already running. I’d like to avoid scenarios where the service does not start back up. At this point, we have to restart the service manually whenever it fails.

Thanks,
Curtis
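
One generic way to get the "skip if already running" behaviour asked about above is a non-blocking file lock around whatever command starts the converge. The following is only a minimal sketch under that assumption: `LOCK_PATH` and `run_exclusively` are hypothetical names invented for illustration, not part of Chef or the chef-client cookbook.

```ruby
# Sketch: skip a run when another one already holds the lock.
# LOCK_PATH and run_exclusively are hypothetical names, not part of Chef.
require 'tmpdir'

LOCK_PATH = File.join(Dir.tmpdir, 'converge-example.lock')

# Yields only if the lock can be taken without blocking.
# Returns :ran when the block executed, :skipped otherwise.
def run_exclusively(lock_path = LOCK_PATH)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    # LOCK_NB makes flock return false instead of waiting for the holder.
    return :skipped unless lock.flock(File::LOCK_EX | File::LOCK_NB)

    begin
      yield
      :ran
    ensure
      lock.flock(File::LOCK_UN)
    end
  end
end
```

A wrapper script could call `run_exclusively { ... }` with the converge command inside the block and simply exit quietly on `:skipped`, so an overlapping attempt never becomes a fatal error.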

kallistec · April 3, 2014, 9:29pm

If you run a single daemonized instance of chef-client, then the length of your run should not matter, because the interval is the time that chef-client sleeps between runs, not how often it tries to start a run. This behavior sounds like you have chef-client running via cron and init at the same time. Did you ever check whether pid 21697 was alive and a running instance of chef-client?

--
Daniel DeLeo
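
The difference between "sleep for the interval after each run" and "attempt a run every interval" can be illustrated with a little arithmetic. This is only a sketch of the two scheduling models described in the reply above; both method names and the sample run durations are made up for illustration.

```ruby
# Sketch: start times (in seconds) under two scheduling models.
# Both method names are hypothetical and only illustrate the arithmetic.

# Daemon model: sleep `interval` seconds after each run finishes.
def daemon_start_times(durations, interval)
  clock = 0
  durations.map do |duration|
    start = clock
    clock = start + duration + interval
    start
  end
end

# Fixed-schedule model (cron-like): a run is attempted every `interval`
# seconds, regardless of whether the previous one has finished.
# Returns [start, overlaps_previous] pairs.
def scheduled_attempts(durations, interval)
  previous_end = 0
  durations.each_with_index.map do |duration, i|
    start = i * interval
    overlaps = start < previous_end
    previous_end = start + duration unless overlaps
    [start, overlaps]
  end
end

durations = [2000, 600, 600] # the first run exceeds the 1800 s interval
p daemon_start_times(durations, 1800)  # prints [0, 3800, 6200]
p scheduled_attempts(durations, 1800)  # prints [[0, false], [1800, true], [3600, false]]
```

In the daemon model a 2000-second run just pushes the next start later. In the fixed-schedule model the attempt at 1800 seconds collides with the run that is still in progress, which is the situation in which an "already running" error would be expected.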

Stewart_Curtis · April 4, 2014, 1:26pm

Maybe I just need to take some more time to better understand the chef-client cookbook.

We’re currently adding the following recipes to our baseos:

chef-client::config
chef-client
chef-client::delete_validation

We set the [‘chef-client’][‘interval’] attribute to 86400 (1 day), under the assumption that means chef-client will converge the node once a day.

Our tests have been on ubuntu-12.04 boxes, which use the init_service by default.

I’m curious to know if we may be misunderstanding the usage, but it seems fairly straightforward. I’ll continue to do some troubleshooting to see what the issue is.

Thanks!

Curtis

BradKnowles · April 4, 2014, 3:20pm

On Apr 4, 2014, at 8:26 AM, Stewart, Curtis cstewart@momentumsi.com wrote:

We set the [‘chef-client’][‘interval’] attribute to 86400 (1 day), under the assumption that means chef-client will converge the node once a day.

That's my understanding, yes. I think the general consensus is that this is typically done on a more frequent basis, like at least once an hour. But with those settings, this is what I would expect to happen.

If the chef-client is getting killed or dying, then it would be possible to add a daily cron job that would send a "start" notification to the chef-client service, and if the service is already started then it should have no effect.

That would be a "belt-and-suspenders" approach to the problem.

--
Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu
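
A rough sketch of the "belt-and-suspenders" idea, written as a small script that a daily cron job could invoke. It is a variation on the suggestion above: instead of always sending "start", it first checks a pid file. `PID_FILE` and `START_COMMAND` are assumptions rather than values confirmed in this thread, and the method names are hypothetical.

```ruby
# Sketch of a watchdog that a daily cron job could invoke.
# PID_FILE and START_COMMAND are assumptions; adjust them for the actual node.
PID_FILE = '/var/run/chef/client.pid'
START_COMMAND = %w[service chef-client start].freeze

# True if a process with this pid exists (signal 0 only probes).
def pid_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but belongs to another user
end

# Reads the pid from the file; nil if the file is missing or malformed.
def read_pid(path)
  Integer(File.read(path).strip)
rescue Errno::ENOENT, ArgumentError
  nil
end

# Starts the service only when no live process is recorded.
# `starter` is injectable so the logic can be exercised without root.
def ensure_running(pid_file: PID_FILE, starter: ->(cmd) { system(*cmd) })
  pid = read_pid(pid_file)
  return :already_running if pid && pid.positive? && pid_alive?(pid)

  starter.call(START_COMMAND)
  :started
end
```

Calling `ensure_running` with no arguments from the cron-invoked script would use the assumed defaults. Note that a live pid does not prove the process is chef-client, since a stale pid can be reused by an unrelated process, so this check is a convenience rather than a guarantee.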
