Systemd-journald.service restart kills chef-client on EL 7

Mathew_Crane · February 6, 2016, 5:19pm

Hello,

I have a recipes and cookbooks that manage systemd services (NetworkManager, systemd-journald).

Since upgrading to RHEL 7.2 latest, performing a systemctl restart on either of these during the course of a chef-client run causes the chef-client service to abuptly exit and requires a manual restart of the chef-client service to get the node converging again. Affects both virtualization hosts and guests.

Is this a known issue? Is this an issue that is resolved in a newer version of the chef client package?

RHEL 7.2 (kernel-3.10.0-327.4.5.el7) systemd 219-19
chef 12.0.3 package

Error output:

[root@host001 ~]# systemctl status chef-client
● chef-client.service - Chef Client daemon
   Loaded: loaded (/etc/systemd/system/chef-client.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2016-02-05 00:47:10 UTC; 1 day 15h ago
 Main PID: 20633 (code=exited, status=1/FAILURE)

Feb 05 00:47:10 host001 chef-client[20633]: [2016-02-05T00:47:10+00:00] INFO: Processing cookbook_file[journald.conf] action create (linux-core::default line 438)
Feb 05 00:47:10 host001 chef-client[20633]: [2016-02-05T00:47:10+00:00] INFO: cookbook_file[journald.conf] backed up to /var/chef/backup/etc/systemd/journald.conf.chef-20...04710.069193
Feb 05 00:47:10 host001 chef-client[20633]: [2016-02-05T00:47:10+00:00] INFO: cookbook_file[journald.conf] updated file contents /etc/systemd/journald.conf
Feb 05 00:47:10 host001 chef-client[20633]: [2016-02-05T00:47:10+00:00] INFO: cookbook_file[journald.conf] sending restart action to service[systemd-journald] (immediate)
Feb 05 00:47:10 host001 chef-client[20633]: [2016-02-05T00:47:10+00:00] INFO: Processing service[systemd-journald] action restart (linux-core::default line 451)
Feb 05 00:47:10 host001 systemd[1]: chef-client.service: main process exited, code=exited, status=1/FAILURE
Feb 05 00:47:10 host001 systemd[1]: Unit chef-client.service entered failed state.
Feb 05 00:47:10 host001 systemd[1]: chef-client.service failed.

Mathew_Crane · February 8, 2016, 6:05pm

Update: 12.6.0 has the same issue. Investigating systemd 219 changes

Edit: Testing on a newly-kicked 7.1 image, this seems to affect systemd 208 as well.

service 'systemd-journald.service' do
  action :restart
end

Causes chef-client running in the system.slice to bomb out with exit status ‘1’. Any way to get more useful information from chef-client?

Mathew_Crane · February 10, 2016, 5:08pm

This smells like a ruby+systemd bug, but I’m still unsure. From more testing, it seems that the stdout pipe in use gets clobbered when a systemd dbus service restarts, causing chef-client to exit ungracefully.

I’ve resolved the issue by setting log_location from STDOUT to /var/log/chef/client.log in /etc/chef/client.rb.

The cookbook managing chef-client piece is ‘chef-client 4.3.2’, using the default systemd parameters.

/etc/systemd/system/chef-client.service:

[Unit]
Description=Chef Client daemon
After=network.target auditd.service

[Service]
EnvironmentFile=/etc/sysconfig/chef-client
ExecStart=/usr/bin/chef-client -c $CONFIG -i $INTERVAL -s $SPLAY $OPTIONS
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

/etc/chef/client.rb:

chef_server_url "https://chefserver.int.net"
client_fork true
log_level :info
log_location "/var/log/chef/client.log"
trusted_certs_dir "/etc/chef/trusted_certs"
validation_client_name "chef-validator"
verify_api_cert true
node_name "host001"

Also, FWIW, a forked chef-client run from the daemon doesn’t pick up the new log_location settings until the chef-client service is restarted, based on my testing.

Mathew_Crane · February 10, 2016, 8:45pm

Actually, setting the logfile in client.rb masked the issue. After restarting systemd-journald, the service stays up but all future chef-client runs spawned via the service fail.

I’ve opened an issue report against the chef-client cookbook.

Can anyone else validate this issue?

nathwill · February 12, 2016, 8:18am

weird issue… almost like the STDOUT file descriptor gets mucked up somehow during the restart. probably this issue? it sounds like either ruby or chef-client itself is not handling SIGPIPE correctly? i’ve updated the bug you filed with a link to the relevant discussion.

in the meantime, i’d think you should at least be able to avoid the need for manual intervention by telling systemd to always restart the chef-client service with the Restart=always or Restart=on-failure via a drop-in unit

e.g.

/etc/systemd/system/chef-client.service.d/always-restart.conf

[Service]
Restart=always

if you’re using the systemd cookbook for unit management, instead of a cookbook file or template and a notify to systemctl daemon-reload execute resource, the above can be achieved with something like:

systemd_service 'restart-chef-client' do
  drop_in true
  override 'chef-client'
  restart 'always'
end

obviously not an ideal solution, but maybe enough to be going on with while the main issue’s worked on.

nathwill · February 12, 2016, 8:36am

so i haven’t tested this, but it looks like you may actually be able to configure systemd-journald to serialize it’s state and reclaim the FDs after a restart!

following up on the fact that the previously linked bug is marked resolved-fixed, it appears that an option has been added to allow a service (such as systemd-journald) to preserve it’s file-descriptors across a restart in version 219 (available in EL 7.2). this is supported in systemd-journald, but disabled by default. i believe it’s this option.

i have no idea what a reasonable maximum number of FDs for your systemd-journald to store in the service manager might be, but you should be able to get a rough estimate by counting the number of files under /proc/PID/fd/, where PID is systemd-journald’s pid (my systems seem to average around 30).

Topic		Replies	Views
Chef-client fails on daemon restart Chef Infra (archive)	0	946	April 27, 2017
Improvements to chef-client updating with chef_client_updater and chef-client cookbooks Chef Release Announcements	0	809	May 11, 2017
Chef-client cookbook issues - totally does not work Chef Infra (archive)	2	738	November 5, 2013
Chef-client::systemd_service failed Chef Infra Client	0	593	May 11, 2023
[RESOLVED] Chef server 12.16.9 systemctl stop private_chef-runsvdir-start failed Chef Infra (archive)	2	1620	September 8, 2017

Systemd-journald.service restart kills chef-client on EL 7

Related topics