[RESOLVED] Differences in using sudo with test-kitchen vs chef-client

We’re trying to run chef-client with elevated privileges using sudo chef-client. As a simple test, we are only updating an existing user whose shell needs to be changed to /bin/false. The run fails with the following error:

Errno::ENOENT
-------------
No such file or directory - usermod

An strace analysis shows that getuid() returns the calling user’s UID again, instead of 0, as soon as chef-client clones (forks) a new process:

$ grep getuid /tmp/sudo_chef2.strace | uniq
8871  getuid()                          = 0
8873  getuid()                          = 0
8871  getuid()                          = 0
8876  getuid()                          = 70224
8879  getuid()                          = 70224
8882  getuid()                          = 70224
8885  getuid()                          = 70224
8888  getuid()                          = 70224
8891  getuid()                          = 70224
8894  getuid()                          = 70224
8897  getuid()                          = 70224
8900  getuid()                          = 70224
8903  getuid()                          = 70224
8906  getuid()                          = 70224
8909  getuid()                          = 70224
8912  getuid()                          = 70224
8915  getuid()                          = 70224
8918  getuid()                          = 70224
8921  getuid()                          = 70224
8924  getuid()                          = 70224
8927  getuid()                          = 70224
8930  getuid()                          = 70224
8933  getuid()                          = 70224
8942  getuid()                          = 70224
8946  getuid()                          = 70224
8947  getuid()                          = 70224
8948  getuid()                          = 70224
8949  getuid()                          = 70224
8950  getuid()                          = 70224
8951  getuid()                          = 70224
8952  getuid()                          = 70224
8953  getuid()                          = 70224
8954  getuid()                          = 70224
8955  getuid()                          = 70224
8956  getuid()                          = 70224
8957  getuid()                          = 70224
8961  getuid()                          = 70224
8964  getuid()                          = 70224
8967  getuid( <unfinished ...>
8967  <... getuid resumed> )            = 70224
8967  getuid()                          = 70224
8970  getuid( <unfinished ...>
8970  <... getuid resumed> )            = 70224
8970  getuid()                          = 70224
8973  getuid()                          = 70224
8976  getuid()                          = 70224
8977  getuid()                          = 70224
8980  getuid()                          = 70224
8989  getuid()                          = 70224
8996  getuid()                          = 70224
9031  getuid()                          = 70224
9042  getuid()                          = 70224
9042  getuid( <unfinished ...>
9042  <... getuid resumed> )            = 70224
9049  getuid()                          = 70224
9052  getuid()                          = 70224
9056  getuid()                          = 70224
9060  getuid()                          = 70224
9060  getuid( <unfinished ...>
9060  <... getuid resumed> )            = 70224
9060  getuid()                          = 70224
9064  getuid()                          = 70224
8876  getuid()                          = 70224
9078  getuid()                          = 70224
9085  getuid()                          = 70224
9088  getuid()                          = 70224
9091  getuid()                          = 70224
9094  getuid()                          = 70224
9097  getuid()                          = 70224
9100  getuid()                          = 70224
9103  getuid()                          = 70224
8876  getuid()                          = 70224
9115  getuid()                          = 70224
8876  getuid()                          = 70224
9139  getuid()                          = 70224
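
To pinpoint where the UID changes without reading the whole list, something like the following could be used. It is only a sketch: the function name is made up, and it assumes the trace was captured with strace -f so that every line starts with the PID, as in the output above.

```shell
# Hypothetical helper: print the PID and UID of the first completed
# getuid() call in an strace log that returned something other than 0 (root).
# Split "<unfinished ...>" / "resumed" lines are skipped on purpose.
first_unprivileged_getuid() {
  awk '/getuid\(\)/ && $NF != 0 { print $1, $NF; exit }' "$1"
}

# Usage: first_unprivileged_getuid /tmp/sudo_chef2.strace
```

Going by the output above, this should point at PID 8876 with UID 70224, the first process that is no longer running as root.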

What is confusing is that when we run the same recipe with Test Kitchen and chef-zero, it works as expected.

The strace also shows that chef-client reports only the last error it receives (ENOENT) rather than the first (EACCES).
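
To see the individual exec attempts behind that, the failed execve() calls for usermod can be pulled out of the trace. A rough sketch, assuming the usual strace format for failed calls (... = -1 ERRNO (message)); the function name is made up for illustration:

```shell
# Hypothetical helper: list, in order, the errno of every failed execve()
# of a given command found in an strace log.
#   $1 = strace output file
#   $2 = command name, e.g. usermod
execve_errors() {
  grep -E "execve\\(\"[^\"]*/$2\"" "$1" | sed -nE 's/.* = -1 (E[A-Z]+).*/\1/p'
}

# Usage: execve_errors /tmp/sudo_chef2.strace usermod
```

If the first line printed is EACCES and the last is ENOENT, that matches the behaviour described here: the error shown in the Chef output is the final failed attempt, not the first one.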

I found the actual command being run by using kitchen converge -l debug zookeeper1.

When I ran the command manually, it succeeded in modifying the user:

$ sudo -E /opt/chef/bin/chef-client --local-mode --config /tmp/kitchen/client.rb --log_level info --force-formatter --no-color --json-attributes /tmp/kitchen/dna.json --chef-zero-port 8889
[2017-08-08T16:54:23+00:00] INFO: Started chef-zero at chefzero://localhost:8889 with repository at /tmp/kitchen, /tmp/kitchen
  One version per cookbook

[2017-08-08T16:54:23+00:00] INFO: Forking chef instance to converge...
Starting Chef Client, version 12.19.33
[2017-08-08T16:54:23+00:00] INFO: *** Chef 12.19.33 ***
[2017-08-08T16:54:23+00:00] INFO: Platform: x86_64-linux
[2017-08-08T16:54:23+00:00] INFO: Chef-client pid: 17132
[2017-08-08T16:54:26+00:00] INFO: Setting the run_list to ["recipe[csg_confluent::_users]"] from CLI options
[2017-08-08T16:54:26+00:00] INFO: Run List is [recipe[csg_confluent::_users]]
[2017-08-08T16:54:26+00:00] INFO: Run List expands to [csg_confluent::_users]
[2017-08-08T16:54:26+00:00] INFO: Starting Chef Run for zookeeper1-parchtest
[2017-08-08T16:54:26+00:00] INFO: Running start handlers
[2017-08-08T16:54:26+00:00] INFO: Start handlers complete.
[2017-08-08T16:54:26+00:00] INFO: HTTP Request Returned 404 Not Found: Object not found: 
resolving cookbooks for run list: ["csg_confluent::_users"]
[2017-08-08T16:54:26+00:00] INFO: Loading cookbooks [csg_confluent@172.32.0, systemd@2.1.3, java_properties@0.1.3, chef_hostname@0.5.0]
Synchronizing Cookbooks:
  - csg_confluent (172.32.0)
  - chef_hostname (0.5.0)
  - systemd (2.1.3)
  - java_properties (0.1.3)
Installing Cookbook Gems:
Compiling Cookbooks...
Converging 2 resources
Recipe: csg_confluent::_users
  * group[zookeeper] action create[2017-08-08T16:54:27+00:00] INFO: Processing group[zookeeper] action create (csg_confluent::_users line 3)
 (up to date)
  * linux_user[zookeeper] action create[2017-08-08T16:54:27+00:00] INFO: Processing linux_user[zookeeper] action create (csg_confluent::_users line 8)
[2017-08-08T16:54:27+00:00] INFO: linux_user[zookeeper] altered

    - alter user zookeeper
[2017-08-08T16:54:27+00:00] INFO: Chef Run complete in 1.31691163 seconds

Running handlers:
[2017-08-08T16:54:27+00:00] INFO: Running report handlers
Running handlers complete
[2017-08-08T16:54:27+00:00] INFO: Report handlers complete
Chef Client finished, 1/2 resources updated in 04 seconds

Here’s the equivalent chef-client run against our Chef Server:

$ sudo -E /opt/chef/bin/chef-client --config ~/.chef/client.rb --log_level info --force-formatter --no-color -o 'recipe[csg_confluent::_users]'
[2017-08-08T17:04:57+00:00] INFO: About to change privilege to zookeeper:zookeeper
[2017-08-08T17:04:57+00:00] INFO: Forking chef instance to converge...
Starting Chef Client, version 12.19.33
[2017-08-08T17:04:57+00:00] INFO: *** Chef 12.19.33 ***
[2017-08-08T17:04:57+00:00] INFO: Platform: x86_64-linux
[2017-08-08T17:04:57+00:00] INFO: Chef-client pid: 19093
[2017-08-08T17:05:00+00:00] WARN: Run List override has been provided.
[2017-08-08T17:05:00+00:00] WARN: Original Run List: [recipe[csg_confluent::zookeeper]]
[2017-08-08T17:05:00+00:00] WARN: Overridden Run List: [recipe[csg_confluent::_users]]
[2017-08-08T17:05:00+00:00] INFO: Run List is [recipe[csg_confluent::_users]]
[2017-08-08T17:05:00+00:00] INFO: Run List expands to [csg_confluent::_users]
[2017-08-08T17:05:00+00:00] INFO: Starting Chef Run for kafkazkpr0001
[2017-08-08T17:05:00+00:00] INFO: Running start handlers
[2017-08-08T17:05:00+00:00] INFO: Start handlers complete.
[2017-08-08T17:05:00+00:00] INFO: HTTP Request Returned 404 Not Found: 
resolving cookbooks for run list: ["csg_confluent::_users"]
[2017-08-08T17:05:04+00:00] INFO: Loading cookbooks [chef_hostname@0.5.0, java_properties@0.1.3, systemd@2.1.3, csg_confluent@173.30.0]
[2017-08-08T17:05:04+00:00] INFO: Skipping removal of obsoleted cookbooks from the cache
Synchronizing Cookbooks:
  - chef_hostname (0.5.0)
  - java_properties (0.1.3)
  - systemd (2.1.3)
  - csg_confluent (173.30.0)
Installing Cookbook Gems:
Compiling Cookbooks...
Converging 2 resources
Recipe: csg_confluent::_users
  * group[zookeeper] action create[2017-08-08T17:05:04+00:00] INFO: Processing group[zookeeper] action create (csg_confluent::_users line 3)
 (up to date)
  * linux_user[zookeeper] action create[2017-08-08T17:05:04+00:00] INFO: Processing linux_user[zookeeper] action create (csg_confluent::_users line 8)

    
    ================================================================================
    Error executing action `create` on resource 'linux_user[zookeeper]'
    ================================================================================
    
    Errno::ENOENT
    -------------
    No such file or directory - usermod
    
    Resource Declaration:
    ---------------------
    # In /zookeeper/home/.chef/cache/cookbooks/csg_confluent/recipes/_users.rb
    
      8: user confluent_user do
      9:   gid confluent_user
     10:   shell '/bin/false'
     11:   system true
     12:   action :create
     13: end
    
    Compiled Resource:
    ------------------
    # Declared in /zookeeper/home/.chef/cache/cookbooks/csg_confluent/recipes/_users.rb:8:in `from_file'
    
    linux_user("zookeeper") do
      action [:create]
      supports {:manage_home=>false, :non_unique=>false}
      retries 0
      retry_delay 2
      default_guard_interpreter :default
      username "zookeeper"
      uid nil
      gid 70224
      home nil
      shell "/bin/false"
      system true
      iterations 27855
      declared_type :user
      cookbook_name "csg_confluent"
      recipe_name "_users"
    end
    
    Platform:
    ---------
    x86_64-linux
    
[2017-08-08T17:05:04+00:00] INFO: Running queued delayed notifications before re-raising exception

Running handlers:
[2017-08-08T17:05:04+00:00] ERROR: Running exception handlers
Running handlers complete
[2017-08-08T17:05:04+00:00] ERROR: Exception handlers complete
Chef Client failed. 0 resources updated in 07 seconds
[2017-08-08T17:05:04+00:00] FATAL: Stacktrace dumped to /zookeeper/home/.chef/cache/chef-stacktrace.out
[2017-08-08T17:05:04+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2017-08-08T17:05:04+00:00] ERROR: linux_user[zookeeper] (csg_confluent::_users line 8) had an error: Errno::ENOENT: No such file or directory - usermod
[2017-08-08T17:05:05+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)

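I found the problem. The local client.rb file had user and group parameters set, so chef-client dropped privileges to zookeeper:zookeeper before converging (this is the "About to change privilege to zookeeper:zookeeper" line in the failing run above):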

$ cat .chef/client.rb
...
log_level :warn
user "zookeeper"
group "zookeeper"
cache_path "/zookeeper/home/.chef"
client_key "/zookeeper/home/.chef/client.pem"
ohai.disabled_plugins = [:Passwd]
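
When a sudo chef-client run behaves as if it were unprivileged, the config file can be checked for these two settings. A minimal sketch; the function name is hypothetical, and the path in the usage line is the one from the command above:

```shell
# Hypothetical helper: show any user/group settings in a Chef config file,
# prefixed with their line numbers. Prints nothing if neither is set.
find_privilege_drop_settings() {
  grep -nE '^[[:space:]]*(user|group)[[:space:]]' "$1"
}

# Usage: find_privilege_drop_settings ~/.chef/client.rb
```

In this case both lines would match, which lines up with the "About to change privilege" message in the failing run.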