ChefDK, VMware and macOS Crashing


#1

Hello all,

I hope you can help. I'm new to Chef and have been learning on a virtualized macOS 10.13 (High Sierra) machine running on VMware vSphere 6.7.0. I've pulled the latest version of ChefDK 14.4.56 using Homebrew.

When I run 'sudo chef-client' locally (-z) or via bootstrap from my workstation using 'knife bootstrap --sudo', the first attempt works as intended. However, every subsequent run causes the VM to reboot itself. The system logs indicate a kernel panic, and the issue is repeatable.

I've started with a fresh install of macOS 10.13 on a new VM, and have restarted the host. I'm at my wits' end with regard to troubleshooting this issue and haven't found any threads via Google from others having it.

Has the community seen this issue before? Are there any insights or assistance you can provide?

Many thanks!
R. Alcazar


#2

Since the initial run of chef-client is working as expected (or at least does not crash), I would look at the cookbooks that are run and examine any code in them that is not a Chef resource. Custom code (ruby blocks, shell scripts, etc.) is a common source of non-idempotent actions. This is where I would look first.
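
To illustrate what "non-idempotent" means for custom code, here is a minimal sketch in plain shell (not Chef DSL). The helper name ensure_line is hypothetical. It appends a line to a file only when that exact line is missing, so a second run changes nothing, whereas a bare `echo ... >> file` would add a duplicate on every run.

```shell
#!/bin/sh
# Hypothetical helper (not part of Chef): append LINE to FILE only if that
# exact line is not already there, so repeated runs leave FILE unchanged.
ensure_line() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Calling `ensure_line "export FOO=1" ~/.profile` twice leaves a single copy of the line in the file. Inside a recipe the same effect is normally achieved with a guard on the resource rather than a shell helper.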


#3

The quick feedback is appreciated.

Actually, when I run 'sudo chef-client' I do not pass any recipe/cookbook. I'm only using 'sudo chef-client' and 'knife bootstrap' to exclude any issues related to recipes.


#4

The cookbooks it will run are usually defined in the run list that was set when the node was bootstrapped; it can also be defined in the environment attributes. Try a "chef-client -W" and see what behavior you get. Can you post the output from "chef-client" when you experience a crash?


#5

The results of 'sudo chef-client -W' are below. The first run was successful, the 2nd run immediately after caused the VM to reboot.

Ricardos-Mac:~ ralcazar$ sudo chef-client -W
Password:
Starting Chef Client, version 14.4.56
resolving cookbooks for run list: []
Synchronizing Cookbooks:
Installing Cookbook Gems:
Compiling Cookbooks...
[2018-09-19T05:17:39-07:00] WARN: Node chef-node-gma-01 has an empty run list.
Converging 0 resources
[2018-09-19T05:17:39-07:00] WARN: In why-run mode, so NOT performing node save.
Running handlers:
Running handlers complete
Chef Client finished, 0/0 resources would have been updated
Ricardos-Mac:~ ralcazar$ sudo chef-client -W
Starting Chef Client, version 14.4.56


#6

I would suggest that you open up a console on the VM through vCenter and see if you can capture anything there while you run the second chef-client run.
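
After the VM comes back up, it may also be worth checking whether the guest wrote a panic report. A minimal sketch, assuming that macOS stores kernel panic reports as *.panic files under /Library/Logs/DiagnosticReports (the function name list_panic_reports is just for illustration, and the directory can be overridden):

```shell
#!/bin/sh
# Hypothetical helper: list kernel panic reports in a directory.
# Assumption: macOS writes them as *.panic files under
# /Library/Logs/DiagnosticReports; pass another directory to override.
list_panic_reports() {
    dir=${1:-/Library/Logs/DiagnosticReports}
    # Print matching files in sorted order; print nothing if the directory is missing.
    find "$dir" -maxdepth 1 -type f -name '*.panic' 2>/dev/null | sort
}
```

Running `list_panic_reports` with no arguments checks the default location. The newest report can then be opened in a pager and compared against the time of the second chef-client run.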


#7

Is the vSphere host Mac hardware? Otherwise this can be a bit of a rabbit hole, effectively trying to troubleshoot a Hackintosh installation. My feeling here is that it's super unlikely that it's the Chef DSL code doing this, and not likely that it's chef-client (on its own) causing the kernel panic.


#8

Hi Cheese,

Yes, the vSphere host is running on a Mac Pro. I've tried reviewing the system logs to find additional clues, but haven't found anything relevant.


#9

Update: I've freshly installed vSphere 6.7.0 on another Mac Pro and created a VM using macOS 10.13.6. The result is the same: the VM reboots after calling 'sudo chef-client' 2-3 times. Any ideas? I would really like to see this succeed. Thanks!


#10

I think cheeseplus might have a good point. The next test I would suggest is to install vSphere 6.7.0 on a non-Mac platform and repeat the chef-client runs on the same macOS 10.13.6, so that only a single variable changes.

Also, you stated that with the fresh install it took 2 or 3 runs. Since it previously always failed on the second run, you might be looking at a timing problem. Once you bring up the VM, try leaving it idle with no chef-client runs and see how it behaves for 15+ minutes.
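
A rough way to separate "number of runs" from "elapsed time" is to script the runs and log a timestamp before each one. The sketch below is only an illustration: repeat_run and the log path in the usage note are hypothetical names. Each start entry is written and flushed before the command runs, so the last one should still be on disk if the VM reboots mid-run.

```shell
#!/bin/sh
# Hypothetical helper: repeat_run COUNT DELAY_SECONDS LOGFILE COMMAND [ARGS...]
# Runs COMMAND COUNT times, DELAY_SECONDS apart, logging start and exit status.
repeat_run() {
    count=$1
    delay=$2
    log=$3
    shift 3
    i=1
    while [ "$i" -le "$count" ]; do
        # Log the start first and flush it, so the entry survives a reboot.
        printf '%s run %s start\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$i" >> "$log"
        sync
        status=0
        "$@" || status=$?
        printf '%s run %s exit=%s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$i" "$status" >> "$log"
        # Wait between runs, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
}
```

For example, `repeat_run 5 60 /var/tmp/chef-runs.log sudo chef-client -W` attempts five why-run passes a minute apart. A start line without a matching exit line marks the run that took the VM down, and its timestamp shows how long the VM had been up.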


#11

My recollection, as I am running vSphere on a Mac Pro for the bento project, is that 6.5+ has issues virtualizing and that macOS has more problems under that setup. Although this is more of a hunch than a definite answer, it really feels like the hypervisor itself, or some instruction the guest is throwing, is what's causing the kernel panic.


#12

Thanks all, I appreciate the feedback and the help troubleshooting this issue.

I've performed additional testing using 'knife bootstrap --sudo' and 'sudo chef-client'. Both commands work reliably (without crashes) if I set the VM to 1 CPU only. If the number of CPUs assigned is increased to 2, the crashes resume.

In my previous experience, VMware virtual machines work reliably with multiple CPUs and macOS 10.13.6. We use these VMs for resource-intensive compiles using Xcode with as many as 8 cores and 12GB of memory.

Are there any known issues running Chef on multiple cores?
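
When repeating this test, it can help to confirm from inside the guest how many CPUs it actually sees before each run. A minimal sketch (cpu_count is a hypothetical helper name): on macOS the logical CPU count is available through the hw.ncpu sysctl key, and the fallbacks are there only so the function also works on other systems.

```shell
#!/bin/sh
# Hypothetical helper: print the number of logical CPUs the OS sees.
cpu_count() {
    # macOS exposes the count as the hw.ncpu sysctl key; fall back to
    # getconf or nproc on systems without that key (for example Linux).
    sysctl -n hw.ncpu 2>/dev/null \
        || getconf _NPROCESSORS_ONLN 2>/dev/null \
        || nproc
}
```

Recording the output of `cpu_count` next to each chef-client attempt makes it easy to see which runs happened with 1 CPU and which with 2.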

Kind regards,
R. Alcazar


#13

Chef doesn't really care about the number of cores, and it runs on machines with a variety of core counts and architectures. Again, I hate to say it, but this is the hypervisor, and you're probably best off digging through the VMware release errata or their forums to find out why it's causing kernel panics.


#14

Thanks again for the support. I'll dig deeper into VMware and the hypervisor.