Chef Server Upgrade Failure from 11.1.7 to 12.X

Hello folks. I’m testing a Chef server upgrade from 11.1.7 to 12.13 and to 12.14, and each upgrade fails with the same error:

/opt/opscode/embedded/service/omnibus-ctl/open_source_chef12_upgrade.rb:322:in `check_status': undefined method `success?' for nil:NilClass (NoMethodError)
from /opt/opscode/embedded/service/omnibus-ctl/open_source_chef12_upgrade.rb:355:in `run_knife_download'
from /opt/opscode/embedded/service/omnibus-ctl/open_source_chef12_upgrade.rb:82:in `download_chef11_data_download'
from /opt/opscode/embedded/service/omnibus-ctl/open_source_chef12_upgrade.rb:60:in `download_chef11_data'
from /opt/opscode/embedded/service/omnibus-ctl/open_source_chef12_upgrade.rb:41:in `run_upgrade'
from /opt/opscode/embedded/service/omnibus-ctl/upgrade.rb:135:in `block in load_file'
from /opt/opscode/embedded/lib/ruby/gems/2.2.0/gems/omnibus-ctl-0.5.0/lib/omnibus-ctl.rb:199:in `call'
from /opt/opscode/embedded/lib/ruby/gems/2.2.0/gems/omnibus-ctl-0.5.0/lib/omnibus-ctl.rb:199:in `block in add_command_under_category'
from /opt/opscode/embedded/lib/ruby/gems/2.2.0/gems/omnibus-ctl-0.5.0/lib/omnibus-ctl.rb:736:in `run'
from /opt/opscode/embedded/service/omnibus-ctl/chef-server-ctl:237:in `<main>'

I’ve followed the instructions from here:

https://docs.chef.io/upgrade_server.html#from-open-source-chef

Each test was performed on Ubuntu 14.04 VMs. The upgrade code appears to be calling success? on a nil object. The call happens inside run_knife_download, which hands a shell-out to knife over to run_command. The return value of run_command is used as the status, and that status appears to be nil. That nil is what seems to be causing the error.
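To make the failure mode concrete, here is a minimal Ruby sketch of how a nil status produces this NoMethodError. The check_status body below is a hypothetical stand-in, since the real method in open_source_chef12_upgrade.rb is not shown in this thread; only the method name and the call to success? come from the stack trace.

```ruby
# Hypothetical stand-in for check_status in the upgrade script.
# It assumes the caller always passes an object that responds to success?.
def check_status(status, message)
  raise message unless status.success?
end

# Simulates the situation in the trace: the shell-out returned nil
# instead of a status object. Returns the rescued exception so it can
# be inspected.
def reproduce_nil_status
  check_status(nil, "knife download failed")
  nil
rescue NoMethodError => e
  e
end

error = reproduce_nil_status
puts "#{error.class}: #{error.message}"
```

The point is that the error is raised by the status check itself, not by knife, so the message says nothing about why the command produced no status.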

I’m not sure how run_command is being brought in, since I don’t see it being mixed in anywhere. I vaguely remember that it’s the method that executes a Mixlib::ShellOut object, but I don’t see one being instantiated. Coincidentally, I think this might be the first time run_command is executed.

Anyway, I’d really like to figure this out. I’m slated to perform an upgrade to Chef server 12 soon in production.

Update:

So, I’ve executed the following command manually and re-executed chef-server-ctl upgrade. Here is the output:

Ensuring Chef 12 server is stopped

/opt/opscode/embedded/service/omnibus-ctl/open_source_chef12_upgrade.rb:322:in `check_status': undefined method `success?' for nil:NilClass (NoMethodError)
from /opt/opscode/embedded/service/omnibus-ctl/open_source_chef12_upgrade.rb:404:in `stop_chef12'
from /opt/opscode/embedded/service/omnibus-ctl/open_source_chef12_upgrade.rb:65:in `download_chef11_data_setup'
from /opt/opscode/embedded/service/omnibus-ctl/open_source_chef12_upgrade.rb:59:in `download_chef11_data'
from /opt/opscode/embedded/service/omnibus-ctl/open_source_chef12_upgrade.rb:41:in `run_upgrade'
from /opt/opscode/embedded/service/omnibus-ctl/upgrade.rb:135:in `block in load_file'
from /opt/opscode/embedded/lib/ruby/gems/2.2.0/gems/omnibus-ctl-0.5.0/lib/omnibus-ctl.rb:199:in `call'
from /opt/opscode/embedded/lib/ruby/gems/2.2.0/gems/omnibus-ctl-0.5.0/lib/omnibus-ctl.rb:199:in `block in add_command_under_category'
from /opt/opscode/embedded/lib/ruby/gems/2.2.0/gems/omnibus-ctl-0.5.0/lib/omnibus-ctl.rb:736:in `run'
from /opt/opscode/embedded/service/omnibus-ctl/chef-server-ctl:237:in `<main>'

It appears to have gotten past the knife download, but it is now failing on a shell-out to chef-server-ctl stop, which is executed by the same problematic run_command method. I have a feeling a fix would be to require mixlib/shellout and then add a helper method that creates a ShellOut object and calls run_command on that instead.
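As a rough sketch of that helper idea: the version below uses Ruby's standard-library Open3 rather than Mixlib::ShellOut, purely so it runs without extra gems. The names CommandResult and run_command_checked are made up for illustration and are not part of chef-server-ctl or omnibus-ctl.

```ruby
require "open3"
require "rbconfig"

# Hypothetical result wrapper: callers can always call success? on it,
# even when no exit status could be obtained.
CommandResult = Struct.new(:stdout, :stderr, :status) do
  def success?
    !status.nil? && status.success?
  end
end

# Hypothetical helper: runs a command and always returns a CommandResult,
# never nil. If the process cannot be started at all (for example
# Errno::ENOENT for a missing binary, or Errno::ENOMEM when the system
# cannot fork), the error text is kept in stderr and status stays nil.
def run_command_checked(*cmd)
  stdout, stderr, status = Open3.capture3(*cmd)
  CommandResult.new(stdout, stderr, status)
rescue SystemCallError => e
  CommandResult.new("", e.message, nil)
end

# Usage: run the current Ruby interpreter with a trivial script.
ok = run_command_checked(RbConfig.ruby, "-e", "exit 0")
puts "success: #{ok.success?}"
```

With a wrapper like this, a caller that checks result.success? gets false with a reason in result.stderr instead of a NoMethodError on nil.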

After increasing the RAM allocation, the upgrade script succeeded. It appears that error handling might not be working as expected. However, on the whole, my issue appears to be resolved.

That actually fits the pattern that’s starting to emerge - in low memory situations, there are times when the shell out to the OS may return a nil instead of a status object.
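One way the status check could be made to report this case clearly is to test for nil before calling success?. This is only a sketch: check_status_guarded and FakeStatus are hypothetical names for illustration, not the code that ships with Chef server.

```ruby
# Hypothetical stand-in for a process status object, used only so the
# sketch can run without shelling out.
FakeStatus = Struct.new(:exitstatus) do
  def success?
    exitstatus.zero?
  end
end

# Hypothetical guarded status check: a nil status gets its own error
# message instead of falling through to a NoMethodError.
def check_status_guarded(status, message)
  if status.nil?
    raise "#{message} (no exit status returned; the command may not have run, check available memory)"
  end
  raise "#{message} (exit code #{status.exitstatus})" unless status.success?
  true
end

begin
  check_status_guarded(nil, "chef-server-ctl stop failed")
rescue RuntimeError => e
  puts e.message
end
```

The hint about memory in the message is based only on the pattern described in this thread; a nil status could have other causes.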

We’ve seen similar situations in installations, though the specific shell-out location is different.

I’m glad to hear that increasing RAM allocation resolved it - I was typing up that as a next suggestion when the message arrived.