Bootstrap Windows Exception: undefined method `empty?' for nil:NilClass


#1

Hi,
I am trying to create an EC2 Windows instance and bootstrap it in one step. The instance is created successfully. The code waits for the instance to become available and displays the message “Waiting for winrm access to become available”, printing one dot every 10 seconds. After about one minute, the code continues and then stops with an exception:

ERROR: knife encountered an unexpected error
This may be a bug in the 'ec2 server create' knife command or plugin
Please collect the output of this command with the `-VV` option before filing a
bug report.
Exception: NoMethodError: undefined method `empty?' for nil:NilClass

I looked into the source and I think I know what the problem may be.
ec2_server_create.rb, line 516:

    print "\n#{ui.color("Waiting for winrm access to become available", :magenta)}"
    print(".") until tcp_test_winrm(ssh_connect_host, locate_config_value(:winrm_port)) {
      sleep 10
      puts("done")
    }

The code waits (I suppose) until the server accepts a connection on port 5985/5986. Once the connection is established, the code continues.
At that point, the WinRM service is not yet ready. I ran the following test:
I started creating an instance. While the knife command was executing and displaying “Waiting for winrm access to become available”, I opened a new PowerShell console and tried to enter a remote session (using WinRM). Enter-PSSession failed with the error:

Enter-PSSession : Connecting to remote server 10.8.1.16 failed with the following error message : WinRM cannot
complete the operation.

I repeated the command every few seconds. After a few minutes, I got a new error:

Enter-PSSession : Connecting to remote server 10.8.1.16 failed with the following error message : Access is denied.

The credentials were correct, though. I continued testing, and after a few more minutes the remote WinRM service became fully functional and I was able to enter a remote session.
I then ran knife bootstrap windows winrm and it worked like a charm.
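
The gap between "the port accepts connections" and "WinRM is usable" can be illustrated with a plain TCP probe. This is only a minimal sketch, not the knife-ec2 implementation: port_open? is a hypothetical helper, and all it proves is that something is listening on the port.

```ruby
require "socket"

# Hypothetical helper: returns true when a TCP connection to host:port can
# be opened within the timeout. A true result only means that something is
# listening; it says nothing about whether WinRM can authenticate a user yet.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  # Connection refused, host unreachable, timed out, or name lookup failed.
  false
end
```

Calling port_open?("10.8.1.16", 5985) during the window described above would presumably return true even while Enter-PSSession still fails, which is why a port check alone is not a sufficient readiness test.
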
Problems

  1. The code does not check whether the remote server value is nil and invokes a method on nil.
  2. The logic that checks the availability of the remote WinRM service is wrong.

I cannot wait for a fix, so I am planning to fix it myself. Any suggestions on which direction to go? I do not have the slightest idea how the code is organized. Is there a design document, a description of the algorithm, or anything else that would help me understand the code?
I am using the latest version of ChefDK, 0.10.0.

BR
Zdenko


#2

Chef is pretty much a big collection of Ruby gems. The file that you are referring to is part of knife-ec2: https://github.com/chef/knife-ec2. If you have a hunch about how this could be fixed, you can pull down the code, make your changes, build yourself a local copy of the gem, and install it easily enough.

As for docs, the rubydoc.info YARD server has the docs for Chef, but you might be on your own with the knife plugin itself (hopefully the source is self-documenting enough).

Make sure you put a pull request in for your changes!


#3

I do not use EC2 frequently, but I have managed to use knife-ec2 to bootstrap Windows nodes successfully. I will share here what has worked for me. However, what zdenko describes is clearly a bad user experience. I would strongly suggest creating an issue on the knife-ec2 repo and, if possible, providing the full debug output.

Ultimately, the knife-ec2 gem should wait not only for connectivity but also for successful authentication.
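
A minimal sketch of that idea, assuming the caller can supply two probes: one for connectivity and one for authentication. The names wait_for_winrm, port_probe, auth_probe and sleeper are hypothetical and do not come from knife-ec2; the probes are plain callables so that the loop itself stays independent of any WinRM library.

```ruby
# Polls until both probes succeed or the timeout expires.
# port_probe and auth_probe are hypothetical callables returning true/false.
# sleeper is injectable so the loop can be exercised without real delays.
def wait_for_winrm(port_probe:, auth_probe:, timeout: 600, interval: 10,
                   sleeper: ->(seconds) { sleep(seconds) })
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    # Connectivity alone is not enough: authentication must succeed too.
    return true if port_probe.call && auth_probe.call
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleeper.call(interval)
  end
end
```

With something like this, the bootstrap step would start only after a login attempt has actually succeeded, and the command could report a clear timeout instead of failing later with an unrelated error.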

Here is my guess at what is happening in your case, without knowing the command parameters or the contents of your user-data: I’d bet your user-data sets up WinRM BEFORE configuring the user, creating a scenario where WinRM is accessible but the user is not yet able to authenticate. You may have better luck configuring the user first.

Here is an example user-data.ps1 file I use (slightly modified from one Adam Edwards shared with me):

<powershell>
 
$user="testuser"
$password="yomama"

"[System Access]" | out-file c:\delete.cfg
"PasswordComplexity = 0" | out-file c:\delete.cfg -append
"[Version]"  | out-file c:\delete.cfg -append
'signature="$CHICAGO$"'  | out-file c:\delete.cfg -append

net user /add $user $password;
net localgroup Administrators /add $user; 
 
winrm quickconfig -q
winrm set winrm/config '@{MaxTimeoutms="1800000"}'
winrm set winrm/config/service '@{AllowUnencrypted="true"}'

netsh advfirewall firewall set rule name="Windows Remote Management (HTTP-In)" profile=public protocol=tcp localport=5985 remoteip=localsubnet new remoteip=any
 
</powershell>

Here the user is created first and given admin rights, and only then is WinRM set up.


#4

My user data is slightly different but does the same thing:
-Enables WinRM
-Restarts the winrm service
-Disables the proxy
-Sets the admin’s password to a known one, which is then used during the bootstrap process.

I have never managed to bootstrap an instance in one step. Even knife bootstrap windows winrm ... fails if executed immediately after knife ec2 server create ... has failed. I repeatedly invoke knife bootstrap windows winrm ..., which fails with a different error message each time (depending on the state of the winrm service). After a few minutes it succeeds.
It is easy to conclude that the preconditions for bootstrapping the instance were not fulfilled. It is reproducible: it fails every time.
As a quick workaround I will create a custom AMI with the Chef client installed. I will specify user data that triggers the first Chef run (which in turn registers the node with the Chef server).


#5

How about changing your user-data to execute in this order:
-Disable the proxy
-Set the admin’s password to a known one, which is then used during the bootstrap process.
-Enable WinRM

This way your user has correct credentials as soon as WinRM becomes accessible.
Also, restarting winrm after enabling it is typically not necessary.


#6

The actual order of invocation is:
-Change password
-Configure WinRM
-Configure firewall
and then
net stop winrm
sc.exe config winrm start=auto
net start winrm

I checked a fresh EC2 instance. The WinRM service is set to autostart (delayed), but it is not running. That means “net stop winrm” does nothing, and changing the start type to auto is also unnecessary. Only the last command is needed: net start winrm. I am testing with t2.micro instances (1 CPU, 1 GB RAM). They need more time to boot, but that should not be a reason to fail. The delay in question is not a matter of seconds but of a few minutes.

BTW, I tried the workaround of creating a custom image with the prerequisites configured. It seems EC2 instances do not support UserData for custom images (even a basic script, dir c:\ > c:\dir.txt, did not produce output).


#7

What does your command line look like, e.g. knife ec2 server create ...? Are you passing a run list? We recently fixed a bug that exhibited this behavior if you weren’t. Those fixes haven’t been released yet, but you can work around it by passing a run list.


Have you tried running with -V -V to get debugging output yet? If so, could you share it?

There was another bug where, in some circumstances, the stack trace wasn’t being included in the debugging output when using knife-ec2, but I don’t believe it affected knife bootstrap.
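
To show why a missing run list can surface as this particular exception, here is an illustration only. It is not the knife-ec2 source, and first_boot_run_list and the config hash are hypothetical stand-ins. Calling empty? on a value that was never set raises NoMethodError, while wrapping the value in Array() turns nil into an empty list.

```ruby
# Hypothetical stand-in for an option lookup that yields nil when the
# option was not given on the command line.
def first_boot_run_list(config)
  # Array(nil) is [], Array("x") is ["x"], and an array is returned as-is.
  Array(config[:run_list])
end

config = {} # no :run_list key, as when no run list option is passed

begin
  config[:run_list].empty? # nil has no empty? method
rescue NoMethodError => e
  puts "unguarded call raised #{e.class}"
end

puts "guarded run list empty? #{first_boot_run_list(config).empty?}"
```

Passing a run list on the command line sidesteps the unguarded path in the same way: the value is no longer nil, so empty? has something to be called on.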


#8

I added the --run-list option and it worked.
Thanks a lot :) I am now able to bootstrap an instance in one step.