Knife winrm issue when running against groups of servers

I’m a Windows admin who’s new to Chef, so any help would be appreciated. I’m able to run knife ssh against groups of Linux servers using "os:linux" with no issue. However, the following command fails with the error "Failed to authenticate to [os:windows] as domain\username Response: Bad HTTP response returned from server (401)":

knife winrm "os:windows" 'net stats srv' -x "domain\username" -P 'password'

Note: with verbose output I can see that it finds all the Windows servers, but it still ends with the error above.

I can run this same command against a single machine without issue:

knife winrm "name:server.fqdn" 'net stats srv' -x "domain\username" -P 'password'

Running knife search "os:windows" returns all my Windows servers.

In the verbose output (with -VV), is there a particular node that triggers this error? From the output, can you see whether the FQDN used to authenticate is correct and whether those credentials are valid on that node? Forgive me if this question seems overly obvious, but I have definitely seen environments where a rogue node has “special” credentials.

Update: The FQDNs returned in the -VV output are correct. However, I now find that the command below, with single quotes around the os:windows query, either works partially or fails immediately. When it works, it starts returning results for the different servers but stops before it should with "Failed to authenticate to [os:windows] as domain\username Response: Bad HTTP response returned from server (401)". It almost seems like a timeout, but it happens too quickly for that (within a few seconds). The -VV output doesn’t show any trigger: the run either fails immediately with that error, or returns correct results for a while and then stops with the same error.

knife winrm 'os:windows' 'net stats srv' -x "domain\username" -P 'password'

I know when I played with knife winrm a few years back we occasionally had to restart the service on the Windows nodes. Sometimes it would only run part of a command and die and sometimes it would just fail to auth, but restarting the service seemed to clear things up. If you’re running into odd issues with WinRM, give that a shot, at least just to rule it out as a potential culprit.

Nathan Clemons
DevOps Engineer
Moxie Cloud Services (MCS)

Are your Windows servers joined to a domain?

Whenever I bootstrap, I run the following command, which works well:

knife bootstrap windows winrm $ip -x $userName -P "$password" -N $nodeName -r "$runList" --secret-file .\encrypted_data_bag_secret -E "$environment" --bootstrap-version 12.xxxx

Or, if you’d like to deploy remotely, I use:

knife winrm $hostname -m -x "$username" -P xxxxx

When your servers (instances/nodes) are joined to a domain, I pass the user as "full domain name\user name".

Here is a useful doc for testing out knife winrm: https://github.com/chef/knife-windows

In my experience, the problem is usually not on the Chef side; more likely you need to make sure you can connect to the server using the right Windows credentials.

A simple first check is to run telnet <server ip> 5985 (the default WinRM HTTP port) to see whether a connection can be established.
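
If telnet isn’t installed, or you want to check several hosts at once, the same test can be scripted. This is only a rough sketch: it assumes bash (for its /dev/tcp redirection) and GNU timeout are available on the workstation, and the function names check_winrm_port and report_winrm_ports are made up for illustration. It only tells you whether the TCP port answers, not whether the credentials are accepted.

```bash
# check_winrm_port HOST [PORT]
# Succeeds if a TCP connection to HOST:PORT opens within 5 seconds.
# PORT defaults to 5985, the default WinRM HTTP port.
check_winrm_port() {
  local host="$1" port="${2:-5985}"
  # bash's /dev/tcp pseudo-device opens a TCP socket; timeout bounds the wait.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# report_winrm_ports HOST...
# Prints one status line per host for the default port.
report_winrm_ports() {
  local host
  for host in "$@"; do
    if check_winrm_port "$host"; then
      echo "$host: port 5985 reachable"
    else
      echo "$host: port 5985 NOT reachable"
    fi
  done
}
```

For example, report_winrm_ports server1.fqdn server2.fqdn prints one line per server.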

I hope that it helps.

I think it’s worth filing an issue against the knife-windows gem asking that the error state the host name of the node it failed to authenticate against, instead of just the query.

That would help you tell if it’s consistently the same individual node.
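
In the meantime, one way to narrow it down is to run the command one node at a time, since the single-node form with name: is reported to work. A rough sketch, with some assumptions: knife search node QUERY -i prints one node name per line (the "items found" line and blank lines are filtered out), knife winrm exits non-zero when authentication fails, and the function name find_failing_nodes is made up for illustration.

```bash
# find_failing_nodes QUERY COMMAND USER PASSWORD
# Runs COMMAND on each node matching QUERY, one node per knife winrm call,
# and prints OK or FAIL next to each node name.
find_failing_nodes() {
  local query="$1" cmd="$2" user="$3" pass="$4" node
  knife search node "$query" -i 2>/dev/null |
    grep -v 'items found' |
    grep -v '^[[:space:]]*$' |
    while read -r node; do
      # </dev/null keeps knife from consuming the list of node names on stdin.
      if knife winrm "name:$node" "$cmd" -x "$user" -P "$pass" </dev/null >/dev/null 2>&1; then
        echo "OK   $node"
      else
        echo "FAIL $node"
      fi
    done
}
```

For example: find_failing_nodes 'os:windows' 'net stats srv' 'domain\username' 'password'. If the same node shows FAIL on every run, that points at the node; if the failures move around, it looks more like a WinRM service or session limit issue.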