Can I get some help with an issue I am having with the knife ssh command? It throws a socket error when I use knife to run a command on a Chef node. As you can see below, the Chef node name is the same as the full hostname, and the full hostname is also the FQDN, which is resolvable by DNS. The odd thing I noticed is that the error message uses the correct hostname but the wrong domain, and I don't know why.
To protect the innocent, I have changed or omitted some values.
my_userp@host_name:~$ SEARCH_QUERY="name:host_name.host_domain.com"
my_userp@host_name:~$ SSH_CMD="ls /home/my_user/"
my_userp@host_name:~$ hostname -f
host_name.host_domain.com
my_userp@host_name:~$ nslookup $(hostname -f)
Server: 10.0.0.100
Address: 10.0.0.100#53
Non-authoritative answer:
Name: host_name.host_domain.com
Address: 10.0.0.100
my_userp@host_name:~$ knife node show $(hostname -f)
Node Name: host_name.host_domain.com
Environment: my_org
FQDN: host_name.host_domain.com
IP: 10.0.0.100
Run List: role[my-role-specific]
Roles: my-role-specific, my-role-general
Recipes: my-code::start, my-code::finish
Platform: ubuntu 14.04
Tags: my-tag
my_userp@host_name:~$ knife ssh "name:$(hostname -f)" "date"
WARNING: Failed to connect to host_name.novalocal -- SocketError: getaddrinfo: Name or service not known
By default, knife will attempt to use the FQDN of the node as reported by Ohai, but you can pass -a ipaddress and it will use node['ipaddress'] instead.
@cheeseplus - it does not seem to behave the way you explain, i.e. using the FQDN of the node itself. The FQDN of the node is host_name.host_domain.com. Instead, it is using host_name.novalocal. What configuration file, etc. is responsible for this different behavior?
No, novalocal is not my workstation's domain. If you look at the output in the original post, you'll see that the workstation domain is "host_domain.com", that the full hostname resolves in DNS, and that the Chef node is named exactly like the FQDN. I don't know where Chef is getting the "novalocal" domain.
The FQDN Ohai gets or has cached doesn't necessarily match hostname -f (though most of the time it should). It could also be that there is some ssh config magic on your system that is changing that. As I said earlier, the easiest way around this is to use -a ipaddress, and then knife uses the IP address instead of the FQDN (again, this is the FQDN as Ohai and the Chef server have it recorded).
@Larryc - Thanks for the link! Basically, it says "this can be resolved by setting dhcp_domain to an empty string in nova.conf on the Control node". However, I am a tenant on our private OpenStack deployment and do not have direct access to the OpenStack configs. I could ask for the host to change that but they may not want to do it in case this affects other tenants.
@cheeseplus - The FQDN that Ohai gives me does match hostname -f, at least using these commands below. I don't know what to say about ssh magic on my system. And yes, I already took your advice to use -a ipaddress for the short term, but I don't want to give up so easily on this FQDN error.
We can only help so much when there is clearly some OpenStack magic in play - using the attribute is the workaround, but this sounds like something that needs to be taken up at a higher level in the stack, or something on your local system. That is to say, this is clearly an environment-specific issue and less one with Chef - folks on this end simply don't and won't have access to your environment to see everything that's going on there.
For investigating any local ssh magic, the likely source is ~/.ssh/config.
@cheeseplus - I looked at the knife ssh code, in the get_ssh_attribute method, and found the order of precedence for determining the ssh target. In my case, it fails the first case (if) since item["target"] is empty, which causes the command to fall through to the 2nd case (elsif). In that elsif block, it uses the attribute item["cloud"]["public_hostname"] as the ssh target. When I view this attribute, I see the novalocal domain.
As far as I can tell, that leaves three options:
1. Configure item["target"] to have the desired FQDN.
2. Configure item["cloud"]["public_hostname"] to have the desired FQDN. I don't know how this attribute is set at the moment.
3. Remove the value in item["cloud"]["public_hostname"] so that it fails the 2nd case (elsif), which will cause the command to fall through to the 3rd case (else), where it uses the attribute item["fqdn"]. My instance has the desired value in this attribute.
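To make the precedence concrete, here is a minimal Ruby sketch of the three-way check as I described it above. It is a simplified model based only on my reading in this thread, not the actual knife source: the method name pick_ssh_target and the sample node hash are hypothetical, and treating nil and the empty string alike as "unset" is my assumption.

```ruby
# Simplified model of the if / elsif / else precedence described above.
# NOT the real knife implementation; pick_ssh_target is a hypothetical name.
def pick_ssh_target(item)
  # Assumption: a nil or empty value counts as "not set".
  target = item["target"].to_s
  cloud_hostname = (item["cloud"] || {})["public_hostname"].to_s

  if !target.empty?
    target                # 1st case: explicit target
  elsif !cloud_hostname.empty?
    cloud_hostname        # 2nd case: cloud public hostname
  else
    item["fqdn"]          # 3rd case: fall back to the node's FQDN
  end
end

# Sample data mirroring the (anonymized) values from this thread.
node = {
  "fqdn"  => "host_name.host_domain.com",
  "cloud" => { "public_hostname" => "host_name.novalocal" },
}

# Current situation: the cloud hostname wins over the FQDN.
puts pick_ssh_target(node)

# Clearing the cloud hostname lets the FQDN through.
puts pick_ssh_target(node.merge("cloud" => { "public_hostname" => nil }))

# Setting a target overrides everything else.
puts pick_ssh_target(node.merge("target" => "host_name.host_domain.com"))
```

In this model, the first call returns host_name.novalocal and the other two return host_name.host_domain.com, which matches the behavior I am seeing and the outcome I would expect from clearing the cloud hostname or setting a target.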