Knife never times out waiting on an SSH connection


#1

Hi everyone,

I'd like to hear whether anyone has input on how to go about fixing this
issue:

https://tickets.opscode.com/browse/KNIFE-39

It seems to have been open for some time, and from my Googling I'm not sure
whether there has been any discussion of a fix. Does anyone have ideas that
we can start airing out on the mailing list?

Thanks in advance!

Lance


#2

We use net-ssh and net-ssh-multi for making SSH connections. I haven’t
examined the code too deeply, but maybe we’re just not passing a
:timeout to the session constructor?

https://net-ssh.github.io/net-ssh/classes/Net/SSH.html

- Julian
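
[Editor's sketch] A minimal illustration of the suggestion above, assuming a plain Net::SSH.start call rather than knife's actual net-ssh-multi code path, which may be wired differently. The helper names and the 10-second default are hypothetical; :timeout is the net-ssh option that bounds how long the initial connection may take, and the exception raised when it expires depends on the net-ssh version.

```ruby
# Sketch only: knife's real SSH setup goes through net-ssh-multi and may differ.
DEFAULT_CONNECT_TIMEOUT = 10 # seconds; an arbitrary value chosen for illustration

# Hypothetical helper: adds a connect timeout unless the caller already set one.
def ssh_options_with_timeout(options = {})
  { timeout: DEFAULT_CONNECT_TIMEOUT }.merge(options)
end

# Hypothetical helper: runs one command on one host and returns its output.
def run_remote(host, user, command, options = {})
  require "net/ssh" # loaded lazily so the helper above works without the gem
  output = nil
  Net::SSH.start(host, user, ssh_options_with_timeout(options)) do |ssh|
    output = ssh.exec!(command)
  end
  output
end
```

Note that :timeout only covers establishing the connection; it does not limit how long a remote command may run once the session is up.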

(In reply to lbragstad@gmail.com's message of Thu, Jun 26, 2014 at 2:02 PM.)


[ Julian C. Dunn jdunn@aquezada.com * Sorry, I’m ]
[ WWW: http://www.aquezada.com/staff/julian * only Web 1.0 ]
[ gopher://sdf.org/1/users/keymaker/ * compliant! ]
[ PGP: 91B3 7A9D 683C 7C16 715F 442C 6065 D533 FDC2 05B9 ]


#3

ideally it should tcp connect to port 22 (or whatever ssh port we're
using) and use select() to see if ssh returns a banner reasonably fast.
then the problem with ssh timeouts is that one person's 'runaway
long-running command' is another person's 'dammit! why did you kill my
process!'. this is further complicated by the need to ping check through
ssh gateways as well. a 15 minute timeout is most likely way too
aggressive for this. TCP or SSH protocol keepalives could probably be
turned on to drop connections to machines that crash hard and whose
TCP/IP stack stops responding.
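
[Editor's sketch] The banner check described above could look roughly like this in plain Ruby, using only the standard library. The function name and both timeout defaults are hypothetical, and this is not the code from the PR mentioned below. It also simplifies the protocol: it looks for a line starting with "SSH-" in the first chunk the server sends, and it does not handle connections that must go through an ssh gateway.

```ruby
require "socket"

# Hypothetical helper (not part of knife): returns the server's SSH banner
# line, or nil if the host does not accept the connection or does not send
# a banner within the given timeouts.
def ssh_banner(host, port = 22, connect_timeout: 5, banner_timeout: 5)
  Socket.tcp(host, port, connect_timeout: connect_timeout) do |sock|
    # select() waits until the socket is readable or banner_timeout elapses;
    # it returns nil on timeout.
    readable, = IO.select([sock], nil, nil, banner_timeout)
    return nil unless readable

    # Read whatever has arrived and pick out the identification line.
    data = sock.readpartial(255)
    return data[/^SSH-[^\r\n]*/]
  end
rescue SystemCallError, SocketError, EOFError
  # Refused, unreachable, timed out, or closed before sending anything.
  nil
end
```

A caller could skip (or report) any node for which ssh_banner returns nil before opening a full session, which keeps the reachability check separate from the thornier question of how long a running command should be allowed to take.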

and i recently saw this code floating around the knife plugin codebase
somewhere, associated with a PR that addressed this, but now i cannot
find it at all…

(In reply to Julian C. Dunn's message of Thu Jun 26 11:24:05 2014.)