It seems to have been open for some time and I’m not sure if there has been any
discussion on a fix, at least from my Googling? Wondering if anyone has some
ideas that we can start airing out on the mailing list.
We use net-ssh and net-ssh-multi for making SSH connections. I haven't
examined the code too deeply, but maybe we're just not passing a
:timeout to the session constructor?
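
A rough sketch of what passing a timeout might look like, not taken from the knife code. It assumes the net-ssh `:timeout` option, which as far as I know only bounds the initial connection and not how long a command may run. The helper names `ssh_session_options` and `open_session` are made up for illustration.

```ruby
# Hypothetical helper: builds the options hash handed to Net::SSH.start.
# :timeout is the net-ssh option for how long to wait for the initial
# connection; it does not limit the runtime of remote commands.
def ssh_session_options(connect_timeout: 10)
  { timeout: connect_timeout }
end

# Hypothetical wrapper around Net::SSH.start(host, user, options).
# net-ssh is required lazily so the options helper above can be used
# without the gem being loaded.
def open_session(host, user, connect_timeout: 10, &block)
  require "net/ssh"
  Net::SSH.start(host, user, ssh_session_options(connect_timeout: connect_timeout), &block)
end
```

Usage would be along the lines of `open_session("node1.example.com", "deploy", connect_timeout: 5) { |ssh| ssh.exec!("uptime") }`, where the host and user are placeholders.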
Ideally it should TCP connect to port 22 (or whatever SSH port we're
using) and use select() to see if ssh returns a banner reasonably fast.
The problem with SSH timeouts, then, is that one person's 'long-running
runaway command' is another person's 'dammit! why did you kill my
process!'. This is further complicated by the need to ping-check through
SSH gateways as well. A 15 minute timeout is most likely way too
aggressive for this. TCP or SSH protocol keepalives could probably be
turned on to drop connections to machines that crash hard and whose
TCP/IP stack stops responding.
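
The banner check described above could be sketched roughly like this, using only Ruby's standard `socket` library. The method name `ssh_banner` and the default timeouts are assumptions for illustration; this is not the code from the PR mentioned below, and it does not handle the SSH gateway case.

```ruby
require "socket"

# Hypothetical helper: returns the SSH banner line if the host sends one
# within the deadlines, otherwise nil.
def ssh_banner(host, port = 22, connect_timeout: 5, banner_timeout: 5)
  # Socket.tcp raises Errno::ETIMEDOUT if the connect takes too long.
  socket = Socket.tcp(host, port, connect_timeout: connect_timeout)

  # IO.select returns nil when nothing becomes readable before the timeout.
  return nil unless IO.select([socket], nil, nil, banner_timeout)

  # The socket is readable, so this read will not block.
  data = socket.readpartial(255)
  data.start_with?("SSH-") ? data.lines.first.chomp : nil
rescue SystemCallError, SocketError, EOFError
  # Refused, unreachable, timed out, or closed without sending anything.
  nil
ensure
  socket&.close
end
```

A caller could treat a nil result as "host not ready for SSH" and skip it, without having to pick a timeout for the commands themselves.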
Also, I recently saw code for this floating around the knife plugin
codebase somewhere, associated with a PR that addressed this, but now I
cannot find it at all...
On Thu Jun 26 11:24:05 2014, Julian C. Dunn wrote:
We use net-ssh and net-ssh-multi for making SSH connections. I haven't
examined the code too deeply, but maybe we're just not passing a
:timeout to the session constructor?