Kitchen, Train, and Beyond


#1

As part of some Bloomberg-sponsored work on Test Kitchen plugins for Solaris and AIX I’ve been working on shoring up TK’s internals. Part of this requires improving Test Kitchen’s transport abstraction (SSH, WinRM, etc) so I can do stuff like concurrent connections or reading command output. Relatedly we also now have Train floating around in our general sphere of influence, which is also a transport abstraction, but also needs a lot of improvements and cleanup.

The simplest solution on my end would be to just fix what already exists in TK but I feel like the long term solution should involve unifying all of these. Would the Train team be amenable to some major rewrites on that side, or would I be better off building a new library that could be both a Train and TK plugin? Thoughts?

–Noah


#2

yeah the multiple transport abstractions is a major pain point and something that really needs to be refactored and centralized. Train is the obvious location for this centralization to take place but its currently very inspec centric. I’ll let the train/inspec team speak for their level of PR comfort but I think you are looking in the right direction.


#3

I think the best course forward is to “pull a Test Kitchen” and basically start over on Train, but I would totally understand if that is too disruptive for the InSpec/Compliance team though. Starting under a new name would be the same end result, just one or two more things to deprecate and port off of in the process.


#4

Hi coderanger!

At first of all, I like the idea to use train under the cover for test-kitchen. The move makes sense from my personal perspective. Since train is already a merge of the test-kitchen connection implementation and the original InSpec connection implementation, I am wondering what would require a major rewrite in train? I would like to understand the impact you are talking about in more detail. What would your target architecture look like?

My naive iterative approach would be to rewrite the transports https://github.com/test-kitchen/test-kitchen/tree/master/lib/kitchen/transport ssh and winrm to use train transports first.

-Chris


#5

Just a +1 for the feature of having two-way transfer or at least reading
command output. I wish I had that so I could implement pulling test
reports (eg, jUnit-format XML files or TAP output) from the target VM.
Fletcher Nichol and I batted the idea around a bit at Chef Summit 2015, but
I was too daunted by the notion of overhauling the transport.


#6

I think if we’re talking about making something that will fix all the various use cases we have the following API requirements:

First, command execution. This should probably match the existing shell_out[!] convention, though with fewer options. Currently TK raises an error on failed commands while Train does not, and neither offers a way to get the other behavior. We also need to add array commands, better sudo handling (TK has none, Train’s is unsafe if you give a password), and deal with timeouts. Train seems to have copy-pasta’d TK’s core SSH loop (or they both got it from the same example code) and both have issues with multiple commands executing in parallel on one connection. Basically both sides need their SSH implementations torn down to brass tacks.

Second, file operations. TK and Train both have a patchwork of code dealing with this but it’s currently kind of a mess. I think the list of operations we need is:

  1. given a string buffer, upload a file.
  2. download a file to a string buffer.
  3. given a local path, stream upload a file.
  4. stream download a file to a local path.
  5. stat a remote file.
  6. sync a local folder to the remote side (i.e. recursive upload but with rsync-y semantics)
  7. listdir a remote path (maybe?)

Basically on the file operations side I think we should try to extend the VFS model that Train has, but it needs to be hugely refactored to support this. Notably Train’s VFS implementation is super slow due to running OS probe commands over and over.

Does that sound like the right set of things to fix?


#7

I started playing with API ideas today and came up with this if anyone wants to take a look. Given the level of departure from both TK and Train I think the best course is to work on this independently and then later swap/deprecate Train and Kitchen::Transport. Does that look like it would address the needs of everyone that is doing local/ssh/winrm file operations?


#8

@coderanger The API includes very good improvements regarding file upload / download. I am happy to support you to get the file upload feature into train. I strongly advice against doing another implementation though. We should implement the features gradually. So file upload/download is a feature that could be easily added to the existing train file implementation as well. Also, I would like to understand, how you are going to implement the file sync mechanism.


#9

I don’t think the APIs in Train are worth preserving. The distinction between transport and connection should be removed, and I think Train put too much platform-specific logic in to the File classes when it belongs elsewhere (probably the connection). Add to that the Train has multiple APIs that should be dropped (file hashes have only the base implementation, and something about product version that seems totally unimplemented with no comments). Also the whole name “Train” is confusing as it doesn’t match the name of the gem (r-train). All adds up to probably being worth a copy-pasta, or at least like we did with jamie vs. Test Kitchen.

As for file sync, the naive implementation used in TK now is recursive delete + recursive file transfer, but I’ve got some improved versions in the kitchen-sync project.


#10

@coderanger Could you please explain in more detail, how your sync implementation works platform-independent e.g. support for Windows and/or network devices? I think we cannot assume that we have sftp or rsync available on all machines. Would you still need the fallback to the current implementation?


#11

While interesting, the sync handling isn’t really relevant to this :slight_smile:

I’ve started on API design over at https://github.com/poise/airlift and what I have so far looks a bit like this:

conn = Airlift.connect(name: 'ssh', hostname: 'example.com')
cmd = conn.execute!('echo foo')
# cmd looks like a Mixlib::ShellOut object.
puts cmd.stdout
f = conn.file('/etc/motd')
puts f.content
# f.stat looks like a File::Stat object.
puts f.stat.file?

How does that look to everyone? Trying to cut down on the conceptual overhead a bit by reusing API shapes from elsewhere.


#12

Hi @coderanger, as mentioned above, I really like to general idea to bring kitchen and train together. And I fully agree that we need file upload / download functionality in train. To achieve the goal, I would like to focus on the specific improvements required in trian and test-kitchen. My top priorities would be:

  1. add file upload/download feature to train
  2. use train as transport layer in kitchen

What is holding you back from integrating the improvements in those projects?


#13

The short version is that I think Train has fundamentally weaker abstractions. Having two different concerts for “transport instance” vs. “connection instance” adds to the conceptual overhead for minimal gain, and Train’s File abstraction/class got really muddy with a mix of different functionality in different classes (eg. AixFile being its own thing), features that should be in InSpec instead of Train (eg. all the stuff related to filesystem mounts and content hashing, all the foo? helper methods), and generally having a weird mismash of features. The code organization also seems like it was built organically in a few discrete bursts (eg. LocalFile is under transports/ where all other file clases are under an unhelpfully named extras/ module). I could certainly copy-paste everything from Airlift in to Train, but I don’t see the value of that over just making both InSpec and Kitchen use the new APIs and deprecating Train. Also notable: Train’s project name and gem name do not match, which has caused me quite a few “oh crud” moments over the past few months. Put all of those together and a rename for the new API seems like a good idea. I’m not starting over scratch, if you look at the existing SSH implementation it’s largely copy-pasta from both Train and Kitchen. Most of Train’s WinRM implementation should probably be ripped out in favor of Matt’s newer winrm-elevated and winrm-fs gems so that pretty much covers that side. I made a stub for Docker connection plugin because I know that’s something Train supports but I still need to dive in to how precisely it is using the Docker gem to implement all that.


#14

So I totally applaud the effort to tighten up a transport abstraction that can work across different applications. It’s been a long time frustration of mine that we have several key applications that all tackle the hard problem of getting different computers to talk to one another but do it in different ways or even worse do it pretty much the same way but just different code bases. As a WinRM gem maintainer this sometimes means I have to submit several PRs if i want to see a particular enhancement make its way into these different products.

However, I’d really hope we could do this without creating yet another abstraction. Sure we can learn from our other attempts and make the new one even better but once you start getting into integration, I think we’d all agree that gets messy in the best of code bases. I think Train is the best candidate gem to be the home of this effort. That really is the ultimate objective of that gem. There are also others beyond ourselves who I think are passionate about this effort and would be willing to collaborate and provide feedback.

Train has problems. Its development has been mainly to support one application (inspec) and thus suffers some bias. Maybe we need to take on a major version change to accomplish this effort and break compatibility in several areas (or not). However as you have noted, much of trains implementation is “copy/pasted” from Test-Kitchen so getting those two to work together is a natural and iterative progression from where we are today. I really don’t think you will find much opposition to committing major refactorings to train. The initial steps may be uglier than you like but not unworkable.

In the end whatever it is we come up with will need to be maintained by all of us so there are many I think with a vested interested in making this work. The closer we get to a single transport gem that runs through Inspec, Test-Kitchen, bootstrap plugins, provisioning and perhaps other ruby based tools, the better this world will be to code in.

Matt


#15

My reason for saying we should change the name instead of “pulling a Test Kitchen” and releasing a major rev with massive changes is mostly because the gem name is not “train” and that’s super gross. I think that’s a good enough reason to change names such that the project and gem name will match, saving on potential confusion when gem install train doesn’t seem to work.

If for some reason everyone else is okay with this mismatch being permanent, I am happy to basically copy-paste all the refactoring work in to Train and call it 1.0 or 2.0.


#16

I am very excited to report that train is now available as train gem. Therefore gem install train is working as expected now!