Chef Client 12.1.0 Post-Mortem

At yesterday's IRC meeting we agreed we should have a post-mortem for
the 12.1.0 release since it had so many regressions (we're still
working on releasing all the fixes). We want to hold it over
video-conferencing software since post-mortems can feel really
personal and it's more humane when we can see each other. I went with
Zoom over Google Hangouts because we use it at Chef, which makes it
the easiest for me to schedule.

All the information I've collected, along with the Zoom meeting
information, is on chef#3107. Please comment on chef#3107 if there's
anything you'd like added to the report at the top of the issue.

Tuesday 3/24 at 9am PDT / Noon EDT / 1600 UTC

--
Bryan McLellan | chef | engineering lead
(c) 206.607.7108 | (t) @btmspox | (www) http://chef.io

No bikeshedding intended, but that's a great way to rule out Linux/BSD
users. Maybe that's a vanishingly small number, but it does include me. I
guess there is an Android client, though.

Perhaps a community standard should be established around accessibility?
That would seem like a Good Thing for an avowedly open source company to
do. Hangouts work (for some value of work!) on everything...

Just my €0.02

On Fri, Mar 20, 2015 at 6:29 PM, Bryan McLellan btm@chef.io wrote:

I went with Zoom over Google Hangouts because it's easiest for me to
schedule a Zoom meeting because we use it at Chef.

On Fri, Mar 20, 2015 at 4:22 PM, Angus Buchanan angus.o.buchanan@gmail.com
wrote:

Perhaps a community standard should be established around accessibility?
That would seem like a Good Thing for an avowedly open source company to
do. Hangouts work (for some value of work!) on everything...

Unfortunately, easy multi-platform video conferencing for 25 people is
still a unicorn. Hangouts only work for 10 participants, 15 if you jump
through some complex hoops with Google Apps For Work. Yes, you can do
Hangouts On Air, which we still use in some cases, but we're looking for a
larger number of potential participants in this case since we've usually
got 10 people just from Chef in our internal post-mortems. So I fully
admit, it's a compromise based on our experience with different tools and
meetings.

Bryan

Sorry for the plug, but my company happens to offer large, multi-platform
video conferences: http://bluejeans.com. Up to 25 participants standard,
100 with the large meeting feature; the browser client works on Mac, Linux,
and Windows.

On Sun, Mar 22, 2015 at 7:55 PM, Bryan McLellan btm@loftninjas.org wrote:

Unfortunately, easy multi-platform video conferencing for 25 people is
still a unicorn.

+1 on Bluejeans. We use it at Zendesk and I'm very pleased with its
performance and usability.

On Mon, Mar 23, 2015 at 4:33 AM, Sean Clemmer sclemmer@bluejeansnet.com
wrote:

Sorry for the plug, but my company happens to offer large, multi-platform
video conferences: http://bluejeans.com. Up to 25 participants standard,
100 with the large meeting feature; the browser client works on Mac, Linux,
and Windows.


+1, same here.

On Mon, Mar 23, 2015 at 3:10 AM, Michael Fischer mfischer@zendesk.com
wrote:

+1 on Bluejeans. We use it at Zendesk and I'm very pleased with its
performance and usability.


We had the public post-mortem as planned on Tuesday morning. I didn’t
think to screen capture the attendee list in Zoom, but I’d say there
were about 25 of us, mostly Chef employees.

I’ve completed the report with the immediate corrective actions we
identified here: https://gist.github.com/btm/641d3b0ec331ac34fbe9.
I’ve also cleaned up the GitHub issue and we’ve been using it to track
regressions and fixes: https://github.com/chef/chef/issues/3107

If you’d like to watch the recording of the hour-long meeting, it will
be available here when it’s done processing:

The biggest difficulty for us in the meeting was to avoid discussing
and designing an ideal test infrastructure, and to focus instead on
what immediate corrective actions we could work on in the next couple
of weeks. Everyone’s available time is of course further limited by
ChefConf coming right up. One of the corrective actions was to hold an
Open Space or BoF (or both!) at ChefConf to continue the discussion.

At Chef, we’ve been using Buildkite for some projects, so we’re going
to look into using it to run some integration tests for Chef, which
would be more visible to everyone, like Travis and AppVeyor. We
currently trigger testing on a wider range of platforms when we make
builds using Jenkins, and hope to soon have these happen automatically
with a git trigger. However, the results of those aren’t visible, nor
are they triggered on PRs. So we’re hopeful about Buildkite.

We’ve got another release coming out soon, within a week. I think
we’re “over the hump” with fixing 12.1.0 regressions and we’ll slow
down with releases a little bit to go back to finishing up the new
Chef Client build cluster (Manhattan) which should make it easier to
release builds at a higher cadence.

I’d personally like to see some discussion from contributors and
maintainers about what kind of testing we should be doing and when.
Should contributors be manually testing their code on multiple
platforms? Should maintainers? Should Chef when we release? Having a
huge test matrix of integration tests will be great, but we’ve all got
to write them. Should we be running common cookbooks, or specific
cookbooks in the chef repository that have high code coverage, e.g.
not just install a package, but test every action with a matrix of
attributes like source and version? How about both?
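
To make the "matrix of attributes" idea a bit more concrete, here is a
minimal sketch in Python of how such a set of test cases could be
enumerated. It is only an illustration: the action names, the source
path, the version string, and the build_matrix helper are all made-up
placeholders, not anything from the Chef test suite.

```python
from itertools import product

# Hypothetical values for illustration only; a real matrix would be
# driven by the resource under test and the platforms being covered.
ACTIONS = ["install", "upgrade", "remove"]
SOURCES = [None, "/tmp/example.pkg"]
VERSIONS = [None, "1.2.3"]


def build_matrix(actions, sources, versions):
    """Return one test case per combination of action, source and version."""
    return [
        {"action": action, "source": source, "version": version}
        for action, source, version in product(actions, sources, versions)
    ]


if __name__ == "__main__":
    for case in build_matrix(ACTIONS, SOURCES, VERSIONS):
        print(case)
```

Even this tiny example yields 3 x 2 x 2 = 12 cases for a single
resource, which gives a feel for how quickly a full matrix grows once
platforms are added on top.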

We’ve done pretty well at culturally agreeing that we need tests with
regression fixes and new features, but since we’ve never really built
an automated integration testing framework for Chef, we’ve relied a
lot on manual testing. How much of that we do has varied greatly over time
and I think still varies from contributor to contributor quite a bit.
I think it would be helpful if we found a baseline and wrote it up.


Bryan McLellan | chef | engineering lead
(c) 206.607.7108 | (t) @btmspox | (www) http://chef.io