Deployment and rollback of cookbooks, roles, environments and data bags


#1

Folks,

Currently we keep all our Chef code (cookbooks) in a git repository.
A post-commit hook triggers a Jenkins job, which updates Chef using a
slightly modified version of the chef-jenkins synchronization tool.
Unfortunately, this approach allows broken code to be committed, and the
breakage is revealed only when the changes are pushed to the Chef
server. I want to change this by using an update hook and accepting
changes only if they upload successfully to the Chef server. To do this
I also need a rollback mechanism on the Chef server side, so that an
update can be discarded if anything goes wrong.
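
A minimal sketch of what such an update hook could look like, in Python.
Git runs the update hook once per pushed ref with the ref name, old
revision and new revision as arguments, and a non-zero exit status
rejects that ref. Everything else here is an assumption for
illustration: the protected branch name, and the upload-to-chef.sh
script, which is a made-up placeholder for whatever sync tool is
actually used.

```python
#!/usr/bin/env python3
"""Sketch of a server-side git `update` hook that gates pushes on a Chef upload."""
import subprocess
import sys

PROTECTED_REF = "refs/heads/master"  # assumption: only master feeds the Chef server


def upload_to_chef(newrev):
    """Hypothetical placeholder: upload the content of `newrev` to the Chef server.

    It runs a made-up script name; substitute your own sync tool.
    Returns True when the upload succeeded.
    """
    result = subprocess.run(["./upload-to-chef.sh", newrev])
    return result.returncode == 0


def decide(refname, newrev, upload=upload_to_chef):
    """Return the hook's exit status: 0 accepts the ref update, 1 rejects it."""
    if refname != PROTECTED_REF:
        return 0  # other branches are not uploaded, so let them through
    return 0 if upload(newrev) else 1


if __name__ == "__main__":
    # git calls the update hook as: update <refname> <oldrev> <newrev>
    if len(sys.argv) == 4:
        refname, _oldrev, newrev = sys.argv[1:4]
        sys.exit(decide(refname, newrev))
```

The upload step is passed in as a parameter so the accept/reject logic
can be exercised without a Chef server. Note that a failed upload may
already have changed the server, which is why the rollback question
matters.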

So my question is: does Chef Server 11.08 have any built-in mechanism
for transactional updates, or do I need to implement a custom solution?

Thanks,
Kirill.


#2

It seems to me that this is where environment-based versioning might come
in handy. Assuming the last run succeeded, you would freeze an environment
(let’s call it testing-old) at the last set of cookbooks uploaded before
you start uploading more. (You can tell whether the previous run succeeded
by checking whether a second environment, testing, exists.) You then upload
the new cookbooks. If any of them fail, you abort and throw an error,
making sure the testing environment is gone. If they all upload
successfully, you create the testing environment, freezing all versions at
the new current set. You then try these cookbooks out on testing nodes in
your cluster and, assuming things go fine, you can bless that version as
dev/prod/whathaveyou.
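
As a rough illustration of the "freeze" step, here is a small Python
sketch that builds an environment document pinning every cookbook to one
exact version. The cookbook names and version numbers are made up, and
the function name is hypothetical; the "= x.y.z" string is Chef's
constraint syntax for an exact match.

```python
import json


def pinned_environment(name, cookbook_versions, description=""):
    """Build a Chef environment document pinning each cookbook to one exact version."""
    return {
        "name": name,
        "description": description,
        "json_class": "Chef::Environment",
        "chef_type": "environment",
        "default_attributes": {},
        "override_attributes": {},
        # "= x.y.z" is Chef's constraint syntax for an exact version match
        "cookbook_versions": {
            cookbook: "= " + version
            for cookbook, version in sorted(cookbook_versions.items())
        },
    }


if __name__ == "__main__":
    # Hypothetical cookbook names and versions, for illustration only.
    env = pinned_environment("testing", {"ntp": "1.3.2", "rsyslog": "1.5.0"})
    print(json.dumps(env, indent=2))
```

The output can be saved to a file such as testing.json and loaded with
knife environment from file testing.json. Building testing-old works the
same way, using the previous set of versions.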

Just because the cookbook uploaded doesn’t mean that everything’s
copacetic. It just means that the syntax passed. A certain amount of
automated testing will help, but beyond that point you simply need to work
with real-world data to see whether or not the code is working right.
Automated testing rarely catches edge-case scenarios, because the people
writing the tests usually don’t imagine them; if they had, those cases
would have been handled when the code was developed in the first place.


~*~ StormeRider ~*~

“Every world needs its heroes […] They inspire us to be better than we
are. And they protect from the darkness that’s just around the corner.”

(from Smallville Season 6x1: “Zod”)

On why I hate the phrase “that’s so lame”… http://bit.ly/Ps3uSS



#3

We have a setup like this:
Core cookbooks - core applications like ntp, monitoring, rsyslog and such
(maintained by chef operators)
Library cookbooks - base application cookbooks like httpd, tomcat, java and
such (maintained by chef operators)
Application cookbooks - tweaks and config for the library cookbooks
(maintained by application owners)

We then build roles from an application cookbook plus a library cookbook.
Work on a role starts in a “feature branch” off our git dev branch, where
we test the role in a Vagrant environment to see that everything is OK.
The person developing the cookbook then submits a pull request to the Chef
ops team, and an ops member plus one reviewer merge it into dev, where it
can be tested further on a real staging server with a chef-solo run. If
everything checks out, it is merged into master and pushed out to prod.

With the help of Bamboo and its branching feature, along with tags and
scripts, the stage and prod parts are automated, with the exception of a
Jira status change =)
So far we have had no bad code submitted into prod, with about 30
application cookbook committers and 5 library and core cookbook committers
internally.
I don’t know if it’s the best way, but it keeps our production servers
running without any faults.

Jens Skott
Tel: +46-8-5142 4396
Schibsted Centralen IT



#4

Thanks a lot for sharing your thoughts Jens and Morgan!

I tried an approach with testing and master branches in git, where all
pushes to testing are automatically uploaded to the testing Chef server
and all pushes to master are automatically uploaded to the production
Chef server. This turned out to be inconvenient, since it required
frequent merges between branches. Also, if two people worked in testing
simultaneously and one of them integrated his changes into master, he
could accidentally merge the other person’s changes, which might not be
ready for production yet.

So I decided to switch to an approach where Chef development is done in
feature branches and updates are delivered to the testing Chef server
via knife. When everything works as expected, the feature branch is
integrated into master and the changes automatically propagate to the
production Chef server. But I want to add protection against changes
that were not tested on the testing Chef server, and to reject pushes
that fail to upload to the production Chef server. I understand that
this doesn’t guarantee that successfully uploaded cookbooks have
flawless logic. Also, I can’t rely on everybody using cookbook version
locking; some people just use the latest versions.

Considering all of the above, I need to implement a rollback mechanism
for the case where an upload fails. I can preserve the old versions of
roles and environments and upload them again if something goes wrong.
For cookbooks this is even easier: all updated cookbooks should have
bumped versions, so for rollback I just need to delete the specific
cookbook version.
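
To make the rollback idea concrete, here is a Python sketch that turns
"what was uploaded" and "what was backed up" into a list of knife
command lines. The function name and file paths are hypothetical. It
assumes every uploaded cookbook version is new (bumped), so deleting it
does not remove anything that existed before, and that the role and
environment backups were saved as JSON files before the update (for
example with knife role show NAME -F json).

```python
def rollback_commands(uploaded_cookbooks, role_backups, environment_backups):
    """Return knife command lines that undo a partially applied update.

    uploaded_cookbooks: list of (name, version) pairs uploaded during this update
    role_backups / environment_backups: paths to JSON files saved before the update
    """
    commands = []
    # Delete only the versions this update introduced; older versions stay.
    for name, version in uploaded_cookbooks:
        commands.append(["knife", "cookbook", "delete", name, version, "--yes"])
    # Re-upload the saved copies of roles and environments.
    for path in role_backups:
        commands.append(["knife", "role", "from", "file", path])
    for path in environment_backups:
        commands.append(["knife", "environment", "from", "file", path])
    return commands
```

Each entry can be passed to subprocess.run(cmd, check=True). The
commands are returned rather than executed so that the plan can be
logged or reviewed first.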

But it would be really great if I could update the Chef server in a
transactional manner. By this I mean that I somehow tell the Chef server
“update started”, upload the changes, and then send an “update ok”
command, at which point all the changes take effect simultaneously; or,
in case of an “update failed” command, all the changes are discarded.
Do I understand correctly that the Chef server currently doesn’t support
such a use case?
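
Assuming the server offers no such mechanism, the closest client-side
approximation is probably a list of steps, each paired with an undo
action, where a failure undoes the completed steps in reverse order. The
Python sketch below shows only that control flow; the names are
hypothetical and the steps in the demo are stand-ins that write to a
list instead of calling knife.

```python
class UpdateFailed(Exception):
    """Raised when a step fails and the completed steps have been undone."""


def run_transaction(steps):
    """Run (apply, undo) pairs in order; on failure, undo completed steps in reverse."""
    undo_stack = []
    for apply, undo in steps:
        try:
            apply()
        except Exception as exc:
            for completed_undo in reversed(undo_stack):
                completed_undo()
            raise UpdateFailed(str(exc)) from exc
        undo_stack.append(undo)


if __name__ == "__main__":
    log = []

    def failing_upload():
        raise RuntimeError("upload rejected")

    steps = [
        (lambda: log.append("upload ntp 1.3.2"), lambda: log.append("delete ntp 1.3.2")),
        (failing_upload, lambda: log.append("never called")),
    ]
    try:
        run_transaction(steps)
    except UpdateFailed:
        pass
    print(log)  # ['upload ntp 1.3.2', 'delete ntp 1.3.2']
```

This is not a real transaction: the changes do not take effect
simultaneously, and a node that converges in the middle of the update
can still see a partial state.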

Thanks,
Kirill.


#5

I agree that the ability to treat an entire upload of cookbooks (along
with data bags and roles) as a single transaction would be awesome and
would simplify a lot of tooling. I am not sure how much effort it would
take to do something like that …
We currently follow pretty much the same workflow as you have described,
but we do use a staging Chef server. We also do a cleanse before the upload
(remove all cookbooks, data bags, etc.). This ensures we are not carrying
over any artifacts, and that we can completely restore an older state
(including the clients). This is pretty much like implementing the
transaction logic at your end.

We also use Jenkins, along with the ghprb plugin (GitHub pull request
builder), which tests individual PRs. We don’t test all the feature
branches, only the PRs. The ghprb plugin also lets you retest or bypass
PRs using comments.
So far this is working and solving the problem we wanted to address, but
it is time-consuming (chef-zero might be helpful here), and it has also
opened up a can of other issues related to Jenkins automation… but that’s
a different story.
ranjib



#6

Hi Ranjib,

May I ask you for more details on your workflow? Do I understand
correctly that you have testing, staging and production Chef servers,
and that deploying to production just means replicating the staging
Chef server to the production Chef server? By “cleanse before the
upload”, do you mean that you delete all the old content of the
production Chef server?

Thanks,
Kirill.
