On the general usage of Chef


#1

It seems like the ideal case of using Chef would be to run it as a cron job
every day on the servers, to keep the servers in their proper and perfect
configuration.

In practice, we are currently not doing this. There is more than one
simple reason for it.

  • The recipes would always need to be in a perfect and functional state.
    Say you were experimenting with something, and then every single server
    in your environment grabs that recipe and runs it. This could affect the
    entire system, every single server.

  • Even if you had two separate Chef servers, one for production and one
    for testing, so that the production server was designed to always be
    correct, there is still some small chance of human error, and it would
    propagate automatically, without necessarily being watched at that
    moment, to every single server in the environment.

  • You might have recipes designed to handle 95% of the tasks, but there
    is still a bit of human intervention, or the developers are still
    logging into the servers and tweaking things. An automated Chef run
    might cause a conflict between the recipe and the manually added changes.

  • The recipes are in a state of flux; you might apply them carefully and
    methodically to only a few servers at a time.

Thoughts on this topic?


#2

OK, I see this was all already discussed yesterday! Never mind, then.

I suppose my point relates to the comment/quote in the last thread that
went "to err is human; to propagate your error to 1000 machines
automatically is devops." If you are a cookbook author designing complex
cookbooks and changing them often, that has to be kept in mind.



#3

On Jan 14, 2014, at 6:02 AM, Sam Darwin samuel.d.darwin@gmail.com wrote:

  • The recipes would always need to be in a perfect and functional state.
    Say you were experimenting with something, and then every single server
    in your environment grabs that recipe and runs it. This could affect the
    entire system, every single server.

Don’t do this experimentation on your production environment. That should be locked down seven ways from Sunday. Even if a new version of a cookbook were to get pushed out, it wouldn’t get used on any production servers until the production environment was updated to allow that newer version.

Do this experimentation in your Development environment, and once it’s rock solid, then you push to Staging. From there, once it’s baked in for a while, you can push to Prod.

And you can do all of this on a single Chef server.

  • Even if you had two separate Chef servers, one for production and one
    for testing, so that the production server was designed to always be
    correct, there is still some small chance of human error, and it would
    propagate automatically, without necessarily being watched at that
    moment, to every single server in the environment.

Human error is always a possibility. If you do your job right, with the right unit and integration tests in your code, appropriate acceptance and smoke testing in a suitable “Dev” or “Lab” environment, then the biggest risk is the humans who are pushing the buttons.

  • You might have recipes designed to handle 95% of the tasks, but there
    is still a bit of human intervention, or the developers are still
    logging into the servers and tweaking things. An automated Chef run
    might cause a conflict between the recipe and the manually added changes.

Don’t let developers log into production servers. In fact, don’t let your admin staff log into production servers. Only automated systems should be doing anything on them – if a human being has to log in, then that machine should be marked as down or broken and pulled out of production.

  • The recipes are in a state of flux; you might apply them carefully and
    methodically to only a few servers at a time.

Recipes as they are being developed should be in a state of frequent flux, but recipes as they are pushed out to the Prod environment should be locked down within an inch of their life.


Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu


#4

I agree with these comments. I would also add that best practice in production (aka ops) would require every change, whether implemented by hand, via manual runs of chef-client, or via automated runs of chef-client, to be associated with a change record and change-board approval. With that said, if you automate chef-client’s execution through cron or as a process, then prior to every execution the changes that would be pushed with the next run need to have a CR and need to have been approved by the board.

So, if you can get that succinct process running like a tight ship, then automating the runs would be the way to go.



#5

On Tue, Jan 14, 2014 at 7:02 AM, Sam Darwin samuel.d.darwin@gmail.com wrote:

  • The recipes would always need to be in a perfect and functional state.
    Say you were experimenting with something, and then every single server
    in your environment grabs that recipe and runs it. This could affect the
    entire system, every single server.

If you pin recipes (directly in run_lists or indirectly in environments or
roles), you can upload new versions of recipes without fear of having them
propagate to unintended servers. An operational pattern I’ve used in the
past is to have dev servers with versions that aren’t pinned (so they get
the latest version of whatever cookbooks are out there) and have every
other type of server (integration, reference, production, etc) have
explicitly pinned versions. This allows “testing” and “development” of
cookbooks in the same chef organization/server (ensuring you don’t have
cookbooks in the same chef organization/server (insuring you’re don’t have
any “Dev is different than ref is different than int is different than
prod” issues) without concern that new cookbook versions will affect
production nodes (or any other ones, for that matter).
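A minimal sketch of this environment-pinning pattern, written as a Chef environment file in the Ruby DSL; the cookbook names and version numbers here are invented for illustration, not taken from the thread:

```ruby
# Hypothetical environments/production.rb -- every cookbook pinned exactly.
# Nodes in this environment ignore newly uploaded cookbook versions until
# these pins are deliberately changed.
name "production"
description "Production: cookbook versions pinned exactly"

cookbook_versions(
  "apt"    => "= 2.3.0",
  "nginx"  => "= 1.8.0",
  "tomcat" => "= 0.15.2"
)
```

A dev environment would simply omit `cookbook_versions`, so dev nodes always float to the latest uploaded cookbook versions.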

Two things you’ll want (or need) to do if you put this into practice:

  1. Verify you’re pinning all the way down for non-dynamic cookbooks. We
    distribute user information (logins, ssh keys, sudoers, etc) via a
    generated cookbook. The version of this cookbook will change every time
    it’s newly generated. Since we want to use the most recent version of this
    cookbook every time, we do not do any pinning for this cookbook. But for
    cookbooks that are not dynamic, everything gets pinned explicitly or via
    dependent pinning. So, the platform cookbook explicitly pins apt, bash,
    system ruby, etc; nginx cookbook explicitly pins the lua cookbook and any
    module cookbooks it depends on; tomcat cookbook explicitly pins the java
    cookbook version; and so on. Make sure it’s pinned turtles all the way
    down. It’s a pain if you have to retrofit, but worth it IMO.
  2. Every time you upload a cookbook that will be pinned, it is frozen.
    Every time, no exceptions. What you don’t want is to go through all the
    trouble of making sure you know exactly what version is going to go out
    on a specific type of node, only to have that version change.
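Item 1’s “pinned turtles all the way down” might look like this in a platform cookbook’s metadata.rb; again, the names and versions are made up:

```ruby
# Hypothetical metadata.rb for a "platform" wrapper cookbook.
# Its own version fully determines the versions of everything beneath it,
# because every dependency is pinned exactly with "=".
name    "platform"
version "3.1.0"

depends "apt",  "= 2.3.0"
depends "bash", "= 1.0.4"
depends "ruby", "= 0.9.1"
```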

This still doesn’t mean you can’t propagate an error to many machines, but
with additional gates in place it is more difficult for this to happen
unintentionally.


#6

Yeah, the answer is to use environments and to explicitly version pin
all your cookbooks. I’m not sure I’d suggest going down the road of
explicitly version pinning in your cookbook deps, but instead your
promotion from integration to production (or however you have your
environments set up) should be to copy the exact version pins in the
integration environment file and push them into the production one.
Ideally you have a full CD pipeline where you start with "devops"ers
working on their laptops doing local changes to cookbooks and using
test-kitchen to validate them against virtual images. Then those
changes are published and made available as a cookbook version on the
chef server, and a jenkins environment that floats on cookbook versions
picks them up and runs test-kitchen against them, and if they pass
they’re promoted to a full-sized integration environment where again
there’s a test suite that validates that the changes work. If you
swallow the whole koolaid then you will eventually do a push to
production without any CRB approval, since you’ve proved the change works
via multiple testing steps in your different environments.
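That pin-copying promotion step could be sketched as a small Ruby script operating on exported environment JSON files; the file layout and function name are assumptions for illustration, not an existing tool:

```ruby
require "json"

# Sketch: promote the exact cookbook pins from the integration
# environment file into the production one. Production never floats;
# it only ever gets the pin set that was tested in integration.
def promote_pins(integration_path, production_path)
  integration = JSON.parse(File.read(integration_path))
  production  = JSON.parse(File.read(production_path))

  # Copy the tested pin set verbatim; leave everything else untouched.
  production["cookbook_versions"] = integration["cookbook_versions"]
  File.write(production_path, JSON.pretty_generate(production))
end
```

You would run this as the promotion gate (manually or from Jenkins), then upload the modified production environment file to the Chef server.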

You don’t even really need semantic versioning in this system, you just
need to keep cookbook versions unique and frozen. The set that gets
deployed is the set that was tested together and works; you don’t really
want any version constraints in your cookbooks at all. You just test
latest against latest, and if it’s good you ship it.

The same basic workflow also works if you rip out all the CD, but then
you’ve got manual Q/A process and humans pressing buttons to do the
environment promotion, and a CRB involved at some point before it goes
to prod.



#7

At the risk of continually pimping out the same talk to this list, we ran
into a number of these issues at EA and discuss them here:
http://lanyrd.com/2013/ldndevops-january/scbzkw/.

Along with disciplined source control (the concept of releases, etc.), we
built some tooling to:

  • automatically handle cookbook versioning and version pinning in roles
  • monitor for run_list conformity

A great number of problems in this area would be solved with the ability to
version roles, something we raised two years ago and that I raised again
with Chris Brown during a recent speaking engagement with him. Instead we
have the cookbook wrapper pattern, which in practice isn’t especially
great. Others in our circle have approached this from the point of view of
having Jenkins move cookbooks through environments, inspecting the output
of ChefSpec and Minitest runs, lints, and whatnot before gating the release
to live. In the round, however, the problem of continually deploying code
to production is a development problem, and one that is quite mature in
terms of tooling and practice. Operations people should be looking to their
development teams for guidance. Devops works both ways :wink:

Sam Pointer
Lead Consultant
www.opsunit.com



#8

This was also discussed at the last summit. Versioning everything (roles,
environments… etc.) was an item on the Chef 12 wish list… maybe @btm has
some update?



#9

On Monday, January 20, 2014 at 2:24 AM, Ranjib Dey wrote:

This was also discussed at the last summit. Versioning everything (roles, environments… etc.) was an item on the Chef 12 wish list… maybe @btm has some update?

I think that allowing versioning of roles will require simplification elsewhere in the model. We already see “failure” cases with the server side dependency solver model where the dependency solver unexpectedly chooses an older version of a cookbook than the user expects because it produces what the solver sees as a more optimal solution.

As a more concrete example of this, suppose you have a cookbook FOO, where FOO 1.0.0 has no dependencies, and FOO 2.0.0 depends on BAR (any version). If you delete BAR from the server, then the dependency solver will choose FOO 1.0.0 as the valid solution to the dependency constraints. This is only a simple example, and when you get more real-world complexity, the outcomes can get more confusing.
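As a toy illustration (emphatically not Chef’s actual dependency solver), the fallback behavior described here can be mimicked in a few lines of Ruby:

```ruby
# Toy model of the FOO/BAR example: pick the newest FOO whose
# dependencies can all be satisfied by cookbooks present on the server.
def best_foo(available_cookbooks)
  candidates = {
    "2.0.0" => ["bar"],  # FOO 2.0.0 depends on BAR (any version)
    "1.0.0" => [],       # FOO 1.0.0 has no dependencies
  }
  # Hashes preserve insertion order, so the newest version is tried first.
  found = candidates.find do |_version, deps|
    deps.all? { |dep| available_cookbooks.include?(dep) }
  end
  found && found.first
end

best_foo(["bar"])  # => "2.0.0"  (BAR exists, newest FOO wins)
best_foo([])       # => "1.0.0"  (BAR deleted: solver silently falls back)
```

The surprise for the user is that deleting BAR doesn’t produce an error; it silently produces an older, still-valid solution.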

Now, if you imagine a case like the above, except that the run list is different between different versions of roles, which causes different cookbooks with different dependency constraints to get pulled (oh, and roles can set dependency constraints on recipes, too, so that could factor in), you have a recipe for madness.

That said, roles do solve a handful of problems pretty well: they provide a way to compose run_lists so that you don’t need to edit 10s or 1000s of node objects to add a new recipe to those nodes’ run_lists, and they allow you to customize attributes in a way that works much better than the role cookbook pattern (especially in Chef 11). These are good things and we want to have them.

I’ve been privately discussing with some folks an alternative way to address this problem, which is to have nodes get their run lists from a separate document/resource/object called a Policyfile, which would replace both roles and environments. You would have a Berksfile-like file that would include a run_list (which could contain roles if you desire), attributes, and a set of cookbook version constraints.

On your workstation, a tool would evaluate this file to produce a document, called a “policy lock file,” containing the expanded run list, the exact versions of cookbooks to use, and a compiled set of attributes (including attributes from roles, if you used them). You’d then push this to the server (and upload any required cookbooks), and nodes assigned to this policy would use that run list, attributes, and cookbook set.

Since all the factors affecting what code chef-client runs are statically defined in the policy lock file, you could diff your proposed version with the current version and see the exact changes that would be applied to your systems. In this model, role versioning becomes a lot less important because role updates don’t take effect until you “recompile” your Policyfile and push it to the server. You would also be able to fetch roles from a variety of sources before compiling (such as on-disk or a git repo), so you would have a variety of options for adapting role sources to fit your desired workflow.
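A purely hypothetical sketch of what such a Policyfile might look like, extrapolated from the description above; this syntax is invented for illustration and nothing about it is settled:

```ruby
# Hypothetical Policyfile (Berksfile-like): run_list, attributes, and
# cookbook version constraints in one document. All names and versions
# here are made up.
name "webserver"

run_list "role[base]", "nginx::default"

default_attributes "nginx" => { "worker_processes" => 4 }

# Constraints are resolved on the workstation into a "policy lock file"
# pinning exact versions, which is then pushed to the server.
cookbook "nginx", "~> 1.8"
cookbook "base",  ">= 2.0"
```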

We built a demoware prototype of this tool/workflow. Note that it is 90% non-functional and is intended to function like a UI wireframe drawing, so none of the design decisions implied by this prototype are set in stone. https://github.com/danielsdeleo/chef-workflow2-prototype

Any commentary or questions are welcome.


Daniel DeLeo


#10

Thanks for breaking this discussion out. I think I would’ve missed it if it
wasn’t broken out.

I like the idea personally. Without having gone through the demoware, I
have a question.

Since you state that roles and environments would get replaced in the
proposed model, how would attribute precedence be handled? I rely on the
fact that chef_environments have the highest override attribute precedence
(other than force_override and automatic attributes), so that I can
override an attribute when needed and not have to worry about another role
or a recipe setting a default attribute somewhere.

On Tue, Jan 21, 2014 at 5:05 PM, Daniel DeLeo dan@kallistec.com wrote:

On Monday, January 20, 2014 at 2:24 AM, Ranjib Dey wrote:

this was also discussed in last summit. version everything (roles,
environments… etc) was an item in chef 12 wish list… may be @btm have
some update?

I think that allowing versioning of roles will require simplification
elsewhere in the model. We already see “failure” cases with the server side
dependency solver model where the dependency solver unexpectedly chooses an
older version of a cookbook than the user expects because it produces what
the solver sees as a more optimal solution.

As a more concrete example of this, suppose you have a cookbook FOO, where
FOO 1.0.0 has no dependencies, and FOO 2.0.0 depends on BAR (any version).
If you delete BAR from the server, then the dependency solver will choose
FOO 1.0.0 as the valid solution to the dependency constraints. This is only
a simple example, and when you get more real-world complexity, the outcomes
can get more confusing.
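This failure mode is easy to reproduce in miniature. The sketch below is plain Ruby (not Chef’s actual depsolver), modeling the same preference for the newest version whose dependencies can all be satisfied:

```ruby
# Toy dependency resolution, illustrating the FOO/BAR example above.
# Not Chef's real solver -- just the same logic in miniature: prefer the
# newest version of FOO whose dependencies are all available on the server.

def best_foo(available_cookbooks)
  foo_versions = {
    "2.0.0" => ["BAR"], # FOO 2.0.0 depends on BAR (any version)
    "1.0.0" => [],      # FOO 1.0.0 has no dependencies
  }
  foo_versions.each do |version, deps|
    return version if deps.all? { |dep| available_cookbooks.include?(dep) }
  end
  nil
end

# With BAR on the server, the solver picks the newest FOO:
best_foo(["BAR"])  # => "2.0.0"

# Delete BAR, and the solver quietly falls back to FOO 1.0.0 -- a "valid"
# solution to the constraints, but probably not what the user expected:
best_foo([])       # => "1.0.0"
```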

Now, if you imagine a case like the above, except that the run list is
different between different versions of roles, which causes different
cookbooks with different dependency constraints to get pulled (oh, and
roles can set dependency constraints on recipes, too, so that could factor
in), you have a recipe for madness.

That said, roles do solve a handful of problems pretty well: they provide
a way to compose run_lists so that you don’t need to edit 10s or 1000s of
node objects to add a new recipe to those node’s run_lists; they allow you
to customize attributes in a way that works much better than the role
cookbook pattern (especially in chef 11). These are good things and we want
to have them.
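For reference, the run_list composition and attribute customization that roles provide today looks like this (the names are purely illustrative):

```ruby
# roles/webserver.rb -- illustrative role using the standard role DSL
name "webserver"
description "Composes the run_list for all web nodes"
run_list "role[base]", "recipe[nginx]", "recipe[myapp]"
default_attributes "myapp" => { "listen_port" => 8080 }
```

Adding a recipe here updates every node that includes the role, instead of requiring edits to thousands of individual node objects.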

I’ve been privately discussing with some folks an alternative way to
address this problem, which is to have nodes get their run lists from a
separate document/resource/object called a Policyfile, which would replace
both roles and environments. You would have a Berksfile-like file that
would include a run_list (which could contain roles if you desire),
attributes, and a set of cookbook version constraints. On your workstation,
a tool would evaluate this file to produce a document, called a “policy
lock file,” containing the expanded run list, the exact versions of
cookbooks to use, and a compiled set of attributes (including attributes
from roles, if you used them). You’d then push this to the server (and
upload any required cookbooks), and nodes assigned to this policy would use
that run list, attributes, and cookbook set. Since all the factors
affecting what code chef-client runs are statically defined in the policy
lock file, you could diff your proposed version with the current version
and see the exact changes that would be applied to your systems. In this
model, role versioning becomes a lot less important because role updates
don’t take effect until you “recompile” your Policyfile and push it to the
server. You would also be able to fetch roles from a variety of sources
before compiling (such as on-disk or a git repo) so you would have a
variety of options for adapting role sources to fit your desired workflow.

We built a demoware prototype of this tool/workflow. Note that it is 90%
non-functional and is intended to function like a UI wireframe drawing, so
none of the design decisions implied by this prototype are set in stone.
https://github.com/danielsdeleo/chef-workflow2-prototype

Any commentary or questions are welcome.


Daniel DeLeo


Elvin Abordo
Mobile: (845) 475-8744


#11

On Tuesday, January 21, 2014 at 2:59 PM, Elvin Abordo wrote:

Thanks for breaking this discussion out. I think I would’ve missed it if it hadn’t been broken out.

I like the idea personally. Without having gone through the demoware, I have a question.

Since you state that roles and environments would get replaced in the proposed model, how would attribute precedence be handled? I rely on the fact that chef_environments have the highest override attribute precedence (other than force_override and automatic attributes) so that I can override an attribute when needed and not have to worry about another role or a recipe setting a default attribute somewhere.
Caveat again that nothing has actually been built yet (the demoware could make for a convincing screencast, but most commands actually just cat a file). Probably more interesting are the documents in that repo’s docs/ dir.

The way I imagine attributes working is similar to how roles are expanded now, where the “outermost” role wins over roles nested inside of it. Any attributes you set explicitly in a policy file would be the “outermost-est” and win over anything in a role. You could also use attribute-only roles to set attributes across multiple policies or evaluate arbitrary ruby code during policy file evaluation to set attributes.
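That ordering can be sketched in plain Ruby (a toy illustration of “outermost wins,” not Chef’s actual deep-merge code):

```ruby
# Toy illustration of "outermost wins" attribute merging.
# Chef's real deep-merge is more involved; this only shows the ordering:
# recipe defaults < nested role < outer role < the policy file itself.

def merge_layers(*layers_lowest_first)
  # Hash#merge lets later (outer) layers override earlier (inner) ones.
  layers_lowest_first.reduce({}) { |acc, layer| acc.merge(layer) }
end

recipe_defaults = { "port" => 80,   "workers" => 2 }
nested_role     = { "port" => 8080 }
outer_role      = { "workers" => 4 }
policyfile      = { "port" => 9090 }

merged = merge_layers(recipe_defaults, nested_role, outer_role, policyfile)
# => { "port" => 9090, "workers" => 4 } -- the policy file is "outermost-est"
```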


Daniel DeLeo


#12

Ranjib Dey dey.ranjib@gmail.com writes:

this was also discussed in last summit. version everything (roles,
environments… etc) was an item in chef 12 wish list… may be @btm have
some update?

Personally I’m not a fan of the push for versioned roles. As one version
of a familiar joke goes: ‘Some people, when confronted with a problem,
think “I know I’ll use versions”. Now they have 2.5.1 problems’

Roles provide some nice functionality in certain cases: sharing common
attributes and run_lists across groups of servers or setting attributes
at the right precedence level.

It seems many of the issues encountered when using roles can be better
addressed by means other than versioned roles. For me it helps to view
them simply as data and you don’t generally version data. You do
carefully manage change to your data or make sure that data is used and
accessible where it is most appropriate and not shared when it isn’t.

The Policyfile idea Dan mentioned is similar to some of the stack
freezing ideas we use for working with AWS and cloudformation.

Using (WIP): https://github.com/heavywater/knife-cloudformation/

‘knife cloudformation export’

will export a current cfn stack in a state such that an exact copy of
the dependency tree can be recreated again.

On Wed, Jan 15, 2014 at 7:59 AM, Sam Pointer sam.pointer@opsunit.comwrote:

At the risk of continually pimping out the same talk to this list, we ran
into a number of these issues at EA and discuss them here:
http://lanyrd.com/2013/ldndevops-january/scbzkw/.

Along with disciplined source control (the concept of releases, etc.), we
built some tooling to:

  • automatically handle cookbook versioning and version pinning in roles
  • monitor for run_list conformity

A great number of problems in this area would be solved with the ability
to version roles, something we raised 2 years ago and which I raised again
with Chris Brown during a recent speaking engagement. Instead we have the
cookbook wrapper pattern, which in practice isn’t especially great. Others
in our circle have approached this from the point of view of having Jenkins
move cookbooks through environments, inspecting the output of chefspec and
minitest runs, lints and whatnot before gating the release to live. In the
round, however, the problem of continually deploying code to production is
a development problem, and one that is quite mature in terms of tooling and
practice. Operations people should be looking to their development teams
for guidance. Devops works both ways :wink:

Sam Pointer
Lead Consultant
www.opsunit.com

On 14 January 2014 18:57, Lamont Granquist lamont@opscode.com wrote:

Yeah, the answer is to use environments and to explicitly version pin all
your cookbooks. I’m not sure I’d suggest going down the road of explicitly
version pinning in your cookbook deps, but instead your promotion from
integration to production (or however you have your environments set up)
should be to copy the exact version pins in the integration environment
file and push them into the production one. Ideally you have a full CD
pipeline where you start with "devops"ers working on their laptops doing
local changes to cookbooks and using test-kitchen to validate them against
virtual images. Then those changes are published and made available as a
cookbook version on the chef server, and a jenkins environment that floats
on cookbook versions picks them up and runs test-kitchen against them, and
if they pass they’re promoted to a full-sized integration environment where
again there’s a test suite that validates that the changes work. If you
swallow the whole koolaid, then you will eventually do a push to production
without any CRB approval, since you’ve proved the change works via multiple
testing steps in your different environments.
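The “copy the pins” promotion step could be scripted along these lines (a sketch only; the environment file paths are assumptions about your chef-repo layout):

```ruby
# Sketch of environment promotion: copy the exact cookbook version pins
# from the integration environment file into the production one.
# The file paths below are assumptions about a typical chef-repo layout.
require "json"

def promote_pins(from_file, to_file)
  source = JSON.parse(File.read(from_file))
  target = JSON.parse(File.read(to_file))
  # Only the pins move; name, description, etc. stay as-is in the target.
  target["cookbook_versions"] = source["cookbook_versions"]
  File.write(to_file, JSON.pretty_generate(target))
end

# promote_pins("environments/integration.json", "environments/production.json")
# ...then: knife environment from file environments/production.json
```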

You don’t even really need semantic versioning in this system, you just
need to keep cookbook versions unique and frozen. The set that gets
deployed is the set that was tested together and works, you don’t really
want any version constraints in your cookbooks at all, you just test latest
against latest and if it’s good you ship it.

The same basic workflow also works if you rip out all the CD, but then
you’ve got manual Q/A process and humans pressing buttons to do the
environment promotion, and a CRB involved at some point before it goes to
prod.

On 1/14/14 8:15 AM, Dylan Northrup wrote:

On Tue, Jan 14, 2014 at 7:02 AM, Sam Darwin samuel.d.darwin@gmail.comwrote:

it seems like the ideal case of using chef, would be to run it as a
cronjob
every day on the servers, to keep the servers in their proper and perfect
configuration.

In practice, we are currently not doing this. there is more than just
one
simple reason for it.

  • the recipes would need to always be in a perfect and functional state.
    let’s say that you were experimenting with something. and then every
    single
    server in your environment grabs that recipe and runs it. this could
    affect
    the entire system, every single server.

If you pin recipes (directly in run_lists or indirectly in environments
or roles), you can upload new versions of recipes without fear of having
them propagate to unintended servers. An operational pattern I’ve used in
the past is to have dev servers with versions that aren’t pinned (so they
get the latest version of whatever cookbooks are out there) and have every
other type of server (integration, reference, production, etc) have
explicitly pinned versions. This allows “testing” and “development” of
cookbooks in the same chef organization/server (ensuring you don’t have
any “Dev is different than ref is different than int is different than
prod” issues) without concern that new cookbook versions will affect
production nodes (or any other ones, for that matter).
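A pinned environment file in this pattern might look like the following (cookbook names and versions purely illustrative):

```ruby
# environments/production.rb -- illustrative pins only
name "production"
description "Production nodes; every non-dynamic cookbook explicitly pinned"
cookbook_versions(
  "apt"    => "= 2.3.0",
  "nginx"  => "= 2.0.1",
  "tomcat" => "= 0.15.2"
)
```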

Two things you’ll want (or need) to do if you put this into practice:

  1. Verify you’re pinning all the way down for non-dynamic cookbooks. We
    distribute user information (logins, ssh keys, sudoers, etc) via a
    generated cookbook. The version of this cookbook will change every time
    it’s newly generated. Since we want to use the most recent version of this
    cookbook every time, we do not do any pinning for this cookbook. But for
    cookbooks that are not dynamic, everything gets pinned explicitly or via
    dependent pinning. So, the platform cookbook explicitly pins apt, bash,
    system ruby, etc; nginx cookbook explicitly pins the lua cookbook and any
    module cookbooks it depends on; tomcat cookbook explicitly pins the java
    cookbook version; and so on. Make sure it’s pinned turtles all the way
    down. It’s a pain if you have to retrofit, but worth it IMO.
  2. Every time you upload a cookbook that will be pinned, freeze it.
    Every time, no exceptions. What you don’t want, after going through all
    the trouble of making sure you know exactly what version is going to go out
    on a specific type of node, is to have that version change.

Still doesn’t mean you can’t propagate an error to many machines, but
with additional gates in place it’s more difficult for this to happen
unintentionally.


-sean


#13

Versioning can mean different things. I know when I’m talking about
versioned roles I’m thinking more of uniqueness and pinning and less of
the problem created by having to assign an X.Y.Z number to every role
and deal with freezing roles and other nonsense associated with semantic
versioning. I just want a way for roles to be uniquely identified, and
for fully pinned artifacts, including roles, to be shipped around to
environments so that you can’t magically update a version of a role in
prod without promoting a fully-tested batch of artifacts. I definitely
don’t want to make X.Y.Z versioning the default behavior; it should just be
opt-in when it makes sense (a base role that is consumed by other
roles could usefully be said to have an “API” and semantically
versioning that might be useful – a top level role not consumed by
anything else doesn’t really have an external API so assigning X.Y.Z
versions to it is just a useless offering to the gods of software
development because you think you have to do that by law or something…).

On 1/21/14 5:37 PM, Sean Escriva wrote:

Ranjib Dey dey.ranjib@gmail.com writes:

this was also discussed in last summit. version everything (roles,
environments… etc) was an item in chef 12 wish list… may be @btm have
some update?

Personally I’m not a fan of the push for versioned roles. As one version
of a familiar joke goes: ‘Some people, when confronted with a problem,
think “I know I’ll use versions”. Now they have 2.5.1 problems’
