Automated check-ins or not


#1

I am interested in hearing what others are doing in terms of allowing nodes to automatically check in with chef or not. It has recently come up as a concern with a party in our company; he would prefer not to see nodes check in automatically with chef (I currently have a cron job that runs chef-client every X minutes).

I am just interested in hearing how others manage this. I am not convinced that manually running chef-client is a good solution.

I am being slightly vague on purpose, because I am looking for full case examples from others using chef and how they are using it.

Thanks,

Phillip Roberts | Sr. Linux Systems Administrator
San Mateo | Ann Arbor | New York | London
O 734.922.7014 | C 614.423.9871 | www.MyBuys.com


#2

By "check in", do you mean chef runs or chef registrations? I am aware of 3
different ways:

  1. On demand: use rundeck, mco, or capistrano-like tools to invoke a chef
    run. Pros: on demand :-), which helps if you deploy your application via
    chef. You can also eliminate the need for a validation certificate. Cons:
    requires additional tooling, special security considerations, etc.

  2. As a service: specify a splay time, and use the standard init scripts to
    run chef-client as a service. Pros: no additional configuration required,
    no dependency on any other tools. Cons: memory leaks and stale processes
    used to be a pain.

  3. As a scheduled job: use cron or a rufus-like system to run chef at a
    periodic interval. Pros: simple, less prone to memory leaks. Cons: the
    infra has to be designed as eventually consistent, on-demand application
    deployment cannot be done, and you need to think about the cron times on
    individual servers, or else you'll storm the chef server.

I have used pretty much all three of these, and I think all of them have
merits. Choose one depending on what you do, how you are doing it, and
how comfortable you are with chef and those tools. Most of the issues with
running chef as a service are now sorted (or workarounds are known).
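
To make the cron-timing concern in option 3 concrete, here is a minimal sketch in Python (not part of Chef) of one possible way to spread cron start times: derive a stable minute offset from a hash of each hostname. The function names `cron_minutes` and `cron_line`, the 30-minute interval, and the example hostnames are all assumptions for illustration.

```python
import hashlib
from typing import List


def cron_minutes(hostname: str, interval: int = 30) -> List[int]:
    """Return the minutes of each hour at which this host should run chef-client.

    The offset comes from a hash of the hostname, so it is stable across
    runs but differs between hosts, which spreads load on the chef server.
    For this sketch, `interval` must divide 60 evenly.
    """
    if interval <= 0 or 60 % interval != 0:
        raise ValueError("interval must be a positive divisor of 60")
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    offset = int.from_bytes(digest[:4], "big") % interval
    return list(range(offset, 60, interval))


def cron_line(hostname: str, interval: int = 30) -> str:
    """Build a crontab entry that runs chef-client at this host's minutes."""
    minutes = ",".join(str(m) for m in cron_minutes(hostname, interval))
    return f"{minutes} * * * * chef-client"


if __name__ == "__main__":
    # Hypothetical hostnames; each gets its own stable offset.
    for host in ("web01.example.com", "web02.example.com", "db01.example.com"):
        print(host, "->", cron_line(host))
```

Hashing the hostname rather than picking a random offset keeps each server's schedule predictable, which makes it easier to reason about when a given node last ran.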

best
ranjib


#3

Chef as a tool is used for orchestration, converging nodes to a desired
state. If your coworker doesn’t want nodes checking in automatically, then
perhaps Chef isn’t the ideal tool for you. What does your use case look
like?


#4

The problem isn’t my coworker, the problem is a lack of understanding the tool.

Chef is my baby, and I am perfectly fine with automated check-ins; however, just like in any business, there are politics at play. There are also fears due to a lack of understanding.

I am purposely asking for others’ use cases because they will help me form my arguments as to why chef nodes should be checking in (running chef-client) automatically.

I am not asking for anyone to tell me whether we should be using chef, or how we should be using chef; I am interested in how it is being used in other environments. I have seen plenty of other environments, but in all of them I implemented chef and the policies that surround it, and this question or argument never came up.

I appreciate the responses thus far.

Thanks,

Phillip Roberts | Sr. Linux Systems Administrator


#5

You can also run chef-client as a long-running process with the -i option (I believe), which sets the time interval at which the process contacts the chef server for run-list information.


#6

We currently run Chef with automatic check-ins with a 20 second interval.
We’ve thought about not running it in daemon mode any longer though. The
reason for switching would be for better application deployments.

I don’t think it’s an anti-pattern to run it either way. It’s all about how
it works for you.

Nic


#7

We had quite a few discussions about this as well, and at the end of the day
we opted for the ability to do both on-demand and scheduled runs. There
were concerns that without a scheduled check-in the amount of drift in
systems could become large over time on servers that don’t routinely get
deployments done. With that drift comes a slew of unknown issues. By
enforcing a scheduled run we could be sure that hand-modified configurations
didn’t stick around very long.

We’ve set up a report to notify us if a node has not checked in within the
last day. This helps us catch cases where the scheduled run might be failing
and other notification mechanisms might not be catching it (e.g. some nasty
compile error super early in the run).

From there we extended an existing in-house tool that lets anyone with
access request a chef run without needing access to the servers.
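
A report like the one described above could be sketched roughly as follows. This is not the poster's actual tool: the function name `stale_nodes`, the one-day threshold default, and the sample data are assumptions, and gathering the last check-in times from the chef server is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def stale_nodes(
    last_checkin: Dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta = timedelta(days=1),
) -> List[str]:
    """Return names of nodes whose last check-in is missing or older than max_age."""
    stale = []
    for name, seen in sorted(last_checkin.items()):
        # A node that never checked in (None) is reported as stale too.
        if seen is None or now - seen > max_age:
            stale.append(name)
    return stale


if __name__ == "__main__":
    now = datetime(2014, 1, 14, 12, 0, tzinfo=timezone.utc)
    # Hypothetical data: node name -> time of last successful check-in.
    checkins = {
        "web01": now - timedelta(minutes=20),
        "web02": now - timedelta(days=3),
        "db01": None,
    }
    for name in stale_nodes(checkins, now):
        print("no check-in in the last day:", name)
```

Because the report only looks at timestamps, it still catches nodes whose runs die before any handler or log shipper gets a chance to fire.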


#8

There’s a joke that goes around twitter every so often that goes like: “to err is human, to propagate your error to 1000 machines automatically is devops.” I think this joke actually does a good job at getting to the heart of your coworkers’ concerns: what prevents a potentially destructive mistake from getting applied to your whole infrastructure?

As you’ve implied, one option is to have chef-client run manually on each machine to apply updates as desired. The pros of this approach:

  • you can use why-run mode to get some indication of what’s going to change when you run chef-client for real
  • Workflow is very simple, you don’t need to invest in a lot of testing or extra infrastructure, just upload cookbooks to the server and run them
  • If you’re using chef for app deployments, you don’t need to have any additional logic or tooling for orchestration, just run chef on the boxes in the right order.

The downsides:

  • You have to manually check whether chef has run recently on all your machines. If you miss one, you could be missing an important security/bug/performance patch. This can get you into problems such as missing a patch on the passive node in a failover pair. When the cluster fails over, the service doesn’t work correctly on the now-active node. I’m sure you can imagine plenty of similar cases.
  • Related to the above, you can get a different delta from starting state to desired state than you expected if a machine is a few cookbook iterations behind. This can cause chef-client to fail or to apply a change incorrectly based on the assumptions in your cookbook code.
  • Your team has to be fairly disciplined about communicating when changes are made to the chef-server. Say Alice uploads her change, runs it on a trial node and it works correctly. Now she starts a parallel SSH session to run chef-client on the remaining nodes. In the meantime, Bob uploads a change to some “base” cookbook and it’s incompatible with Alice’s change. Alice’s chef-client runs fail or cause an outage on those systems.
  • Humans spend a lot of time running chef-client.

My view is that running chef-client in some periodic fashion is a good forcing function that will require you to implement good workflow practices, such as cookbook testing with automated uploads to the chef-server from CI, partitioning your infrastructure so that sub-clusters run the “future” cookbook version before the majority of similar machines, or testing cookbooks locally in vagrant/test kitchen/whatever. If you have this stuff in place, running chef-client manually (or via orchestration tool) vs. on interval won’t make a big difference.


Daniel DeLeo


#9

On Jan 13, 2014, at 3:49 PM, Daniel DeLeo dan@kallistec.com wrote:

If you have this stuff in place, running chef-client manually (or via orchestration tool) vs. on interval won’t make a big difference.

I’m with Dan. This question really gets down to people, policies, procedures, and preparation. Tools are just there to help you with these things.

Personally, I think you’re making a mistake if you’re not running chef-client as either a daemon (with splay) or as a regular cron job, and on at least a daily (if not hourly) basis.


Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu


#10

Yeah, at places I’ve been where managers were afraid of config
management (CFengine at the time) running on a schedule, the result was
an accretion of changes over time. Once enough changes got
queued up that we had to run it on a server, and the change window was
scheduled and approved by our CRB board and appropriate offerings
were burned to the gods of ITIL, the changes would often wind up causing
outages, because so many changes hit the server at once that it was hard
to determine the impact ahead of time. But the outages were all contained
to change windows and were approved, so I guess that makes it okay.

A tactic that I’ve used in the past is to run CM only once per day,
with a 12-hour random splay, timed for 8pm-8am. Changes
can be committed during the business day without immediately taking
effect, and can then be tested or pushed out manually. And if
anything goes wrong, it’ll start hitting servers at 8pm and you have a
longer window before it hits your entire infrastructure, and more time
to get monitoring alerts and stop the changes rolling out. If
you just run Chef every 30 minutes with a 5-minute random splay, then
it’s likely that by the time your monitoring alerts you and you start
taking action, the change has hit your entire infrastructure. By
only doing the “scheduled” runs once per day you still keep the deltas
between runs small, you allow yourself some time to stop your CM tool
before it all rolls out, and you also reduce the load on your chef
server infrastructure (or on our HEC infrastructure).
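
The difference between the two splay settings can be put into rough numbers. The sketch below is only a back-of-the-envelope model in Python: it assumes every node picks its delay uniformly at random within the splay window, and the function name `fraction_started` and the 15-minute reaction time are hypothetical.

```python
def fraction_started(elapsed_minutes: float, splay_minutes: float) -> float:
    """Expected fraction of nodes that have started their run.

    Assumes each node waits a delay chosen uniformly at random in
    [0, splay_minutes) after the schedule fires (a simplification).
    """
    if splay_minutes <= 0:
        # No splay: every node starts as soon as the schedule fires.
        return 1.0 if elapsed_minutes >= 0 else 0.0
    return min(max(elapsed_minutes / splay_minutes, 0.0), 1.0)


if __name__ == "__main__":
    reaction_time = 15  # hypothetical minutes until someone acts on an alert
    for label, splay in (("5-minute splay", 5), ("12-hour splay", 12 * 60)):
        hit = fraction_started(reaction_time, splay)
        print(f"{label}: {hit:.1%} of nodes have started the bad run")
```

Under these assumptions, a 15-minute reaction time means every node has already started with a 5-minute splay, but only about 2% of nodes with a 12-hour splay.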

The other thing is that if you only run Chef once a week or once a month
on demand, then you’re not getting the “self-repairing” and SOX/PCI-DSS
"prevent control" features of configuration management. If you’re
running it nightly, then any junior SA or malicious attacker who logs
into the server and manually changes the state of critical files will
have those changes rolled back by the next run. That produces prevent
controls that auditors really like. It also trains your junior SAs
not to make with the typey-typey on the keyboard and to use the CM program
instead. Otherwise they tend to fall back to old behaviors of making
changes on the console, and then it’s not their fault they did that; it’s
going to be Chef’s fault that it rolled those changes back when it
eventually ran and the service crashed.


#11

My cookbooks hook into our orchestration server via REST calls to pull down
information about which sites should be configured, etc. During the POC
build-out I had Chef run every minute, but most of my machines are Windows
servers and Chef is very CPU hungry there. We have modified our
orchestration server to set the updated time for the “pool” whenever any
resource contained in the “pool” is modified. I wrapped Chef in a .NET
app/service that first checks whether the pool has changed since the
last successful Chef run, and only runs Chef if it has. This is how we
chose to mitigate Chef’s CPU hunger and allow for faster converge times.

-Greg
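
The gating idea described above can be sketched roughly as follows. This is only an illustration in Python, not the .NET service from the post: `should_run`, `gated_run`, `fetch_pool_updated_at` and `run_chef` are hypothetical names, and the orchestration server's REST call is stood in for by a plain callable that returns the pool's last-modified time.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


def should_run(pool_updated_at: datetime, last_success_at: Optional[datetime]) -> bool:
    """Return True when the pool changed after the last successful run."""
    if last_success_at is None:
        # Never converged successfully on this node, so always run.
        return True
    return pool_updated_at > last_success_at


def gated_run(
    fetch_pool_updated_at: Callable[[], datetime],  # hypothetical: wraps the REST call
    last_success_at: Optional[datetime],
    run_chef: Callable[[], bool],  # hypothetical: returns True if the run succeeded
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> Optional[datetime]:
    """Run Chef only if the pool changed; return the new last-success time."""
    pool_updated_at = fetch_pool_updated_at()
    if not should_run(pool_updated_at, last_success_at):
        return last_success_at  # nothing changed, skip the expensive run

    # Record the start time, not the end time, so that a pool change made
    # while the run is in progress still triggers another run next time.
    started_at = now()
    if run_chef():
        return started_at
    return last_success_at  # failed run: keep the old marker so we retry
```

In a real wrapper, `run_chef` might shell out to `chef-client` and report whether the exit code was zero, and the returned timestamp would be persisted between polls. Both of those are assumptions about how such a wrapper could be built, not details given in the post.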


#12

Yeah, but he’s talking about a more fundamental problem: his
management/co-workers are not okay with the basic idea of an
automated job running which might change system config.

You’re off on a completely different planet where you’ve accepted the
basic premise of “DevOps” (for lack of a better term) and it’s a question
not of “should we do it?” but “how aggressive?”, and that’s influenced by
how far along the road to continuous integration / continuous
deployment you are. Explaining that to them would be like trying to
explain quantum mechanics to a caveman.

#13

Well, Phillip did say “I am being slightly vague on purpose, because I am
looking for full case examples from others using chef and how they are
using it.” :wink:

-Greg


#14

I appreciate everyone’s response to this thread. It has been a pretty good discussion and I have gathered some great information from it.

You are both correct. I, however, was most interested in the broader discussion of how others are handling it in their environments.

I wasn’t necessarily looking for “this is how you should do it”, “this is how you should handle your coworker” or anything like that. Just a broader discussion of how each of you is using it in your environment. It helps me come up with fresh ideas for furthering our implementation here, and helps increase not only my own maturity, but that of my team as a whole.

Here, we have a young team that is in the process of migrating from being “SysOps” or the “Operations Team” to “Development Operations” (DevOps). There is a bit of a culture clash: our engineering team has a ton of talent, but it is older talent, with older, more proven processes that many of us would consider antiquated. They are very frightened by the idea of continuous integration, maybe even threatened. Our environment is evolving, and when I joined the team 6 months ago, I came with a deep desire to help them go from using an ad hoc deployment Perl script to using an automated workflow and true Infrastructure as Code.

Doing so means teaching people who have never written Ruby code or worked with Chef how to do so, and teaching people who have never been around continuous integration to understand continuous integration and test-driven infrastructure.

I get looked at like I have a third eye when I say, write a test before you write any other code.

It’s a steep learning curve; I know because I have been through it. I am working to increase the team’s knowledge and maturity, but there are going to be bumps along the way.

The suggestion of not running chef-client automatically on our nodes actually infuriated me at first. Instead of popping off a half-thought-out angry email to our CTO, I decided to take some time to think about it and ask for help from you guys in thinking about it. Again, I really appreciate everyone’s involvement in this thread.

Phillip Roberts | Sr. Linux Systems Administrator
San Mateo | Ann Arbor | New York | London
O 734.922.7014 | C 614.423.9871 | www.MyBuys.com

From: Greg Zapp [mailto:greg.zapp@gmail.com]
Sent: Tuesday, January 14, 2014 2:20 AM
To: chef@lists.opscode.com
Subject: [chef] Re: Re: Re: Re: Re: RE: Re: Re: Automated check-ins or not…

Well, Phillip did said “I am being slightly vague on purpose, because I am looking for full case examples from others using chef and how they are using it.” :wink:

-Greg

On Tue, Jan 14, 2014 at 8:01 PM, Lamont Granquist <lamont@opscode.commailto:lamont@opscode.com> wrote:

Yeah, but he’s talking about a more fundamental problem with his management/co-workers not being okay with the fundamental idea of an automated job running which might change system config.

You’re off on a completely different planet where you’ve accepted the basic premise of “DevOps” (for lack of a better term) and its a question not of “should we do it?” but “how aggressive?” and thats influenced by how well along the road to continuous integration / continuous deployment you are, which would be like trying to explain quantum mechanics to a cave man.

On 1/13/14 5:28 PM, Greg Zapp wrote:
My cookbooks hook into our orchestration server via REST calls to pull down information about which sites should be configured, etc. During POC build out I had Chef run every minute, but most of my machines are Windows servers and Chef is very CPU hungry there. We have modified our orchestration server to set the updated time for the “pool” when any resource contained in the “pool” is modified. I wrapped Chef in a .Net app/service that will first check if the pool has been changed since the last successful Chef run. This is how we chose to mitigate Chef’s CPU hunger and allow for faster converge times.

-Greg

On Tue, Jan 14, 2014 at 1:56 PM, Lamont Granquist <lamont@opscode.commailto:lamont@opscode.com> wrote:

Yeah, places that I’ve been where managers have been afraid of config management (CFengine at the time) running on a schedule has resulted in an accretion of changes over time, and then once enough changes got queued up that we had to run it on a server and the change window was scheduled and it was approved by our CRB board and appropriate offerings were burned to the gods of ITIL, the changes would often wind up causing outages because so many changes hit the server and it was hard to determine the impact ahead of time. But the outages were all contained to change windows and were approved, so I guess that makes it okay.

A tactic that I’ve used in the past has been to run CM only once per day and run it with a 12-hour random splay and time it for 8pm-8am. Changes can be committed during the business day and they don’t immediately take effect, then they can get tested or pushed out manually. And if anything goes wrong, it’ll start hitting servers at 8pm and you have a longer window before it hits your entire infrastructure and more time for you to get monitoring alerts and stop the changes rolling out. If you just run Chef every 30 minutes with a 5 minute random splay, then its likely that by the time your monitoring alerts you and you start taking action that the change has hit your entire infrastructure. By only doing the “scheduled” runs once per day you still keep the deltas between runs small, you allow yourself some time to stop your CM tool before it all rolls out, and you also reduce the load on your chef server infrastructure (or on our HEC infrastructure).

The other thing is that if you only run Chef once a week or once a month on-demand, then you’re not getting the “self-repairing” and SOX/PCI-DSS “prevent control” features of configuration management. If you’re running it nightly then any junior SA or malicious attacker that logs into the server and manually changes the state of critical files will have those changes immediately rolled back. That produces prevent controls that auditors really like. That also trains your junior SAs to not make with the typey-typey on the keyboard and to use the CM program – otherwise they tend to fall back to old behaviors of making changes on the console and then its not their fault they did that, its going to be Chef’s fault that it rolled those changes back when its eventually run and reverts those changes and the service crashes.

On 1/13/14 1:32 PM, David Petzel wrote:
We had quite a few discussions about this as well, and at the end of the day we opted for the ability to do both on-demand and scheduled runs. There were concerns that without a scheduled check-in the amount of drift in systems could become large over time on servers that don’t routinely get deployments done. With that drift comes a slew of unknown issues. By enforcing a scheduled run we could be sure that hand-modified configurations didn’t stick around very long.

We’ve set up a report to notify us if a node has not checked in in the last day. This helps us catch cases where the scheduled run might be failing and other notification mechanisms might not be catching it (e.g. some nasty compile error super early in the run).
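A report like that can be sketched in a few lines of Ruby. This is not the actual report described above, just a minimal illustration: it assumes you already have a map of node name to last check-in time as a Unix timestamp (the sample data below is invented), and flags anything older than a day.

```ruby
ONE_DAY = 24 * 60 * 60 # seconds

# Return the sorted names of nodes whose last check-in is older than max_age.
# last_checkins maps node name => Unix timestamp of the last successful run.
def stale_nodes(last_checkins, now: Time.now, max_age: ONE_DAY)
  last_checkins.select { |_name, timestamp| now.to_i - timestamp.to_i > max_age }
               .keys
               .sort
end

# Hypothetical data, for illustration only.
now = Time.now
checkins = {
  'web01' => now.to_i - 3600,        # checked in an hour ago
  'db01'  => now.to_i - 3 * ONE_DAY  # silent for three days
}
puts stale_nodes(checkins, now: now).inspect
```

Feeding the result into whatever alerting you already have is then a separate, site-specific step.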

From there we extended an existing in-house tool that lets anyone with access request a chef run without needing access to the servers.

On Mon, Jan 13, 2014 at 4:16 PM, Phillip Roberts <proberts@mybuys.com> wrote:
The problem isn’t my coworker, the problem is a lack of understanding of the tool.

Chef is my baby, and I am perfectly fine with automated check-ins; however, just like in any business, there are politics at play. There are fears due to a lack of understanding as well.

I am purposely asking for others’ use cases because I am interested in them to help me form my arguments as to why chef nodes should be checking in (running chef-client) automatically.

I am not asking for anyone to tell me whether we should be using chef, or how we should be using chef; I am interested in how it is being used in other environments. I have seen plenty of other environments, but in every one of them I implemented chef and the policies that surround it myself, and this question, or this argument, never came up.

I appreciate the responses thus far.

Thanks,

Phillip Roberts | Sr. Linux Systems Administrator
San Mateo | Ann Arbor | New York | London
O 734.922.7014 | C 614.423.9871 | www.MyBuys.com

From: Christopher Armstrong [mailto:chris@chrisarmstrong.me]
Sent: Monday, January 13, 2014 4:09 PM
To: chef@lists.opscode.com
Subject: [chef] Re: Re: Automated check-ins or not…

Chef as a tool is used for orchestration, converging nodes to a desired state. If your coworker doesn’t want nodes checking in automatically, then perhaps Chef isn’t the ideal tool for you. What does your use case look like?

On Mon, Jan 13, 2014 at 1:05 PM, Ranjib Dey <dey.ranjib@gmail.com> wrote:
by check in do you mean chef runs or chef registrations. I am aware of 3 different ways

  1. on demand: use rundeck, mco, or capistrano-like tools to invoke a chef run. pros: on demand :-), which helps if you deploy your application via chef. also you can eliminate the need for a validation certificate. cons: requires additional tooling, special security considerations etc.

  2. as a service: specify a splay time, and use the standard init scripts to run chef client as a service. pros: no additional configuration required, no dependency on any other tools. cons: memory leaks and stale processes used to be a pain.

  3. as a scheduled job: use cron or a rufus-like system to run chef at a periodic interval. pros: simple, less prone to memory leaks. cons: infra has to be designed as eventually consistent; on-demand application deployment cannot be done; cron times on individual servers need to be chosen carefully, or else you’ll storm the chef server.

i have used pretty much all three of these, and i think all of them have merits. choose any one depending upon what you do, how you are doing it and how comfortable you are with chef and those tools. most of the issues with running chef as a service are now sorted (or workarounds are known).

best
ranjib

On Mon, Jan 13, 2014 at 12:52 PM, Phillip Roberts <proberts@mybuys.com> wrote:
I am interested in hearing what others are doing in terms of allowing nodes to automatically check in with chef or not. It has recently come up as a concern with a party in our company, he would prefer to not see nodes check in automatically with chef (I currently have a cron job that runs chef-client every X number of minutes).

I am just interested in hearing how others manage this, I am not certain that I think that manually running chef-client is a good solution.

I am being slightly vague on purpose, because I am looking for full case examples from others using chef and how they are using it.

Thanks,

Phillip Roberts | Sr. Linux Systems Administrator
San Mateo | Ann Arbor | New York | London
O 734.922.7014 | C 614.423.9871 | www.MyBuys.com


#15

Hey Phillip,
I’ve worked at/with several companies that have used Chef. Using the
chef-client cookbook to configure chef-client to run every X minutes,
as a cron job or a service, or once an hour/day/week/month/year, is
pretty much the standard, and you are likely to get a pretty distributed
sample from the community.

Like you said, the greatest barrier to anything new is FUD. I think in your
case the best way forward is through education, but to answer your question,
here are a few workarounds I have seen in my day.

A very small number of teams I have worked with have decided to change
their run_list to execute only operations roles after the server has been
set up correctly. You could have a role called web_server_setup with all the
recipes to set up/install your web server, then another role called
web_server_ops that lacks the recipes to install components and instead has
recipes to maintain your web server, like process maintenance, monitoring,
etc. If you truly embrace idempotence this is unnecessary, but it has been
successful in winning over some naysayers who were unconvinced I wasn’t
going to be constantly installing new things in prod. That being said, I
feel dirty for simply recommending it.

When you want to run chef manually, I have seen and recommended the use of
Chef-specific tools like Push Jobs Server
(http://docs.opscode.com/push_jobs.html) or knife exec
(http://docs.opscode.com/knife_exec.html) to execute chef-client
on a large number of servers automatically, based simply on a search query
(role = X and environment = Y and zodiac_sign = Z). I have done/seen this
not only in situations where chef-client is being run manually but also
when Chef is running automatically, to avoid waiting for the next run (e.g.
coordinated deployments, scheduled changes, etc).
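As a rough illustration of the search-driven approach, the Ruby sketch below builds a Chef search query from a set of criteria and the argument list for a `knife ssh` invocation. Note the swap: it uses `knife ssh` rather than Push Jobs or knife exec, purely because that invocation is simple to show. The role and environment names and the `sudo chef-client` remote command are assumptions for the example.

```ruby
# Build a Chef search query such as "role:web AND chef_environment:prod"
# from a hash of attribute => value pairs.
def build_query(criteria)
  criteria.map { |attribute, value| "#{attribute}:#{value}" }.join(' AND ')
end

# Build the argv for `knife ssh QUERY COMMAND`.
# Assumption: the remote command that triggers a run is `sudo chef-client`.
def knife_ssh_command(criteria, remote_command = 'sudo chef-client')
  ['knife', 'ssh', build_query(criteria), remote_command]
end

# Hypothetical role and environment names, for illustration only.
argv = knife_ssh_command({ role: 'web_server_ops', chef_environment: 'production' })
puts argv.inspect
# From a workstation with knife configured, this could be run with: system(*argv)
```

Keeping the command as an argument list rather than one string avoids shell-quoting problems with the spaces in the query.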

Best of luck!

Tom Duffield — Automation Consulting Engineer

651.769.7497 | tom@getchef.com
LinkedIn: http://www.linkedin.com/in/thomasduffield
Twitter: http://www.twitter.com/tomduffield

On Tue, Jan 14, 2014 at 9:42 AM, Phillip Roberts proberts@mybuys.com wrote:

I appreciate everyone’s response to this thread. It has been a pretty
good discussion and I have gathered some great information from it.

You both are correct. I, however, was most interested in the broader
discussion of how others are handling it in their environments.

I wasn’t necessarily looking for “this is how you should do it”, “this is
how you should handle your coworker” or anything like that. Just a broader
discussion of how each of you is using it in your environment. It helps me jog
some fresh ideas for furthering our implementation here, and helps
increase not only my maturity, but that of my team as a whole.

Here, we have a young team that is in the process of migrating from being
“SysOps” or the “Operations Team” to “Development Operations” (DevOps).
There is a bit of a culture clash: our engineering team has a ton of talent,
but it is older talent, with older, more proven processes for how they do
things, which many of us would consider antiquated. They are very frightened
by the idea of continuous integration, maybe even threatened. Our
environment is evolving, and when I joined the team 6 months ago, I came
with a deep desire to help them go from using an ad hoc Perl deployment
script to using an automated workflow and true Infrastructure as Code.

Doing so means teaching people who have never written Ruby code or worked
with chef how to do so, and teaching people who have never been around
continuous integration to understand continuous integration and
test-driven infrastructure.

I get looked at like I have a third eye when I say, write a test before
you write any other code.

It’s a steep learning curve; I know because I have been through it. I am
working to increase the team’s knowledge and maturity, but there are going
to be bumps along the way.

The suggestion of not running chef-client automatically on our nodes
actually infuriated me at first. Instead of popping off a half-thought-out
angry email to our CTO, I decided to take some time to think about it and
ask for help from you guys in thinking about it. Again, I really appreciate
everyone’s involvement in this thread.

Phillip Roberts | Sr. Linux Systems Administrator

San Mateo | Ann Arbor | New York | London

O 734.922.7014 | C 614.423.9871 | www.MyBuys.com

From: Greg Zapp [mailto:greg.zapp@gmail.com]
Sent: Tuesday, January 14, 2014 2:20 AM
To: chef@lists.opscode.com
Subject: [chef] Re: Re: Re: Re: Re: RE: Re: Re: Automated check-ins or
not…

Well, Phillip did say “I am being slightly vague on purpose, because I
am looking for full case examples from others using chef and how they are
using it.” :wink:

-Greg

On Tue, Jan 14, 2014 at 8:01 PM, Lamont Granquist lamont@opscode.com
wrote:

Yeah, but he’s talking about a more fundamental problem with his
management/co-workers not being okay with the fundamental idea of an
automated job running which might change system config.

You’re off on a completely different planet where you’ve accepted the
basic premise of “DevOps” (for lack of a better term) and it’s a question
not of “should we do it?” but “how aggressive?”, and that’s influenced by how
far along the road to continuous integration / continuous deployment you
are, which would be like trying to explain quantum mechanics to a cave man.

On 1/13/14 5:28 PM, Greg Zapp wrote:

My cookbooks hook into our orchestration server via REST calls to pull
down information about which sites should be configured, etc. During the POC
build-out I had Chef run every minute, but most of my machines are Windows
servers and Chef is very CPU hungry there. We have modified our
orchestration server to set the updated time for the “pool” when any
resource contained in the “pool” is modified. I wrapped Chef in a .NET
app/service that first checks whether the pool has been changed since the
last successful Chef run, and only runs Chef if it has. This is how we chose
to mitigate Chef’s CPU hunger and allow for faster converge times.
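The gating logic described here boils down to a timestamp comparison. Below is a minimal Ruby sketch of that check (the wrapper in question is a .NET service, so this is only an illustration of the idea, and the method and parameter names are invented):

```ruby
# Decide whether a Chef run is worth starting.
# pool_updated_at: when the orchestration server last modified the pool.
# last_success_at: when Chef last converged successfully, or nil if never.
def converge_needed?(pool_updated_at, last_success_at)
  last_success_at.nil? || pool_updated_at > last_success_at
end

# Hypothetical timestamps, for illustration only.
last_success = Time.now - 600   # converged ten minutes ago
pool_updated = Time.now - 60    # pool changed one minute ago
puts converge_needed?(pool_updated, last_success)
```

A cheap check like this can be polled frequently, since the expensive converge only starts when something has actually changed.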

-Greg



#16

One tool that you have with chef is --why-run mode. You can use this to
prove that most of the time the chef-client runs are doing nothing.
We’ve also got a reporting feature for HEC that reports on changed
resources. As long as your resources are properly idempotent, your
daemonized chef-client runs should all be NOPs and no change should
occur. If neither of those options works for you, you can always write
your own report handler with code that walks the chef resource
collection and extracts the information on which resources have changed.
That can go a long way towards addressing FUD about having a process on
the box making scary changes.
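The core of such a handler is just a filter over the resource collection. The Ruby sketch below shows that filter in isolation, using a stand-in struct so it runs without Chef installed. The assumption is that in a real handler this logic would sit in the `report` method of a `Chef::Handler` subclass and operate on the run's actual resources, which respond to `updated_by_last_action?`. The resource names are invented.

```ruby
# Stand-in for a Chef resource, so the sketch runs without the chef gem.
# Assumption: real Chef resources respond to updated_by_last_action?.
FakeResource = Struct.new(:name, :updated) do
  def updated_by_last_action?
    updated
  end

  def to_s
    name
  end
end

# Walk the resource collection and keep only the resources that changed.
def changed_resources(resources)
  resources.select(&:updated_by_last_action?).map(&:to_s)
end

# Produce a one-line summary suitable for a log line or a report email.
def summarize(resources)
  changed = changed_resources(resources)
  return 'no resources changed (run was a no-op)' if changed.empty?

  "#{changed.size} resource(s) changed: #{changed.join(', ')}"
end

# Hypothetical resources, for illustration only.
run = [
  FakeResource.new('package[nginx]', false),
  FakeResource.new('template[/etc/nginx/nginx.conf]', true)
]
puts summarize(run)
```

A steady stream of "no resources changed" summaries is exactly the evidence that helps with the FUD.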

There are only two ways that change happens outside of reviewed change,
then. One way is that someone changes a box manually and then the
chef-client resets that change.

The other way is that your promotion to production workflow might be
hard to follow and someone could make a mistake there, so you need to
focus on getting that correct, and define the technical and business
process that results in changes getting into production.
