SCM for node definitions?

How are folks approaching SCM for node definitions? Give up?

You mean the node object?

We don't check them into SCM, but we do back up API client & node
objects (and sync them against a mirror chef server).

On Tue, Aug 13, 2013 at 11:49 AM, Jeff Blaine <jblaine@kickflop.net> wrote:

How are folks approaching SCM for node definitions? Give up?

On 8/13/2013 2:51 PM, Ranjib Dey wrote:

You mean the node object?

Yes. We're a "lab" type of environment with a wide array of node types
and configurations. It is a rarity for any of our servers to have the
exact same configuration as another. Fun...

I'm having uneasy feelings lately about everything *except* our node
code/data being in files under SCM. It's a break from convention, an
unnecessary(???) anomaly for other, less experienced staff to remember, etc.

I guess one could download all of the current nodes' JSON data into
files, check them all in, and somehow disable 'knife node edit' in favor
of 'knife node from file'?

Anyone doing this?
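A minimal sketch of that exact workflow, assuming a configured knife and
a git checkout of the chef-repo (the node name in the comments is
illustrative):

    #!/bin/bash
    # Snapshot every node object from the Chef server into nodes/*.json,
    # then treat 'knife node from file' as the only write path.
    set -e
    mkdir -p nodes

    for n in $(knife node list); do
      # -l/--long includes the full attribute set, not the summary view
      knife node show "$n" -l -F json > "nodes/${n}.json"
    done

    git add nodes
    git commit -m "Snapshot node objects from the Chef server"

    # To change a node, edit its file and push it back:
    #   $EDITOR nodes/web1.example.json
    #   knife node from file nodes/web1.example.json

Disabling 'knife node edit' itself would have to be a team convention
(or a wrapper script); knife has no built-in switch for turning off a
subcommand.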




Arnold wrote:

First, you surely do have a restorable backup of your chef server?

Second, while we don't define our hosts from files in SCM (some people
make their Jenkins define nodes/roles/environments from an SCM checkout
after automated tests pass), we do store an irregular dump of all
nodes/roles/environments/data bags as JSON in SCM. Sometimes it's nice
to see how you defined your machines two months ago, before everything
broke ;-)

Have fun,

Arnold
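A minimal sketch of such an "irregular dump" as a cron job, assuming a
configured knife and a git checkout used purely for history (paths are
illustrative):

    #!/bin/bash
    set -e
    cd /srv/chef-backup   # assumed: a dedicated history-only repo
    mkdir -p nodes roles environments data_bags

    for n in $(knife node list); do
      knife node show "$n" -l -F json > "nodes/${n}.json"
    done
    for r in $(knife role list); do
      knife role show "$r" -F json > "roles/${r}.json"
    done
    for e in $(knife environment list); do
      knife environment show "$e" -F json > "environments/${e}.json"
    done
    for bag in $(knife data bag list); do
      mkdir -p "data_bags/${bag}"
      for item in $(knife data bag show "$bag"); do
        knife data bag show "$bag" "$item" -F json > "data_bags/${bag}/${item}.json"
      done
    done

    git add -A
    git commit -m "Chef server state dump $(date -u +%Y-%m-%dT%H:%MZ)" || true

The trailing '|| true' keeps the cron job quiet on days when nothing
changed.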

On 8/13/2013 3:19 PM, Ben Hines wrote:

This was literally just talked about 2 days ago on this here mailing list.

I'll read the archives then. Sorry!


On Tue, Aug 13, 2013 at 6:10 PM, Lamont Granquist <lamont@opscode.com> wrote:

That data isn't really code, it's a row in a database. I can see the
value of periodically dumping the node data to SCM and storing it as
history, but people don't normally try to manage database tables with
SCM. If you start trying to limit what can write to nodes and data bags,
you start to unnecessarily cripple your ability to use the chef database
as a dynamic CMDB. And IMO there is some unmined potential in treating
data bags and nodes as simple database tables, pushing data into them
from other sources in the enterprise, and looking beyond only pushing to
them from SCM. You can try to push the SCM tooling as far as possible,
and maybe I'm wrong about this; maybe if we took SCM /really/ seriously
we'd discover something sublime about how to manage servers (e.g. being
able to git push directly to a chef server). But I come from having
managed servers using classical CMDBs, I found them very powerful, and
trying to manage CMDBs entirely with SCM seems unnecessarily crippling
to me... You don't typically dump your LDAP or AD databases into SCM.
Nobody dumps their customer databases into SCM. In the networking world
rancid pushes Cisco configs into SCM, but that's to maintain a history
and log of changes, not to manage networking gear with SCM.
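A sketch of that direction, loading rows from an external export into a
data bag; the 'users' bag and the users.csv file are hypothetical:

    #!/bin/bash
    # Treat a data bag like a table: one item per row of an external export.
    set -e
    knife data bag create users 2>/dev/null || true

    # users.csv format: id,shell (assumed export from HR/LDAP tooling)
    while IFS=, read -r id shell; do
      cat > "/tmp/${id}.json" <<EOF
    { "id": "${id}", "shell": "${shell}" }
    EOF
      knife data bag from file users "/tmp/${id}.json"
    done < users.csv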

On 8/13/13 5:28 PM, Benjamin Bytheway wrote:

I'm currently trying to untangle this question for my own organization.
If the mantra "configuration as code" is really true, shouldn't
everything (that isn't true dynamic/discoverable data, like ohai) be
"code" and under SCM?

On Tue, Aug 13, 2013 at 7:09 PM, Lamont Granquist <lamont@opscode.com> wrote:

I find that limiting, because then you have to be a dev in order to
push SCM updates, and you're going to run into SOX and PCI issues sooner
or later as you try to expand who has the ability to push config
updates. If your configuration is also represented in a database, then
you can use RBAC to control who can update those roles and modify
config. People could be writing dashboards that write to dynamic data
bags in order to change server access control (add/remove users), to
affect workflow (approve deployment to prod), and to hook 'IT
governance' into Chef dynamically in other ways, beyond having the
people with commit access to the repo be the gateway to all
configuration change.

I don't much care for mantras. I don't think adopting a database model
is in conflict with the intent of that mantra, since the data in the
database is being translated into configuration change on the servers
by code. So it's still code doing the work; we're not logging into the
servers and making with the typey-typey. But if it is in conflict, then
someone needs to explain how the hardcore-SCM-centric worldview produces
utility and gets the job done better. I tend to see it as not scaling
out well beyond this developer-centric IT model. Databases and SCM are
both tools, but they have different purposes.
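A small sketch of that kind of governance hook, with made-up bag/item
names ('workflow', 'prod_gate'); a dashboard backend flips a gate that
recipes can read:

    #!/bin/bash
    set -e
    knife data bag create workflow 2>/dev/null || true

    cat > /tmp/prod_gate.json <<'EOF'
    { "id": "prod_gate", "approved": true, "approved_by": "samsmith" }
    EOF
    knife data bag from file workflow /tmp/prod_gate.json

    # A recipe would then guard its deploy step on something like
    # data_bag_item('workflow', 'prod_gate')['approved'].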


Benjamin Bytheway wrote:

That pretty much sums up the thought process we've been going through.
It is tempting to have everything in one place (the chef repo), with
full version control, history, etc. But it sure puts a huge barrier to
entry up to even the most minor of changes, and getting configuration
changes in becomes a heavyweight process.

We've gone back and forth on driving configuration through LWRPs +
application cookbooks, or putting configuration in data bags, or
creating a separate configuration database with a REST API that chef
cookbooks query to get their marching orders in laying down the right
components.

All have their strengths and drawbacks. We've had a hard time seeing a way
forward. One issue is that we have a huge number of apps that currently
run on a much smaller number of servers, with many apps stacked deep on
each cluster. This makes nearly every server different in fundamental
ways. There is much that is common as far as the larger components
(weblogic, apache, jvm), but after that the variation is enormous.

Seems like there are as many ways to do this as there are organizations
using chef.

-Ben

On 8/14/13 7:09 AM, Jeff Blaine wrote:

Thanks for the helpful replies. We're getting into areas here that were
not touched on in the previous thread, so I'm glad I asked after all
(after it was discussed 2 days prior).

Who should be making the configuration-data changes to nodes if not
devs and ops (and security, in coordination with devs and ops)? What
situations exist where people who don't already have git push and knife
upload privileges would need to be given out-of-change-management-control
access to site-local node configuration data (which is just as critical
to stability, et al., as cookbook code)? Clue me in?

It's not important to know that samsmith changed node['blah'] from X to
Y at 7:34 PM yesterday because "Explanation A B C"? How is that not
absolutely necessary? Or are you (Lamont) saying that this should be
gleaned back-channel via the database transaction log, coupled with a
direct question to the database user who performed the transaction? I'm
struggling to resolve this idea you're posing that change control is
irrelevant WRT server configuration values.

I'm eager to learn.

[Lamont]:

I find that limiting because then you have to be a dev in order to
push SCM updates

"... given the current interface."

"Don't use SCM, even if it really should be used, because the UI
provided so far is hairy for non-techies." doesn't hold water to me (as
a gut reaction). I realize I am quoting words you didn't say, but that's
what it seems you are saying.

[Benjamin]:

All have their strengths and drawbacks. We've had a hard time seeing
a way forward. One issue is that we have a huge number of apps that
currently run on a much smaller number of servers, with many apps
stacked deep on each cluster. This makes nearly every server
different in fundamental ways. There is much that is common as far as
the larger components (weblogic, apache, jvm), but after that the
variation is enormous.

Seems like there are as many ways to do this as there are
organizations using chef.

We're in that same boat, but likely to an even more disjointed level.
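For what it's worth, if node JSON is snapshotted into git as discussed
earlier in the thread, the "who changed node['blah'], and when" question
has a direct answer from stock git (the node name is illustrative):

    git log -p -- nodes/web1.example.json         # every change, with diffs
    git log -S'blah' -- nodes/web1.example.json   # commits that touched 'blah'
    git blame nodes/web1.example.json             # who last set each line

The commit messages are where "Explanation A B C" would live.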

On Aug 14, 2013 9:51 PM, Lamont Granquist <lamont@opscode.com> wrote:
Yeah, what I'm thinking of is scaling access control out to tens of
thousands of computers, thousands of users, and hundreds of different
teams. A solution I've seen at that scale was to create a self-service
console where managers of different teams were able to hit a webui to
handle the access control for their direct reports. They were given
control over some collection of server roles in the organization, and
the pain of new hires and terminations was distributed over the entire
organization. This was complicated by the needs of centralized
operations, networking, and security, and there was 'horizontal' account
management that cut across all the verticals, so it wasn't entirely that
simple.

Giving out git commit access to the central config management repo,
which allows the moral equivalent of knife ssh ':' 'rm -rf /'
everywhere, is horrible. You'd have a hundred people with complete
access to all your servers who only needed that for a small slice of
their job.

And clearly I don't mean to argue that you don't need history, but that
you can use the database for that (and we're working on something very
much like that for hosted/private chef, which uses the postgres database
for change history, not git).

"... given the current interface."

"Don't use SCM, even if it really should be used, because the UI
provided so far is hairy for non-techies." doesn't hold water to me
(as a gut reaction). I realize I am quoting words you didn't say, but
that's what it seems you are saying.

Not really what I was saying. The emphasis should have been more on the
lack of decent RBAC access controls to individual elements inside of SCM
repos. Last time I looked at git, it was all-or-nothing access control
over entire repos. You could wrap git with a webui that had RBAC control
over individual entities in git and then use that to push to the
database, but I'm not seeing the huge advantage there over just
embracing that you've got a database, so throw RBAC tools at the
database rather than around git.

You can do that instead with data bags -- have a webui with an RBAC
system backed by data bag entries that chef can read, implemented
entirely outside of your SCM system. As data bags are updated, one of
the things it should do is write a change stream for every update that
is made, so that you've still got the auditable history you can show
your SOX auditors. You can try to do that using git entirely as the
intermediary, but you need to think about merge conflicts and how your
webui will deal with them -- and just having multiple instances of a
webui needing to do reads and writes to the same data bags in git is
probably going to be a synchronization headache. Clearly, github does
stuff like this, but it's nowhere near as common a design pattern as a
database (and I wouldn't be too surprised if, peeling github apart, you
found databases helping to synchronize their git access).
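A minimal sketch of that change-stream idea, routing every webui write
through one wrapper that records who changed which item (the audit
locations are illustrative, not a Chef feature):

    #!/bin/bash
    # Usage: databag-write BAG FILE ACTOR
    set -e
    BAG=$1 FILE=$2 ACTOR=$3
    ITEM=$(basename "$FILE" .json)

    knife data bag from file "$BAG" "$FILE"

    # Append an audit record and keep a timestamped copy of the item.
    echo "$(date -u +%FT%TZ) actor=${ACTOR} bag=${BAG} item=${ITEM}" \
      >> /var/log/chef-databag-audit.log
    cp "$FILE" "/srv/databag-history/${BAG}-${ITEM}-$(date +%s).json"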

On 8/14/13 2:46 PM, Maxime Brugidou wrote:

What we do is a separate chef server per team/org, with entirely
separate chef repositories. We distribute cookbooks to the whole company
and people have to upgrade the common cookbooks every now and then
(using librarian internally).

It would be nice to have the "organization" thing available in open
source chef so that we don't need a chef server for each team. We try to
keep the number of teams as low as possible because of that.

However, within one team the nodes are managed within git (note: it's
bare metal; we don't have any VMs or dynamic node creation), and
literally every operation on prod goes through git and then automated
release management. This forces people to automate, document, and log
anything happening on prod. I really think this is the right way.
Operations should be rare and release-managed as much as possible.
Dynamic scaling with VMs could also work, but then you have something
else automated that manages the nodes for you, not a human.
On Aug 15, 2013 1:29 AM, Lamont Granquist <lamont@opscode.com> wrote:

"Rare and release managed as much as possible" is the opposite of agile,
and in pretty strict opposition to ideas like continuous delivery as
well, unless I'm misunderstanding you. It also just doesn't scale as
the business grows. Eventually centralized beaurocratic operations is
simply overwhelmed and doesn't serve the business. At that point I
prefer lightweight change that happens often, through well engineered
channels and is distributed throughout the Enterprise. If all we've
done with devops is move it from "SAs" logging into boxes using their
godlike powers to type 'adduser' to "devops" making changes in git and
typing 'git push', I think you've just shuffled the responsibilities
around without really breaking down the walls to the rest of the company.

I'm also very skeptical that synchronizing across orgs will scale. As
someone who dealt with horizontal SOX and PCI-DSS configuration
responsibilities in an enterprise with something like 6,000 different
individual roles and hundreds of business units, the idea of having to
manage compliance across 100 tenants where its designed around those
being compartmentalized and partitioned, instead of sharing
common state, is a little horrifying. You lose the ability there to
have a single base role that contains recipes that are pushed to every
single server in your platform, which is very powerful when it comes to
compliance. When you are small and you have some committed team members
then you can do a good job at synchronizing across orgs, but when you
hit more orgs you'll suffer from rot and you'll wildly varying degrees
of compliance across your orgs.

You could address that by trying to engineer better synchronization
primitives around orgs, but again the design goal there is for hard
partitioning between orgs in hosted chef, so the organization concept
starts with being fundamentally hostile to trying to do that.

Maxime Brugidou wrote:

What I call release management is actually automated, and consists of a
git push triggering unit-test builds, preprod deployment, and later
prod. It can effectively be done multiple times per hour/day and is
independent between teams (since there are separate chef repositories),
so I consider the process agile. It just has to be done by someone who
knows git and chef. We don't want a UI that someone else could use
without SCM.

The synchronization between teams/orgs is actually an interesting topic.
The team/org is fully responsible for their chef repo and cookbooks, so
we don't want another compliance team pushing things onto servers
without their knowledge or consent. However, we can centralize the
deployment and add compliance checks there. This is actually very
flexible. It can guarantee that people upgrade the compliance cookbook.
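A rough sketch of that pipeline as a git post-receive hook on the
central repo; the test task and paths are assumptions, and 'knife
upload' here is the knife-essentials-style repo upload:

    #!/bin/bash
    # hooks/post-receive on the bare chef-repo
    set -e
    CHECKOUT=/srv/chef-repo-deploy

    GIT_WORK_TREE="$CHECKOUT" git checkout -f master
    cd "$CHECKOUT"

    rake test   # assumed: the repo's own lint/unit task (foodcritic, etc.)
    knife upload cookbooks roles environments data_bags

Promotion to prod would hang off the same hook (or a CI job it triggers)
once the preprod run goes green.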