Centralized cookbook-library repos vs distributed cookbook repos

Hi,

Over the last few days I have been lightly using some Chef cookbooks.
One issue I have run into, even at this early stage, is tracking what
is going on among the various github ‘cookbook library’ repositories.
While it is early days, these cookbook-library repos are already
impressive. The large fork count actually seems to be the problem. I
find it next to impossible to see who has made what changes to
cookbook Y.
I also find it nigh impossible to see just how some of the cookbooks
differ between libraries.

I am wondering why git submodules are not used by the community?

The Chef user base is large enough that there would seem to be enough
benefits to justify the additional git commands. Besides, rake tasks
could take some of the pain out of tracking, updating and contributing
back.

If there was a cookbook account on github with multiple collaborators,
each application/service cookbook could have its own project under
this account.
It would also be possible for more people to volunteer to maintain a
’reference’ cookbook for their favourite application, using their own
Github account, and have this as a submodule of a community
cookbook-library.
Everyone else could track these individual cookbook projects as a
submodule of their own ‘cookbook-library’ project.

To my mind the benefits to each cookbook becoming a first class Git
project would be:

  • More applications have one reference cookbook with several
    site-cookbooks to accommodate/illustrate different
    approaches/requirements.
  • Easily track what people have forked over in a specific cookbook.
  • Users can pick and choose specific cookbooks rather than taking all
    of several monolithic library repos of 80+ cookbooks.
  • Whenever required each cookbook can be tagged with a Chef/Ruby/OS
    version and submodules then point to just that tag.
  • Settling on a common naming prefix, say ‘cc-’ (Chef Cookbook), we
    could readily search google and github for these types of projects.
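
As a rough illustration of the rake-style helper mentioned above, here is a minimal Ruby sketch that builds the git commands needed to track one cookbook repository as a submodule pinned to a tag. The repository URL, the 'cc-' project name, the 'cookbooks' directory and the tag name are assumptions made up for the example, and `git -C` needs a reasonably recent git. Nothing is executed; the commands are only printed.

```ruby
require 'shellwords'

# Build the git commands that would pin one cookbook repository, tracked as a
# submodule, to a given tag. Nothing is executed here; the caller decides
# whether to print the commands or pass them to system().
def pin_cookbook_commands(name, url, tag, dir = 'cookbooks')
  path = File.join(dir, name)
  [
    ['git', 'submodule', 'add', url, path],   # register the submodule
    ['git', '-C', path, 'fetch', '--tags'],   # make sure the tags are present
    ['git', '-C', path, 'checkout', tag],     # move the submodule to the tag
    ['git', 'add', '.gitmodules', path]       # stage the pinned commit
  ]
end

# Hypothetical cookbook repository and tag, for illustration only.
commands = pin_cookbook_commands(
  'apache2',
  'git://example.com/cookbooks/cc-apache2.git',
  'chef-0.8'
)

commands.each { |cmd| puts Shellwords.join(cmd) }
```

A rake task could wrap this by passing each command array to system(*cmd) once the printed output looks right.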

I’m not a Git or Chef guru so I am wondering if I have overlooked something?

Regards


πόλλ’ οἶδ ἀλώπηξ, ἀλλ’ ἐχῖνος ἓν μέγα
[The fox knows many things, but the hedgehog knows one big thing.]
Archilochus, Greek poet (c. 680 BC – c. 645 BC)
http://wiki.hedgehogshiatus.com

On Fri, Apr 9, 2010 at 12:30 AM, Hedge Hog hedgehogshiatus@gmail.com wrote:

I'm not a Git or Chef guru so I am wondering if I have overlooked something?

Have you seen this http://cookbooks.opscode.com/ ?

It's still "alpha" and has been for a while. Hopefully Opscode will
release a round of features soon.

It excites me: Opscode Cookbooks community announced | btm.geek

Bryan

Hedge Hog,

Cookbook repos are more like small applications rather than libraries. The dependencies the cookbooks have on each other, on operating system versions and the lack of inheritance support for cookbook resources make sharing hard. My recommendation is not to expect to use any cookbook as-is, but rather use the ones you see as inspiration to create your own, or adopt the repo that most mirrors your needs and adapt as necessary. Recipes tend to be small and easily digestible. While I applaud the effort to share cookbooks, I don’t see a big need for a blessed way to do it.

For example, our (37signals) repo changed a lot for Chef 0.8, and we are constantly updating them in a very opinionated way (for specific versions of software, for example). We don’t try to support anything but the current version of Ubuntu we’re using. We set defaults specific to our environment and sometimes skip steps that would make the repo easier to cargo cult. The Opscode repo, in contrast, tries to support many operating systems in its recipes.

I’m curious about others’ opinions here.

Joshua

On Fri, Apr 9, 2010 at 5:36 PM, Bryan McLellan btm@loftninjas.org wrote:

On Fri, Apr 9, 2010 at 12:30 AM, Hedge Hog hedgehogshiatus@gmail.com wrote:

I'm not a Git or Chef guru so I am wondering if I have overlooked something?

Have you seen this http://cookbooks.opscode.com/ ?

Thanks, I had. I think Git does offer something powerful; rapid bug
fix distribution is one example.

Regards

It's still "alpha" and has been for a while. Hopefully Opscode will
release a round of features soon.

It excites me: Opscode Cookbooks community announced | btm.geek

Bryan


As Josh brings up, it takes a lot of work to make a cookbook abstract
enough that it could work for everyone. Usually when I write cookbooks
in-house, I end up somewhere in between something that other people
could use and something that is full of imperfections or hacks for my
infrastructure. Ultimately it's easier to make something that
functions enough to be a good example or start for others than a drop
in functioning cookbook.

On Fri, Apr 9, 2010 at 12:57 AM, Hedge Hog hedgehogshiatus@gmail.com wrote:

Thanks, I had. I think Git does offer something powerful, rapid bug
fix distribution is one.

There's no doubt that git has superb magic powers, and that leveraging
github makes bringing forks home and sharing code easier. There are
limitations to the accessibility to cookbooks via github that are
addressable in the Cookbooks site such as tags, notifications of
updates, and simpler integration between the chef-server and the
cookbook.

In either case, the missing key is people or organizations to take
ownership of cookbooks and file tickets to get them included in the
opscode github cookbooks repository or uploading them to the cookbooks
site. There's some big shoes to be filled there by someone
enterprising enough to lead.

Bryan

On Fri, Apr 9, 2010 at 5:47 PM, Joshua Sierles joshua@37signals.com wrote:

Hedge Hog,

Cookbook repos are more like small applications rather than libraries.

OK, then I have misunderstood. I thought the metaphor was that a
cookbook contains several recipes, e.g. the various Apache recipes in
the Apache cookbook.

The dependencies the cookbooks have on each other, on operating system versions and the lack of inheritance support for cookbook resources make sharing hard.

Right, so the current approach implies this advice: "Don't
pick-and-choose cookbooks from different libraries", or at least, "do
so at your own risk".
Doesn't this encourage building visible silos rather than open repositories?

My recommendation is not to expect to use any cookbook as-is,

I had thought the objective was for the community to work towards
exactly that, using them as is for common use cases, while accumulating
common knowledge in specific recipes that address tricky edge cases.

but rather use the ones you see as inspiration to create your own, or adopt the repo that most mirrors your needs and adapt as necessary. Recipes tend to be small and easily digestible. While I applaud the effort to share cookbooks, I don't see a big need for a blessed way to do it.

I'm not suggesting anything be 'blessed', just a way of sharing that
is more transparent about what people are forking over, and is more
flexible in deploying. I'm also not suggesting you won't need to tweak,
just the opposite. If many people are making the same tweaks, and we
can easily see them, perhaps it is worth revisiting something. Again
I'm not saying people won't have esoteric needs, rather that others
might learn from those things they do.

For example, our (37signals) repo changed a lot for Chef 0.8,

Precisely. Since you are probably not Robinson Crusoe on this,
wouldn't it be useful to have a 'chef-0.7', etc. tag on each cookbook
before the changes, and for example a 'chef-0.8.12' tag on those
cookbooks that work under 0.8.12?

and we are constantly updating them in a very opinionated way (for specific versions of software, for example).

Great. I'm not sure I see why you can't have, for example:
apache2 / templates / 37s / security.erb
apache2 / recipes / 37s / security.erb

My understanding is the 37s recipe subfolder will currently be ignored
by Chef.
When or if that changes a tag will allow people to make changes
without breaking cookbook libraries whose submodules point to the 0.8
versions.

We don't try to support anything but the current version of Ubuntu we're using.

Perfect. Today's current is tomorrow's legacy. Tagging these change
points will still allow other people to track and update the 'old'
Ubuntu versions after you have moved on. Remember even Linux makes
incremental changes, so there is very likely something to be learned
from what people uncover during the whole lifespan of a LTS release.
To keep things sane it may be sensible for the community to settle on
a tagging convention.
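
To make the tagging idea a little more concrete, here is a minimal Ruby sketch that picks, from a cookbook's tags, the newest 'chef-X.Y.Z' tag that does not exceed the Chef version in use. The 'chef-' prefix follows the tag names suggested above; the other tag names in the sample list and the helper name are assumptions for illustration, not an agreed convention.

```ruby
require 'rubygems' # provides Gem::Version for version comparison

# Pick the newest 'chef-X.Y.Z' tag that does not exceed the running Chef
# version. Tags that do not follow the convention are ignored.
# Returns nil when no tag is old enough.
def best_tag_for(chef_version, tags)
  target = Gem::Version.new(chef_version)
  candidates = tags.map do |tag|
    match = tag.match(/\Achef-(\d+(?:\.\d+)*)\z/)
    match && [Gem::Version.new(match[1]), tag]
  end.compact
  usable = candidates.select { |version, _| version <= target }
  best = usable.max_by { |version, _| version }
  best && best.last
end

# Hypothetical tag list for one cookbook repository.
tags = ['chef-0.7', 'chef-0.8', 'chef-0.8.12', 'ubuntu-9.10', 'v1.0']
puts best_tag_for('0.8.10', tags)
```

A cookbook-library could then point each submodule at whatever tag this returns for the Chef version it runs.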

We set defaults specific to our environment and sometimes skip steps that would make the repo easier to cargo cult. The Opscode repo, in contrast, tries to support many operating systems in its recipes.

Again, great. I'm not sure why tagging wouldn't allow you to do that?
You never know, some of your custom configs may become the default :)

Regards

I'm curious about others' opinions here.

Joshua


On Fri, Apr 9, 2010 at 6:36 PM, Bryan McLellan btm@loftninjas.org wrote:

As Josh brings up, it takes a lot of work to make a cookbook abstract
enough that it could work for everyone. Usually when I write cookbooks
in-house, I end up somewhere in between something that other people
could use and something that is full of imperfections or hacks for my
infrastructure. Ultimately it's easier to make something that
functions enough to be a good example or start for others than a drop
in functioning cookbook.

OK I seem to have had the idea of several recipes in one cookbook
while perhaps the typical approach is one recipe per cookbook.

On Fri, Apr 9, 2010 at 12:57 AM, Hedge Hog hedgehogshiatus@gmail.com wrote:

Thanks, I had. I think Git does offer something powerful, rapid bug
fix distribution is one.

There's no doubt that git has superb magic powers, and that leveraging
github makes bringing forks home and sharing code easier. There are
limitations to the accessibility to cookbooks via github that are
addressable in the Cookbooks site such as tags, notifications of
updates, and simpler integration between the chef-server and the
cookbook.

In either case, the missing key is people or organizations to take
ownership of cookbooks and file tickets to get them included in the
opscode github cookbooks repository or uploading them to the cookbooks
site. There's some big shoes to be filled there by someone
enterprising enough to lead.

Or lots of people with small shoes, sharing in a common style :)

Bryan


On Fri, Apr 9, 2010 at 1:52 AM, Hedge Hog hedgehogshiatus@gmail.com wrote:

OK I seem to have had the idea of several recipes in one cookbook
while perhaps the typical approach is one recipe per cookbook.

I think multiple recipes is the way to go, but it is dependent on the
cookbook. It's pretty much like programming in any language, separate
the code out by function and avoid repetition. The apache cookbook has
a function for the common code associated with a module, but then
individual recipes for each module as an apache module may have other
package dependencies or unique actions like modifying a configuration
file somewhere. Anything that is both a server and a client is going
to have separate recipes for both. If the server should also be a
client, such as a munin server, one can simply include the client
recipe from the server recipe. However these should not be separate
cookbooks because they will share a common attribute space, common
packages, and possibly maintain similar files.

As I mentioned, when first writing a cookbook it's often pretty dirty
and looks more like a top down list of everything I did to get the
software working than anything else. As I work on it, I identify the
areas where the cookbook could be more flexible and modify it
accordingly.

Or lots of people with small shoes, sharing in a common style :)

Yes, but we haven't made it over the hump yet where this could happen.
The magic moment is when the upstream cookbooks have enough features
and integration that it is less work to contribute back upstream than
it is to maintain your forks. Open source cookbooks are just like open
source software in that way. As long as upstream cookbooks are only
getting a couple lines of change to fix minor bugs or make them
compatible with new Chef releases, I don't see it happening.

Bryan

On Fri, Apr 9, 2010 at 3:30 AM, Hedge Hog hedgehogshiatus@gmail.com wrote:

The large fork count actually seems to be the problem.

I don't think we will have "one cookbook to rule them all". For
example, take the Apache cookbook. The Opscode cookbook recipe takes
the "kitchen sink" approach, and tries to insulate the recipe user
from making any changes to the apache config. There are over 70 files
in the recipe, and it can take a day just to read and understand all
that code and all the 'tweakable' settings. But if they missed a
setting, you have to decide "at which layer (chef or apache) do I fix
the problem?"

I took the opposite approach: I was fine with requiring the recipe
writer to make changes, so I started with my existing Apache config,
and added a little chef magic specific to our organisation. The
ability to change the Apache port or tune the number of workers
(without editing the config) just wasn't worth the added layers of
complexity (to me).

So, I don't think there can be one single cookbook recipe that will
work for a large hosting provider (generating 1000's of virtual hosts)
and a small website (that just wants to set up a few Rails proxies).
One is likely to use looping within a single recipe to generate
vhosts, the other is likely to use a definition called from multiple
recipes to generate vhosts.

The bigger problem right now is that we've got 100's of example
cookbooks, but no explanation of WHY the people did what they did.

-=Dan=-

Hi,
Yeah, the right cookbook structure for the complex ones... I am still
learning, so take this with a pinch of salt. Beyond a certain level of
complexity it makes sense to hide them behind a custom domain-specific
resource. Then it's good to try to encapsulate / hide the more complex
sequences within /libraries/ LWRP, Definition or custom
resource+provider. Then make it clear how to use those special
resources.

Split into multiple recipe files, e.g.:
recipes/base.rb              <---- hopefully, users don't have to touch this
recipes/user_example.rb      <---- show users how to use your custom resource
recipes/user_customized1.rb  <---- user can add their own file

It is good to include an example json file (easily runnable by chef
solo) in the files/default/ folder, to demonstrate how to override your
json attributes and to give people a way to quickly test out the whole
recipe.

As for documentation, it's not good to leave the readme.rdoc file
empty. It's free-form, so I guess maybe that puts some people off when
they don't know what content to put in it?

Maybe the documentation could just be extra metadata within the source
code. So what about a yard plugin for chef? The author of yard has
provided some interesting examples, including one that allows yard to
understand rspec tests. Chef resources (and other chef objects) may be
documentable in this way too, if that's a sensible goal to aim for.
However, it would take a general consensus amongst contributors to
actually document their cookbooks.

Not sure everyone is onboard with that idea yet :(


On Fri, Apr 9, 2010 at 12:03 PM, dreamcat four dreamcat4@gmail.com wrote:

Maybe if the documentation is just extra metadata within the source
code. So what about a yard plugin for chef?

I agree. In a perfect world:

  • the metadata should be generated from inline comments. Attribute
    metadata would be declared in the attributes/ files, and Recipe
    metadata (parameters) would be declared in the recipe. (Ditto for
    macros, etc). This would make it much harder for the metadata to drift
    out of sync.
  • the metadata extractor should be part of chef (not in a rake file
    that needs to be copied into every project!)
  • the chef server should just scan the recipes for metadata. (not run
    them, but just parse the comments). So we wouldn't even need
    metadata(.json|.rb) at all. That also solves the "why do I need to
    check in a generated file?" problem too.
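
As a sketch of what "metadata generated from inline comments" might look like, the Ruby below scans an attributes file as plain text and collects specially marked comment lines. The '# @attribute name - description' comment format, the sample attribute names and the function name are all hypothetical; Chef itself defines no such convention as far as this thread establishes. The file is only parsed, never run.

```ruby
# Scan source text for comment lines of the hypothetical form
#   # @attribute <name> - <description>
# and return them as a name => description hash. The code is only read as
# text; it is never evaluated.
def extract_attribute_docs(source)
  docs = {}
  source.each_line do |line|
    match = line.match(/^\s*#\s*@attribute\s+(\S+)\s+-\s+(.+?)\s*$/)
    docs[match[1]] = match[2] if match
  end
  docs
end

# Made-up attributes file content, for illustration only.
sample = <<~'RUBY'
  # @attribute apache/listen_port - Port the server listens on
  default[:apache][:listen_port] = 80
  # An ordinary comment that is ignored
  # @attribute apache/workers - Number of worker processes
  default[:apache][:workers] = 4
RUBY

docs = extract_attribute_docs(sample)
docs.each { |name, description| puts "#{name}: #{description}" }
```

Because the descriptions sit next to the code they describe, they are less likely to drift than a separately maintained metadata file.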

Currently, I don't use metadata or rake files in my cookbooks.
(Granted, I'm using chef-solo.)

-=Dan=-

On Fri, Apr 9, 2010 at 11:59 PM, Dan DeMaggio dan@animoto.com wrote:

On Fri, Apr 9, 2010 at 3:30 AM, Hedge Hog hedgehogshiatus@gmail.com wrote:

The large fork count actually seems to be the problem.

I don't think we will have "one cookbook to rule them all". For

Choosing to use a cookbook as a common base does not mean you are
being ruled by anyone - no matter how many rings they say they have.

You just get to see different recipes people are using in the one cookbook.
Rather than chasing down N cookbook libraries.
Granted it might be necessary to settle on a deeper folder structure.

example, take the Apache cookbook. The Opscode cookbook recipe takes
the "kitchen sink" approach, and tries to insulate the recipe user
from making any changes to the apache config. There are over 70 files
in the recipe, and it can take a day just to read and understand all
that code

That is a straw man argument - it is a little like saying I won't use
a dictionary because it'll take me a month to read the whole thing.
Really, if recipes have sane file names you can actually make an
educated guess about which are relevant.

and all the 'tweakable' settings. But if they missed a
setting, have to decide "at which layer (chef or apache) do I fix the
problem?"

This is an argument for better documentation, which is valid
regardless of whether there is one shared cookbook or 12 shared
cookbooks.
For example, as Joshua Sierles points out, 37signals takes a different approach.
That is useful information, and could be documented in a common
cookbook readme (assuming the recipe was accepted). That is an
important benefit of having cookbooks as repos: recipes get to be
debated before they are merged/accepted. Having
maintainers/communities look after their favorite application
increases the likelihood that poor configs get ironed out rather than
propagated.

Anyway, I'm not sure we are likely to see better cookbook
documentation efforts because we have 8 apache cookbooks sitting in 8
different library silos. Better to have one readme where there is a
chance for people to say 'Recipe X does Y because of A and assumes K'.

I took the opposite approach: I was fine with requiring the recipe
writer to make changes, so I started with my existing Apache config,
and added a little chef magic specific to our organisation. The
ability to change the Apache port or tune the number of workers
(without editing the config) just wasn't worth the added layers of
complexity (to me).

So, I don't think there can be one single cookbook recipe that will
work for a large hosting provider (generating 1000's of virtual hosts)

As I said before I was under the impression a cookbook could contain
multiple recipes.
So there could be some large_hosting_* recipes....

and a small websites (that just wants to set up few rails proxies).

moderate_hosting_* and small_hosting_* recipes (in files or folders).
From what has been said so far the intended use is not one
recipe-per-cookbook.

One is likely to use looping within a single recipe to generate
vhosts, the other is likely to use a definition called from multiple
recipes to generate vhosts.

The bigger problem right now is that we've got 100's of example
cookbooks, but no explanation of WHY the people did what they did.

Without encouraging people to pool/collate that documentation you'd
then have 100 explanations across 100 silos.
Better to have one readme with an overview, and detail in the other
files' documentation.
It seems another argument for organizing cookbooks as first class git
repos is that as the maintainers pool/merge it is likely that
documentation will be pooled.

Regards

-=Dan=-

Apologies for the top post - I should have said this at the outset.
It is worth stating explicitly:
Having cookbooks as first class repos makes it easier to maintain,
merge and distribute them.

It is not axiomatic that there will be one 'reference' cookbook. That
is up to the individuals involved.

For some applications there may only be one cookbook in existence.
For other applications 20 cookbooks could coalesce to 3, or 1, or 19.
It will be up to you.
Where recipes are merged into a cookbook it will be possible to describe
the 'what-and-why' in one place.
It will be easier to observe how many people are following (github
specific) and contributing to each of the cookbooks.

Regards


The essential idea proposed in this thread seems to be: instead of “one
repo per organization” (like 37signals or Opscode), let’s have “one
repo per application” (like apache or MongoDB). Each repo will be a
single cookbook with multiple recipes, but all focused on a single
application.

Pro:

  • allows bugfixes for an application to be shared easily (can’t in the
    current setup)
  • don’t have to wade thru recipes of apps you never plan to use
    (required in current setup)
  • recipe cherry-picking (I want the 37 signals version of apache, the
    opscode version of syslog, …) will be easier
  • different approaches can be together (in different recipes but same
    repo), which will make it easier to document the various approaches

Con:

  • an organisation might have big dependencies between its recipes.
    For example: Apache might be run under runit and/or require a
    collectd module. It’s not clear how to handle these inter-app
    dependencies or where they should live.
  • users will still have to wade thru un-related stuff (i.e. big vs
    small sites, or the 100’s of "collectd module for monitoring X"
    recipes.)
  • more repos means more stuff to manage (there have been many
    solutions proposed to the problems of submodules, but nothing has
    emerged as ‘best practice’.)
  • different (incompatible) recipes will need different attributes.
    It’s not clear how to manage that. Should it all be in the metadata,
    even though only a subset will actually be used?

I think the idea has merit. The big names (opscode, ey, 37s) are
probably essential to getting this off the ground because people are
forking them the most currently.

-=Dan=-

Thanks for wading through all the arguments and, to my mind, accurately
summarizing them...

On Sun, Apr 11, 2010 at 9:29 AM, Dan DeMaggio dan@animoto.com wrote:

The essential idea proposed in this thread seems to be "Instead of one
repo per organization" (like 37 signals or opscode), let's have "one
repo per application" (like apache or MongoDB). Each repo will be a
single cookbook with multiple recipes, but all focused on a single
application.

Pro:

  • allows bugfixes for an application to be shared easily (can't in the
    current setup)
  • don't have to wade thru recipes of apps you never plan to use
    (required in current setup)
  • recipe cherry-picking (I want the 37 signals version of apache, the
    opscode version of syslog, ...) will be easier
  • different approaches can be together (in different recipes but same
    repo), which will make it easier to document the various approaches

Con:

  • an organisation might have big dependencies between its recipes.
    For example: Apache might be run under runit and/or require a
    collectd module. It's not clear how to handle these inter-app
    dependencies or where they should live.

Maybe a Chef guru could remark on whether Chef's 'Cookbook Metadata'
[1], [2] could help address or work around this issue?

[1] http://wiki.opscode.com/display/chef/Cookbook+Metadata
[2] http://tickets.opscode.com/browse/CHEF-275

Regards

  • users will still have to wade thru un-related stuff (i.e. big vs
    small sites, or the 100's of "collectd module for monitoring X"
    recipes.)
  • more repos means more stuff to manage (there have been many
    solutions proposed to the problems of submodules, but nothing has
    emerged as 'best practice'.)
  • different (incompatible) recipes will need different attributes.
    It's not clear how to manage that. Should it all be in the metadata,
    even though only a subset will actually be used?

I think the idea has merit. The big names (opscode, ey, 37s) are
probably essential to getting this off the ground because people are
forking them the most currently.

-=Dan=-


On Sun, Apr 11, 2010 at 1:18 AM, Hedge Hog hedgehogshiatus@gmail.com wrote:

collectd module. It's not clear how to handle these inter-app
dependencies or where they should live.

Hi,
I use git to make my own life with chef recipes relatively painless.
Typically, there is one reasonable workflow which I repeatedly use:

  1. Fork opscode/cookbooks -> Your fork.

  2. Rename "your/cookbooks" -> "your/site-cookbooks". Now clear all the
    files, so that the site-cookbooks are empty.

  3. Add the opscode and 37signals cookbooks as remotes to your local
    repo. If you have multiple official or semi-official source repos,
    then it may be worth writing a script to help initialize your
    remotes. If you sit inside an organisation which has a site-wide
    site-cookbooks, then add that too. We assume here that you are
    committing to a local / individual repository.

  4. We also assume that you are developing solely in "site-cookbooks",
    and that the underlying "cookbooks" is kept as "opscode/cookbooks"
    (master branch). So we are never committing directly to the
    "cookbooks" base, only to "site-cookbooks".

  5. Check out git subtrees with
    `git checkout <remote>/<branch> -- <cookbook_name>`. They will end up
    in your local master / working tree. This way you can positively
    select those cookbooks you intend to edit / manipulate. This can be
    done on an "as-needed" basis, at any time the editing of a cookbook
    is called for. (To only edit a cookbook's attributes, we don't need
    this. Just override the cookbook attrs with a json fragment /
    couchdb.)

  6. Find the dependencies of the cookbooks you wish to work on. Run
    this command in your site-cookbooks repo:
    $ grep -R -1 depends */metadata.rb
    This should provide a list of all of the cookbooks which your
    (individually checked-out) selected cookbooks depend upon. Of course,
    once you check out those direct dependencies, your git tree will
    contain more */metadata.rb files to parse. Hmm.

^^ This is a part of the process that could benefit from some kind of
recursive automation. And admittedly we don't need to actually check
out these files to peek at what they are in the git tree (or even
locally). Git should be flexible enough to also offer other / better
ways to help list a cookbook's dependencies.

  7. Work on your cookbooks. When the time is right, merge and
    push back to the higher repos as required. One of your git remotes
    will also be your production / test servers. So it should be a
    familiar git process.

To me, this workflow is reasonable and workable. Currently I perform
all of these steps manually on the commandline. Perhaps a rubygem
could help to automate some of the steps, namely setting up the list
of remotes, the tracking branch, and recursively finding / checking out
all of a cookbook's dependencies. Just to make it all a bit more
consistent, with fewer repetitive steps.
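
The recursive part of that automation could be sketched roughly as below. This is only an illustration, not an existing gem: the function names and the two-directory search order (site-cookbooks first, then cookbooks) are assumptions, and the parser only sees literal `depends "name"` lines, so dependencies declared in a loop (e.g. `%w{a b}.each { |cb| depends cb }`) would be missed.

```ruby
require "set"

# Extract cookbook names from literal `depends "name"` / `depends("name")`
# lines in the text of a metadata.rb file.
def declared_dependencies(metadata_text)
  metadata_text.scan(/^\s*depends[\s(]+["']([^"']+)["']/).flatten.uniq
end

# Walk the dependency graph starting at `cookbook`. For each name, the first
# <dir>/<name>/metadata.rb found in `search_dirs` (in order) is used.
# Returns [found, missing]: sorted names reachable from `cookbook` (excluding
# itself), and sorted names with no metadata.rb in any search directory.
def resolve_dependencies(cookbook, search_dirs)
  found = Set.new
  missing = Set.new
  queue = [cookbook]
  until queue.empty?
    name = queue.shift
    # Skip names already visited; this also stops circular dependencies.
    next if found.include?(name) || missing.include?(name)

    path = search_dirs
           .map { |dir| File.join(dir, name, "metadata.rb") }
           .find { |candidate| File.file?(candidate) }
    if path.nil?
      missing << name
      next
    end
    found << name
    queue.concat(declared_dependencies(File.read(path)))
  end
  [found.to_a.sort - [cookbook], missing.to_a.sort]
end
```

Something like `resolve_dependencies("myapp", ["site-cookbooks", "cookbooks"])` would then give the list of cookbooks to check out, plus the ones that still have to be fetched from a remote.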

To find the cookbook dependencies automatically:

If we want to use the "depends(cb)" declarations from the metadata.rb
file, we would need to somehow ensure that the dependencies are
always properly listed in there. Currently chef does not work
that way. If there are missing depends(cb) dependencies, chef will
continue to run a cookbook, and not raise any error. Chef also has the
"include_recipe()" directive which is used inside the
"recipes/recipe.rb" files. So I guess we could alternatively use a
more complex search algorithm to dig those up directly. Or make
"include_recipe(cb)" fail unless it's also accompanied by a
"depends(cb)" in the metadata.rb. I don't know, perhaps the chef
library already has some better way to figure this out.
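
As a rough illustration of the "dig those up directly" idea, the sketch below compares the cookbooks named in `include_recipe` calls against the `depends` lines in metadata.rb. It is a plain text scan under stated assumptions, not something chef itself provides: the function names are made up, the cookbook name is assumed to equal its directory name, and only literal string arguments are recognised.

```ruby
# Names declared with literal `depends "name"` lines in metadata.rb text.
def declared_dependencies(metadata_text)
  metadata_text.scan(/^\s*depends[\s(]+["']([^"']+)["']/).flatten.uniq
end

# Cookbook names referenced by include_recipe in recipe text.
# "apache2::mod_ssl" refers to the cookbook "apache2".
def included_cookbooks(recipe_text)
  recipe_text.scan(/^\s*include_recipe[\s(]+["']([^"':]+)/).flatten.uniq
end

# Cookbooks that recipes under cookbook_dir include without a matching
# depends line. Recipes from the cookbook itself need no declaration.
def undeclared_includes(cookbook_dir)
  own_name = File.basename(cookbook_dir) # assumption: dir name == cookbook name
  metadata_path = File.join(cookbook_dir, "metadata.rb")
  declared = File.file?(metadata_path) ? declared_dependencies(File.read(metadata_path)) : []
  included = Dir.glob(File.join(cookbook_dir, "recipes", "*.rb")).flat_map do |recipe|
    included_cookbooks(File.read(recipe))
  end
  (included.uniq - declared - [own_name]).sort
end
```

A non-empty result from `undeclared_includes("site-cookbooks/myapp")` would point at the depends(cb) lines that are missing from that cookbook's metadata.rb.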

However, in chef-world, chef isn't aware of multiple cookbook
repositories in a way that lets 2 cookbooks with the same name
co-exist. In part this is a feature, as it allows you to override
arbitrary files of a cookbook in your other git repo. However, it's not
inconceivable that 37signals' cookbook called "apache" might have
utterly diverged from the generic opscode version. If both such
popular cookbook repos (37signals and opscode) are to be
included as (default) remotes / cookbook sources, then we might have a
problem. Wouldn't you kind of need some scoping / namespacing for the
situation where 2 different upstream cookbooks share the same name?
Such scoping may be gleaned from various information lying around in
the cookbooks ecosystem. I wouldn't know if it could be made into a
de-facto standard, such that a commandline tool (a "repo-helper-gem")
could rely upon it.

Which precise mechanisms would people want to support and maintain as
a standard? How would you decide to sacrifice some of the existing
flexibility in order to get those new features? And of course, that's
assuming that enough people would want to manage their cookbooks with
that exact same git workflow. There are many other git workflows too.

dreamcat4
dreamcat4@gmail.com

On Sat, Apr 10, 2010 at 5:18 PM, Hedge Hog hedgehogshiatus@gmail.com wrote:

Thanks for wading through all the arguments and, to my mind, accurately
summarizing them...

This thread is awesome, and it highlights some things that I think are
important.

The first is that we can build a better developer workflow for
collaborating on and re-using cookbooks. The conversations around
sub-modules, etc. all fit in this category - it's obvious and clear
that we can make it easier. This problem domain is about making the
collaboration process and development as friction-free as possible.

The second is that we have a publishing problem that we've taken a
stab at solving, but haven't integrated as deeply into the process as
we should. This is the side of the issue where the problem is "I want
to discover the right Apache cookbook for me", and it contains within
it things like documentation standards, cookbook downloading, and
putting the right hooks in to the first problem so that when you go
from passive consumer to active developer the road is easy.

I think it's important to separate the publishing part of the workflow
from the development side - this is the reason why we have package
management systems in the first place. :)

Adam

--
Opscode, Inc.
Adam Jacob, CTO
T: (206) 508-7449 E: adam@opscode.com

On Tue, Apr 13, 2010 at 6:17 AM, Adam Jacob adam@opscode.com wrote:

On Sat, Apr 10, 2010 at 5:18 PM, Hedge Hog hedgehogshiatus@gmail.com wrote:

Thanks for wading through all the arguments and, to my mind, accurately
summarizing them...

This thread is awesome, and it highlights some things that I think are
important.

The first is that we can build a better developer workflow for
collaborating on and re-using cookbooks. The conversations around
sub-modules, etc. all fit in this category - it's obvious and clear
that we can make it easier. This problem domain is about making the
collaboration process and development as friction-free as possible.

The second is that we have a publishing problem that we've taken a
stab at solving, but haven't integrated as deeply into the process as

To clarify:
By publishing you mean 'how Opscode (and others) publish base
cookbooks/cook-book-library (and their updates)', rather than 'how
chef users publish their (possibly-customized)
cookbooks/cookbook-library to others (and their chef-server)'?

we should. This is the side of the issue where the problem is "I want
to discover the right Apache cookbook for me", and it contains within
it things like documentation standards, cookbook downloading, and
putting the right hooks in to the first problem so that when you go
from passive consumer to active developer the road is easy.

I think it's important to separate the publishing part of the workflow
from the development side - this is the reason why we have package
management systems in the first place. :)

OK, this is where I get a little confused - I'm not sure what role
package management has in deploying Chef cookbooks/recipes?
This is also why I couldn't understand the appearance of the Opscode
zipped cookbooks. I'm not saying there is no value, just that my current
mindset/imagined-use-case is obscuring it.

A user's Chef cookbook (and cookbook-library) workflow I had in mind:

  1. Add a cookbook as a submodule to my cookbook-library in my remote
    git repo (maybe via some cute rake or thor script)
  2. Pull, then edit cookbook recipes (and metadata) on a local git
    repo until tested satisfactorily.
  3. Push cookbook updates to the 'deploy' branch in my remote (github)
    cookbook-library repo (could be a public or private cookbook-library)
  4. (*) Post-receive hook for this deploy branch ensures the
    cookbook(s)/recipe(s) are deployed to the chef server (style example:
    [1], [2])
  5. Chef server works its automagic with the Chef clients.
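
For the self-hosted variant of the post-receive step, the branch check could be sketched like this. It assumes a plain git server-side `post-receive` hook, which receives one `<old-sha> <new-sha> <ref-name>` line per updated ref on stdin (GitHub's hosted hooks are HTTP notifications and would need a different receiver). The function name is hypothetical, and the actual upload to the chef server is deliberately left out.

```ruby
DEPLOY_REF = "refs/heads/deploy" # branch name taken from the workflow above
ZERO_SHA = "0" * 40              # git reports a deleted ref as all zeros

# Given the stdin text of a post-receive hook, return the new SHA of the
# deploy branch if it was updated, or nil if it was untouched or deleted.
def deploy_revision(hook_input, deploy_ref = DEPLOY_REF)
  hook_input.each_line do |line|
    _old_sha, new_sha, ref = line.split
    next unless ref == deploy_ref
    return nil if new_sha == ZERO_SHA # branch was deleted, nothing to deploy
    return new_sha
  end
  nil
end
```

In a real hook file you would call `deploy_revision($stdin.read)` and, when it returns a revision, run whatever rake or knife task uploads the cookbooks to the chef server.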

There might be Opscode/37Signals/RightScale cookbooks (assume
dependency issues are solvable) I have as submodules - I can update
tags I track and pull/push to those tags.

Are you suggesting I install those cookbooks in step 1) using my OS
package management system? Using it to resolve cookbook dependencies?
If so, this implicitly suggests the Chef metadata I pointed to is not
suitable/capable of addressing the cookbook dependency issues raised
in this thread. Correct?

Regards

[1] Pushr - Automate Your Deployment Process with Push Notifications - restafari.org
[2] GitHub - karmi/pushr: Deploy Rails applications automatically by running Capistrano tasks with Git post-commit hooks

Note *: You could of course set up your own post-receive hook service
against your local git repo, and so cut out the remote repo/Github.
This would eliminate steps 1) and 3). Since Github has those
facilities available for free accounts, I guessed they are likely to
be used before people roll their own.

--
πόλλ' οἶδ ἀλώπηξ, ἀλλ' ἐχῖνος ἓν μέγα
[The fox knows many things, but the hedgehog knows one big thing.]
Archilochus, Greek poet (c. 680 BC – c. 645 BC)
http://wiki.hedgehogshiatus.com

Rather than answer this in prose, I'm going to answer it in code and a
diagram, since it will all be way more clear. :)

Gimme another day.

Adam

On Mon, Apr 12, 2010 at 5:30 PM, Hedge Hog hedgehogshiatus@gmail.com wrote:

Are you suggesting I install those cookbooks in step 1) using my OS
package management system? Using it to resolve cookbook dependencies?
If so, this implicitly suggests the Chef metadata I pointed to is not
suitable/capable of addressing the cookbook dependency issues raised
in this thread. Correct?

--
Opscode, Inc.
Adam Jacob, CTO
T: (206) 508-7449 E: adam@opscode.com

Okay, in my head, I see two different workflows here:

  1. The Consumer Workflow

This is the workflow where you find a cookbook that works for you, and
you want to use it in your infrastructure, and track it in your own
source control (probably a chef-repo.) You may then want to make
changes to the upstream cookbook, and you then want to be able to
easily see the diff between the changes you have made and any new
versions the upstream has published.

The important thing here is that you care about the cookbook having
its changes tracked in your local repository (since you use it to
build your infrastructure), but you don’t care so much about having
the entirety of the upstream revision history (indeed, you might not
be able to, in the cases where you want to use a different source
control system from the upstream.) So you want to easily track the
upstream, and apply your changes.

  2. The Developer/Collaborator Workflow

In this case, you want to collaborate with the upstream on development
of the cookbook directly - you want your changes to be situated in the
upstream. To me, this workflow is actually defined by whoever is
managing the upstream - for Opscode, it’s tickets and pull requests to
the opscode/chef repo, for 37signals it’s different, and for someone
else it may require using mercurial or svn.

I’ve been thinking about the consumer workflow for a while now, and I
spent some of today getting it down to essentially a single command.
For the developer/collaborator workflow, we may also want to establish
a best practice, but I’m less concerned about it right now.

The pattern I’m following is called “vendor branching”, and it’s been
around forever. The basic method is that you are tracking an upstream
source release, and you want to keep a certain set of patches in sync,
and be able to move between local versions easily. It was pretty
popular in the CVS days, if any of you can get into the way-back
machine with me. Here is a good description of doing this with Git:

http://happygiraffe.net/blog/2008/02/07/vendor-branches-in-git/

I’ve adapted this pattern to be integrated with knife and the
cookbooks.opscode.com site. If you already have knife configured to
work with your local chef repo, from the current opscode/chef, you can
do:

$ knife cookbook site vendor apache2

And you’ll get the latest version of the apache2 cookbook in a local
vendor branch. You can then make changes directly to the cookbook in
your local repository - you don’t need to build a site-cookbook, or do
anything else. Just make changes like normal. When the upstream
releases a new version, just repeat the command:

$ knife cookbook site vendor apache2

And the new version will be downloaded, the vendor branch updated, and
then merged into your master branch. The resulting diff can be easily
viewed, and if there are any merge conflicts, you get a chance to
review them. If you don’t like the changes, you can just run
‘git reset --hard HEAD~1’.
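
For readers who have not met vendor branching before, the git steps behind one import can be sketched as a dry-run plan. This is an illustration of the general pattern only, not a description of what knife runs internally: the function name is made up, the branch name `chef-vendor-<cookbook>` is an assumption modelled on the tag that appears in the diff example, and the download/unpack step is left as a comment.

```ruby
# Build the list of shell commands for importing one upstream release of a
# cookbook onto its vendor branch and merging the result into master.
# Nothing is executed; the caller can print or run the plan.
def vendor_branch_plan(cookbook, version, branch_exists:)
  branch = "chef-vendor-#{cookbook}" # assumed branch naming
  tag = "#{branch}-#{version}"       # e.g. chef-vendor-apache2-0.10.1
  path = "cookbooks/#{cookbook}"
  [
    branch_exists ? "git checkout #{branch}" : "git checkout -b #{branch}",
    "rm -rf #{path}", # drop the previous upstream release
    "# unpack #{cookbook} #{version} into #{path} (download step not shown)",
    "git add -A #{path}",
    "git commit -m 'Import #{cookbook} version #{version}'",
    "git tag #{tag}",
    "git checkout master",
    "git merge #{branch}", # local changes on master meet the new upstream here
  ]
end
```

Because upstream releases only ever land on the vendor branch, the merge into master is where local edits and upstream changes are reconciled, and the tag gives a fixed point to diff against later.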

You can always get a diff between your current cookbook and the upstream with:

[adam@latte]% git diff chef-vendor-apache2-0.10.1 HEAD
diff --git a/cookbooks/apache2/attributes/apache.rb b/cookbooks/apache2/attributes/apache.rb
index c733d5e..d6a8f76 100644
--- a/cookbooks/apache2/attributes/apache.rb
+++ b/cookbooks/apache2/attributes/apache.rb
@@ -75,4 +75,3 @@ set_unless[:apache][:worker][:minsparethreads] = 64
 set_unless[:apache][:worker][:maxsparethreads] = 192
 set_unless[:apache][:worker][:threadsperchild] = 64
 set_unless[:apache][:worker][:maxrequestsperchild] = 0
-# whee, vendor merge.

Check this picture of my local git history out - it shows going from
the 0.9.1 version of the apache2 cookbook to 0.10.1, with a local
change in the middle:

http://skitch.com/adamjacob/n6246/chef-repo-branch-master

Check out this gist for some in-action shots:

In order to make this complete, a few things still need to be done:

  1. We need to enable the API for uploading to the cookbook site
    easily, to make it simple for cookbook authors to publish their
    cookbooks.
  2. We need to enable tracking of the upstream source repository in the
    cookbook site, along with an optional path, which should enable you to
    both vendor HEAD and easily collaborate on development.

What do you think?

Adam


Opscode, Inc.
Adam Jacob, CTO
T: (206) 508-7449 E: adam@opscode.com