CouchDB question

Why was couchdb chosen as the data store for chef-server?


Joe Van Dyk
http://fixieconsulting.com

Hi,

On 5/08/2009, at 7:10 AM, Joe Van Dyk wrote:

Why was couchdb chosen as the data store for chef-server?

I believe one of the main reasons for choosing CouchDB was the ability
to store a schema-free set of JSON objects while still providing
flexibility with the JS map/reduce engine.

As I'm sure you're aware, "Node Data" / "Attributes" are just Hashes,
Arrays, Strings (etc), which are #to_json'd into the Couch Database.
Shoehorning this ultimately flexible "document" into an RDBMS would
likely reduce (no pun intended) the functionality currently available.
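
To make that concrete, here is a rough sketch (the node name, values,
and view are all invented for illustration, not what chef-server
actually ships) of "just Hashes, Arrays, Strings, #to_json'd", plus the
kind of JavaScript map function CouchDB uses to query such documents:

    require 'json'

    # A hypothetical slice of node data: nothing but nested Hashes,
    # Arrays and Strings, the same shapes Ohai and attributes produce.
    node_data = {
      "name"      => "web01.example.com",
      "platform"  => "ubuntu",
      "ipaddress" => "10.0.0.12",
      "run_list"  => ["recipe[apache2]", "role[webserver]"]
    }

    # The whole "schema" step: serialize the structure as-is.
    doc = node_data.to_json

    # Querying happens through CouchDB views -- JavaScript map functions
    # kept in a design document (shown here as a Ruby string purely for
    # illustration).
    map_by_platform = <<-JS
      function(doc) {
        if (doc.platform) {
          emit(doc.platform, doc.name);
        }
      }
    JS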

As for "why was couchdb chosen"... we'll have to wait for the
benevolent dictator to chime in. :)

--
AJ Christensen, Software Engineer
Opscode, Inc.
E: aj@opscode.com

On Tue, Aug 4, 2009 at 12:10 PM, Joe Van Dyk <joe@fixieconsulting.com> wrote:

Why was couchdb chosen as the data store for chef-server?

Because it fit the model we needed very well. We needed it to be
lightweight, extensible, and schema-free. (Mapping Ohai data into a
relational database is not fun, and it doesn't scale well at all.) We
met most, if not all, of the requirements for using CouchDB well: the
data changes with a fairly low frequency (even in the largest of
infrastructures that are updating constantly), we know exactly how we
want to query it, and we don't need transactional consistency above the
level of a single object. Also, the data itself translates almost
directly to CouchDB - we just serialize objects as JSON and store them.
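
As a rough illustration of "serialize objects as JSON and store them"
(the database name, document id, and URL below are made up, not
chef-server's actual layout), storing a document is a single HTTP PUT:

    require 'json'
    require 'net/http'
    require 'uri'

    node = { "name" => "web01.example.com", "platform" => "ubuntu" }

    # Hypothetical database and document id, purely for illustration.
    uri = URI("http://localhost:5984/chef_sketch/node_web01.example.com")

    request = Net::HTTP::Put.new(uri)
    request["Content-Type"] = "application/json"
    request.body = node.to_json

    response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
    puts response.code  # "201" when the document is created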

It also is, in essence, just another HTTP application - which means
when you learn how to scale the rest of Chef, you are also learning
how to scale its data store. Things like proxies, caches, etc. are
all right there, waiting to make both Chef and CouchDB faster in a
pretty trivial way.
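
A small sketch of what "just another HTTP application" buys you: as far
as I know, CouchDB exposes a document's revision as its ETag, so plain
HTTP revalidation (and therefore ordinary HTTP caches and proxies)
works against it unchanged. The URL and document id are again invented:

    require 'net/http'
    require 'uri'

    uri = URI("http://localhost:5984/chef_sketch/node_web01.example.com")

    # First fetch: CouchDB returns the document plus an ETag header.
    first = Net::HTTP.get_response(uri)
    etag  = first["ETag"]

    # Revalidation: a plain conditional GET, exactly as any HTTP cache
    # or proxy would issue it.
    revalidate = Net::HTTP::Get.new(uri)
    revalidate["If-None-Match"] = etag

    second = Net::HTTP.start(uri.host, uri.port) { |http| http.request(revalidate) }
    puts second.code  # "304" if the document has not changed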

Basically, we chose CouchDB because it was exactly the way we wanted
to relate to the data-store for this sort of data. Flexible, fast
(enough), queryable (enough) and trivial to scale for our use case.

Regards,
Adam

--
Opscode, Inc.
Adam Jacob, CTO
T: (206) 508-7449 E: adam@opscode.com

On 5/08/2009, at 3:15 PM, Adam Jacob wrote:

Because it fit the model we needed very well. [...]

Thanks, friendly parliamentarian!

--
AJ Christensen, Software Engineer
Opscode, Inc.
E: aj@opscode.com

On Tue, Aug 4, 2009 at 8:15 PM, Adam Jacob <adam@opscode.com> wrote:

Because it fit the model we needed very well. [...]

Ah, makes sense.

It probably wouldn't make sense to store the documents directly on the
filesystem, right? (in appropriately named directories)

Joe

On Thu, Aug 13, 2009 at 9:11 PM, Joe Van Dyk <joe@fixieconsulting.com> wrote:

Ah, makes sense.

It probably wouldn't make sense to store the documents directly on the
filesystem, right? (in appropriately named directories)

Sure, if you also want to write all the custom indexing code, or do a
full inflation and scan of every document each time you need to look at
the data differently. ;)
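
To put that in concrete terms, the filesystem version of an ad-hoc
query ends up looking something like this rough sketch (the directory
layout is invented): every question means reading and parsing every
document, because there is no view doing the indexing for you.

    require 'json'

    # Hypothetical layout: one JSON document per node on disk.
    # Answering "which nodes run ubuntu?" means inflating and scanning
    # every file, every time -- the work a CouchDB view does up front.
    ubuntu_nodes = Dir.glob("/var/chef/nodes/*.json").select do |path|
      JSON.parse(File.read(path))["platform"] == "ubuntu"
    end

    puts ubuntu_nodes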

Adam

--
Opscode, Inc.
Adam Jacob, CTO
T: (206) 508-7449 E: adam@opscode.com