Setting up service with huge data files?


#1

So I want to provision a service for a custom app we’ve got. I can template an initscript/launchd_plist and put all my listening ports, identifying strings, etc. into data bags, but until I’ve got the datafile on disk I can’t start the service.

The datafile will be anywhere from 100GB to 2TB - far too big to provision during a chef run, and since it’s being seeded from an existing host I won’t have it in advance anyway.

What’s the best way to proceed with something like this?

My current best thinking is to write a recipe with a data bag attribute, “filename to look for”, such that if the datafile doesn’t exist, the recipe doesn’t do certain things until it finds the file…
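A minimal sketch of that "look for the file" check in plain Ruby. The helper name `datafile_ready?`, the `min_bytes` threshold, the service name and the path are all assumptions for illustration, not part of Chef or of the app being discussed.

```ruby
# Hypothetical helper: true only once the datafile exists as a regular file
# and is at least min_bytes long. Name and threshold are illustrative.
def datafile_ready?(path, min_bytes: 1)
  File.file?(path) && File.size(path) >= min_bytes
end

# In a recipe, the same kind of check could sit in a guard, so the rest of
# the run converges and only the service start is skipped for now:
#
#   service 'myapp' do                                  # placeholder name
#     action [:enable, :start]
#     only_if { ::File.exist?('/data/myapp/datafile') } # placeholder path
#   end

puts datafile_ready?('/no/such/datafile')
```

A bare existence check cannot tell a finished copy from one still in flight, which is why the sketch takes an optional minimum size.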

Suggestions?

–cmd


Christopher DeMarco cdemarco@gmail.com
+1-412-708-9660


#2

There’s a cookbook for distributing files via BitTorrent that I’d
suggest you use for shipping around such an amazingly large binary
asset; alternatives may be S3, rsync, etc.

–AJ

On 24 September 2012 13:07, Christopher DeMarco cdemarco@gmail.com wrote:

So I want to provision a service for a custom app we’ve got. I can template
an initscript/launchd_plist and put all my listening ports, identifying
strings, etc. into data bags, but until I’ve got the datafile on disk I
can’t start the service.



#3

Here we go: http://community.opscode.com/cookbooks/bittorrent

Cheers,

–AJ

On 24 September 2012 13:21, AJ Christensen aj@junglist.gen.nz wrote:

There’s a cookbook for distributing files via BitTorrent that I’d
suggest you use for shipping around such an amazingly large binary
asset; alternatives may be S3, rsync, etc.

–AJ



#4

Assuming that I don’t want to distribute the file via Chef, how would I best divide the recipe into bits that could run before the file arrives and bits that can’t run until afterwards?

And if I did want to use that BT or some other custom method to provision such a huge file, does a recipe (or indeed an entire chef-client run!) block on that file transfer? Or can I make chef do stuff / sleep while it’s “waiting” for the huge file to arrive?
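One way to picture the split, as a sketch only: tag each step with whether it needs the datafile, and let every run do whatever it can. This assumes the transfer happens outside the chef-client run and that chef-client runs again later (for example on an interval), so the remaining steps get picked up once the file is there. The step names and both function names are hypothetical.

```ruby
# Hypothetical step list; the names are placeholders, not real recipe names.
def example_steps
  [
    { name: 'render init script',         needs_datafile: false },
    { name: 'write config from data bag', needs_datafile: false },
    { name: 'enable and start service',   needs_datafile: true }
  ]
end

# Return the names of the steps that can run right now: everything when the
# datafile is present, otherwise only the steps that do not depend on it.
def steps_for_this_run(steps, datafile_path)
  present = File.exist?(datafile_path)
  steps.select { |s| present || !s[:needs_datafile] }.map { |s| s[:name] }
end

puts steps_for_this_run(example_steps, '/no/such/datafile')
```

In recipe terms the `needs_datafile: true` steps would be the resources carrying a guard, or a separate recipe that is only included once the file exists. On the blocking question: resources in a run converge one after another, so a resource that performs the transfer itself holds up everything after it until it finishes.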

Christopher DeMarco cdemarco@gmail.com
+1-412-708-9660


#5

Try something like this in the compile phase: https://gist.github.com/3773868
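The gist is not reproduced here. As background only, this sketch shows why the phase matters: plain Ruby in a recipe body is evaluated as soon as the recipe is read (compile phase), while a block such as an `only_if` guard is stored and evaluated later (converge phase). The function name `compare_phases` and the temp path are made up for the demonstration.

```ruby
require 'tmpdir'

# Returns [seen_at_compile, seen_at_converge] for the given path.
# The optional block stands in for whatever happens between the two phases.
def compare_phases(datafile)
  # "Compile phase": evaluated immediately, exactly once.
  present_at_compile = File.exist?(datafile)

  # "Converge phase": wrapped in a lambda, so the check is deferred.
  present_at_converge = -> { File.exist?(datafile) }

  yield if block_given?
  [present_at_compile, present_at_converge.call]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'datafile') # hypothetical location
  # Simulate the file arriving after compile but before converge.
  p compare_phases(path) { File.write(path, 'payload') }
  # prints [false, true]
end
```

A compile-phase check decides once, up front, what the run will contain; a converge-phase guard re-checks at the moment the resource is about to act.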

On Sun, Sep 23, 2012 at 9:40 PM, Christopher DeMarco cdemarco@gmail.com wrote:

Assuming that I don’t want to distribute the file via Chef, how would I best divide the recipe into bits that could run before the file arrives and bits that can’t run until afterwards?

And if I did want to use that BT or some other custom method to provision such a huge file, does a recipe (or indeed an entire chef-client run!) block on that file transfer? Or can I make chef do stuff / sleep while it’s “waiting” for the huge file to arrive?



#6

May also be worth considering Heavy Water’s Runlist Modifiers cookbook
[0], which temporarily allows you to restrict/allow recipes. This is
useful in the case where you want to model your convergence (including
a large, blocking file transfer used for deployment) but leave that
recipe restricted the rest of the time. Other recipes that include a
restricted recipe will also be skipped during regular convergence.
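To make the restrict/skip behaviour concrete, here is a toy model in plain Ruby. It is not the cookbook's API: the function name, the `includes` map and the recipe names are all hypothetical, and it only mirrors the description above (a restricted recipe is dropped, and so is any recipe that directly includes one).

```ruby
# Toy model, not the runlist_modifiers API.
# run_list:   ordered recipe names
# includes:   recipe name => recipes it includes directly
# restricted: recipe names currently restricted
def effective_recipes(run_list, includes, restricted)
  run_list.reject do |recipe|
    restricted.include?(recipe) ||
      includes.fetch(recipe, []).any? { |dep| restricted.include?(dep) }
  end
end

run_list = ['base', 'myapp::deploy', 'myapp::service'] # placeholder names
includes = { 'myapp::service' => ['myapp::deploy'] }

# Regular convergence: the heavy deploy recipe is restricted.
p effective_recipes(run_list, includes, ['myapp::deploy'])
# Deployment run: restriction lifted.
p effective_recipes(run_list, includes, [])
```

With the restriction in place only `base` survives; with it lifted, all three recipes run.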

Chris Roberts has a blog post on the implementation details and usage
[1]; the original mailing list discussion is at [2].

Cheers,

–AJ

[0] http://community.opscode.com/cookbooks/runlist_modifiers
[1] http://code.chrisroberts.org/blog/2012/05/09/cooking-up-partial-run-lists-with-chef/
[2] http://lists.opscode.com/sympa/arc/chef-dev/2012-03/msg00022.html

On 24 September 2012 14:25, Sean OMeara someara@gmail.com wrote:

Try something like this in the compile phase: https://gist.github.com/3773868
