Spark/Kafka streaming deployment on AWS

Hello board,

Newbie here. I am trying to deploy a streaming service on AWS and need some guidance. I want to generate streams using Kafka, consume them with Spark, and do some back-end processing on the results. My question is twofold.

  1. Is there any advantage in using Chef for such a deployment?
  2. Is it okay to just write a wrapper that includes the Kafka, Spark, Spark Streaming, and back-end processing services, or should I be thinking about the problem differently?


I would personally use Chef to configure Spark, ZooKeeper, and Kafka. It's
up to you how you wish to ensure that your streaming job stays up forever,
and how to restart it. Previously, I'd done this via cron: check whether the
streaming job was up and, if not, submit it.
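A minimal sketch of that cron approach, assuming the check and submit commands are placeholders you would swap for your own (e.g. a `pgrep` on the driver process and your real `spark-submit` line):

```shell
#!/usr/bin/env bash
# Hypothetical cron health check for a Spark streaming job.
# Every command, class name, and path below is a placeholder for this sketch.

ensure_running() {
  local check_cmd="$1"   # exits 0 if the job is up, e.g. pgrep -f MyStreamingJob
  local submit_cmd="$2"  # (re)submits the job, e.g. a spark-submit invocation
  if eval "$check_cmd" > /dev/null 2>&1; then
    echo "job is up"
  else
    echo "job is down; resubmitting"
    eval "$submit_cmd"
  fi
}

# Run from cron, e.g. every 5 minutes:
# */5 * * * * /usr/local/bin/check_streaming_job.sh
#
# Demo invocation with stub commands; in production the second argument would
# be something like: spark-submit --class com.example.MyStreamingJob job.jar
ensure_running "false" "echo spark-submit --class com.example.MyStreamingJob /opt/jobs/streaming.jar"
```

The check is deliberately passed in as a command so the same script works whether you detect the job via `pgrep`, the Spark REST API, or a YARN application listing.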

Thanks for the reply @Dennis_Lovely. Do you have a cookbook for Spark Streaming, or a wrapper for deploying the different applications? I am currently using the cerner_kafka, apache_spark, and zookeeper cookbooks from Supermarket.

I have moved away from a Spark cluster and am just using Mesos to
distribute my Spark workloads, with Marathon/Chronos to schedule them. But
back when I still used a Spark cluster, I'd make templates in my cookbook
for my Spark streaming jobs and then just submit them to my Spark master
using spark-submit. A wrapper is likely the best solution.
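A sketch of such a wrapper, assuming the master URL, main class, and jar path are placeholder variables that a Chef template would render per job:

```shell
#!/usr/bin/env bash
# Hypothetical spark-submit wrapper for one streaming job; a Chef template
# could render the variables below per job. All names/paths are placeholders.

SPARK_MASTER="spark://spark-master:7077"   # placeholder master URL
MAIN_CLASS="com.example.StreamingJob"      # placeholder main class
JOB_JAR="/opt/jobs/streaming-job.jar"      # placeholder jar path

build_submit_cmd() {
  # Assemble the spark-submit command line from the rendered variables.
  printf 'spark-submit --master %s --deploy-mode cluster --class %s %s\n' \
    "$1" "$2" "$3"
}

# Print the command this wrapper would run; in production you would exec it
# instead of echoing it.
build_submit_cmd "$SPARK_MASTER" "$MAIN_CLASS" "$JOB_JAR"
```

Keeping the command assembly in one function means the cron check from earlier in the thread and the Chef-rendered template can both reuse the exact same submission line.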