Hotwire Tech Blog

Scribes from Hotwire Engineering


We will look at running jobs on a Spark cluster and tuning its configuration settings so that a simple example achieves significantly lower runtimes. We will also touch on the trade-off between the number of tasks per executor and the number of executors per node for a given cluster node configuration.
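To make that trade-off concrete, here is a minimal sketch in plain Python (no Spark required) that lists the ways a single worker node could be split into equally sized executors. The function name `executor_layouts` and the 16-core, 64 GB node are hypothetical, chosen only for illustration. The sketch assumes one CPU per task (Spark's default for `spark.task.cpus`), so the cores given to an executor equal the number of tasks it can run at once, and it ignores memory overhead and any resources reserved for the operating system.

```python
def executor_layouts(cores_per_node, memory_gb_per_node):
    """Enumerate even splits of one worker node into identical executors.

    Assumes one CPU per task, so cores_per_executor is also the number
    of tasks an executor can run concurrently. Memory overhead is ignored.
    """
    layouts = []
    for executors in range(1, cores_per_node + 1):
        if cores_per_node % executors != 0:
            continue  # skip splits that would leave cores unused
        layouts.append({
            "executors_per_node": executors,
            "cores_per_executor": cores_per_node // executors,
            "memory_gb_per_executor": memory_gb_per_node / executors,
        })
    return layouts


# Hypothetical node with 16 cores and 64 GB of memory.
for layout in executor_layouts(16, 64):
    print(layout)
```

Each row corresponds roughly to one choice of `spark.executor.cores` and `spark.executor.memory`: a few large executors at one end, many single-task executors at the other. Which point on that range performs best depends on the job.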

Spark Cluster Abstraction

This diagram (source) shows the key components of the cluster. The Spark driver is the program you write. The driver program requests executors from the Master, and each Worker node tells the Master how many resources it has available. The Master allocates the resources…