Hotwire Tech Blog

Scribes from Hotwire Engineering

What’s the problem?

Most services handle large amounts of data. It can come from different sources, such as Kafka, RabbitMQ, or ActiveMQ. A large volume of data can slow down your application or even bring it down.

We know it, but what is the solution?

We need a real-time computation system focused on distributed processing of large data streams.

  • Apache Storm is a distributed stream processing computation…

    Read more...

Overview

We will look into running jobs on a Spark cluster and configuring the settings to fine-tune a simple example and achieve significantly lower runtimes. We will also touch on the trade-off between the number of tasks per executor and the number of executors per node for a given cluster node configuration.
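To make the trade-off concrete, here is a minimal sketch of the arithmetic in plain Python. The node size (16 cores, 64 GB), the one core and 1 GB held back for the operating system, and the helper names `candidate_layouts` and `executor_conf` are all assumptions made for illustration. Only `spark.executor.cores` and `spark.executor.memory` are real Spark properties, and memory overhead is ignored here.

```python
from typing import Dict, List, NamedTuple


class ExecutorLayout(NamedTuple):
    executors_per_node: int
    cores_per_executor: int       # with one core per task (the default), also the
                                  # number of tasks an executor can run at once
    memory_gb_per_executor: int


def candidate_layouts(node_cores: int, node_memory_gb: int,
                      reserved_cores: int = 1,
                      reserved_memory_gb: int = 1) -> List[ExecutorLayout]:
    """List the ways to split one node evenly among executors.

    The reserved amounts are an assumption: some capacity is left for the OS.
    """
    usable_cores = node_cores - reserved_cores
    usable_memory_gb = node_memory_gb - reserved_memory_gb
    layouts = []
    for cores in range(1, usable_cores + 1):
        if usable_cores % cores != 0:
            continue  # skip splits that would leave cores idle
        executors = usable_cores // cores
        layouts.append(
            ExecutorLayout(executors, cores, usable_memory_gb // executors))
    return layouts


def executor_conf(layout: ExecutorLayout) -> Dict[str, str]:
    """Translate a layout into the matching Spark executor properties."""
    return {
        "spark.executor.cores": str(layout.cores_per_executor),
        "spark.executor.memory": f"{layout.memory_gb_per_executor}g",
    }


if __name__ == "__main__":
    # Hypothetical node: 16 cores, 64 GB of memory.
    for layout in candidate_layouts(node_cores=16, node_memory_gb=64):
        print(layout, executor_conf(layout))
```

Many small executors give each task very little memory, while one large executor per node concentrates all tasks in a single JVM. The layouts in between are usually the ones worth measuring.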

Spark Cluster Abstraction

This diagram (source) shows the key components of the cluster. The Spark driver is the program you write. The driver program requests executors from the Master (each Worker node lets the Master know how many resources it has). The Master allocates the resources…
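The registration and request flow described above can be pictured with a toy model. This is not Spark code: the `Master` class, its method names, and the simple first-fit placement are assumptions made for illustration, and the real Master's allocation policy may differ. Only cores are tracked, to keep the sketch short.

```python
from typing import Dict, List


class Master:
    """Toy stand-in for the cluster Master: tracks free cores per Worker."""

    def __init__(self) -> None:
        self.free_cores: Dict[str, int] = {}  # worker id -> unallocated cores

    def register_worker(self, worker_id: str, cores: int) -> None:
        # A Worker lets the Master know how many cores it has.
        self.free_cores[worker_id] = cores

    def request_executors(self, count: int, cores_each: int) -> List[str]:
        """Grant up to `count` executors of `cores_each` cores.

        Returns the id of the Worker hosting each granted executor.
        First-fit placement is an assumption, not Spark's actual policy.
        """
        granted: List[str] = []
        for worker_id in sorted(self.free_cores):
            while (len(granted) < count
                   and self.free_cores[worker_id] >= cores_each):
                self.free_cores[worker_id] -= cores_each
                granted.append(worker_id)
        return granted


if __name__ == "__main__":
    master = Master()
    master.register_worker("worker-1", cores=8)
    master.register_worker("worker-2", cores=4)
    # The driver asks for three executors with four cores each.
    print(master.request_executors(count=3, cores_each=4))
```

Once every registered core is handed out, further requests come back empty, which is the situation a driver faces when the cluster is already fully booked.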

Read more...