Configuring a Big Data Streaming Job using the Spark Streaming Framework
Before running your Job, you need to configure it to use your Amazon EMR cluster.
Procedure
Because your Job will run on Spark, you need to add a tHDFSConfiguration component and then configure it to use the HDFS connection metadata from the Repository.
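Under the hood, the tHDFSConfiguration component gives Spark the HDFS connection details it needs. As a rough sketch of the plain-Spark equivalent, assuming a hypothetical EMR NameNode address (not a value from this page):

```java
import org.apache.spark.SparkConf;

// Sketch of the plain-Spark equivalent of tHDFSConfiguration.
// Properties prefixed with "spark.hadoop." are forwarded to the underlying
// Hadoop Configuration, which is how Spark learns where HDFS lives.
public class HdfsConfigSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("StreamingJobSketch")
                // Hypothetical EMR master address; use your own cluster's NameNode.
                .set("spark.hadoop.fs.defaultFS", "hdfs://emr-master.example.com:8020");
        System.out.println(conf.get("spark.hadoop.fs.defaultFS"));
    }
}
```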
In the Run view, click the Spark Configuration tab.
In the Cluster Version panel, configure your Job to use your cluster connection metadata.
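In plain Spark terms, this step amounts to telling the Job which resource manager to submit to. A sketch, assuming the EMR cluster runs Spark on YARN in client deploy mode (both assumptions, not stated on this page):

```java
import org.apache.spark.SparkConf;

// Sketch of what the cluster connection metadata boils down to.
public class ClusterConfigSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("StreamingJobSketch")
                .setMaster("yarn")                         // assumption: EMR runs Spark on YARN
                .set("spark.submit.deployMode", "client"); // assumption: driver runs client-side
        System.out.println(conf.get("spark.master"));
    }
}
```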
Set the Batch size to 2000 ms.
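The Batch size is the Spark Streaming micro-batch interval: incoming data is collected for that long and then processed as one batch. In hand-written Spark Streaming code, the same value is the batch duration passed to the streaming context, as in this minimal sketch (the local master is only there to make the snippet self-contained):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Sketch: a 2000 ms micro-batch interval, matching the Batch size above.
public class BatchSizeSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("StreamingJobSketch")
                .setMaster("local[2]"); // local master just so the sketch runs standalone
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.milliseconds(2000));
        jssc.stop();
    }
}
```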
Because you will set some advanced properties, change the Property type to Built-In.
In the Tuning panel, select the Set tuning properties option and configure the fields as needed for your cluster.
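The tuning fields correspond to standard Spark properties. The property names in this sketch are real Spark settings, but the values are placeholders, not recommendations from this page:

```java
import org.apache.spark.SparkConf;

// Sketch: the kind of Spark properties the Tuning panel controls.
// Size the values for your own EMR cluster.
public class TuningSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("StreamingJobSketch")
                .set("spark.driver.memory", "1g")    // memory for the driver process
                .set("spark.executor.memory", "2g")  // memory per executor
                .set("spark.executor.cores", "2");   // cores per executor
        System.out.println(conf.toDebugString());
    }
}
```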
Run your Job.
It takes a couple of minutes for data to appear in the Console.