Writing data to an Amazon Kinesis Stream

Before you begin

This section assumes that you have an Amazon EMR cluster up and running, that you have created the corresponding cluster connection metadata in the repository, and that you have created an Amazon Kinesis stream.

Procedure

  1. Create a Big Data Streaming Job using the Spark framework.
  2. In this example, the data to be written to Amazon Kinesis is generated with a tRowGenerator component.
  3. The data must be serialized as byte arrays before being written to the Amazon Kinesis stream. Add a tWriteDelimitedFields component and connect it to the tRowGenerator component.
  4. In the tWriteDelimitedFields component, set the Output type to byte[].
  5. To write the data to your Kinesis stream, add a tKinesisOutput component and connect the tWriteDelimitedFields component to it. For an illustration of what this write amounts to in plain AWS SDK code, see the sketch after this procedure.
  6. Provide your Amazon credentials.
  7. To access your Kinesis stream, provide the Stream name and the corresponding endpoint URL.

    To get the right endpoint URL, refer to AWS Regions and Endpoints.

  8. Provide the number of shards, as specified when you created the Kinesis stream.
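Note that the tKinesisOutput component generates the equivalent Spark code for you, so no hand-written code is needed. Purely as an illustration of what this Job does, the following sketch uses the AWS SDK for Java (v1) to serialize a delimited row as a byte array and put it on a Kinesis stream. The credentials, stream name, endpoint, region, delimiter, and partition key shown here are placeholder assumptions; substitute the values that match your own stream.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.kinesis.model.PutRecordResult;

public class KinesisWriteSketch {

    public static void main(String[] args) {
        // Placeholder values -- replace with your own credentials, stream name,
        // and the endpoint/region that matches your Kinesis stream.
        String accessKey = "YOUR_ACCESS_KEY";
        String secretKey = "YOUR_SECRET_KEY";
        String streamName = "my-kinesis-stream";
        String endpointUrl = "kinesis.us-east-1.amazonaws.com";
        String region = "us-east-1";

        AmazonKinesis kinesis = AmazonKinesisClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials(accessKey, secretKey)))
                .withEndpointConfiguration(
                        new AwsClientBuilder.EndpointConfiguration(endpointUrl, region))
                .build();

        // Equivalent of tWriteDelimitedFields: flatten a row into a
        // delimited string and serialize it as a byte array.
        String row = String.join(";", "1", "John", "Doe");
        ByteBuffer data = ByteBuffer.wrap(row.getBytes(StandardCharsets.UTF_8));

        // Equivalent of tKinesisOutput: put the serialized record on the stream.
        // The partition key determines which shard receives the record.
        PutRecordRequest request = new PutRecordRequest()
                .withStreamName(streamName)
                .withData(data)
                .withPartitionKey("partition-key-1");

        PutRecordResult result = kinesis.putRecord(request);
        System.out.println("Record written to shard " + result.getShardId()
                + " with sequence number " + result.getSequenceNumber());
    }
}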
