
Sending data to a Kafka topic

This scenario is designed to help you set up and use connectors in a pipeline. Adapt it to your environment and use case.

Example of a pipeline created from the instructions below.

Before you begin

Have the test-file-to-kafka.json file at hand; its content is used to create the test dataset in the procedure below.

Procedure

  1. Click Connections > Add connection.
  2. Add a Test connection, then click Add dataset.
  3. Select your engine in the Engine list.
    Information note:
    • Using the Remote Engine Gen2 rather than the Cloud Engine for Design is recommended for advanced data processing.
    • If no Remote Engine Gen2 has been created from Talend Management Console, or if it exists but appears as unavailable (meaning it is not up and running), you cannot select a connection type in the list or save the new connection.
    • The list of available connection types depends on the engine you have selected.
  4. Select JSON in the Format list and paste the content of the test-file-to-kafka.json file into the Values field.
  5. Name it, action movies for example, and save it; a sample record is sketched below.
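
    The exact content of the test-file-to-kafka.json file is not reproduced here. Based on the fields used later in this scenario, a single record presumably looks something like this (values are hypothetical; actor first and last names sit in a .detail.starring field):

      {
        "title": "Example Movie",
        "detail": {
          "starring": "Toni Collette"
        }
      }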
  6. Do the same to add a connection to a Kafka server:
    1. Click Connections > Add connection.
    2. In the panel that opens, name your connection and, if needed, add a description.

      Example

      Kafka
    3. Select the type of connection you want to create.
      Here, select Kafka.
    4. Fill in the connection properties to safely access your Kafka server as described in Kafka properties, check the connection, and click Add dataset. An optional connectivity check is sketched below.
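
    Before wiring the dataset, you can sanity-check from outside Talend that the broker is reachable. A minimal sketch using the kafka-python library, assuming a broker at localhost:9092 (replace with your own bootstrap servers):

      # pip install kafka-python
      from kafka.admin import KafkaAdminClient

      # Connect to the broker; the address is an assumption, use the one
      # from your Kafka connection instead.
      admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

      # Listing the existing topics confirms the connection works.
      print(admin.list_topics())
      admin.close()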
  7. In the Add a new dataset panel, fill in the dataset properties. In this example, the collette_movies_json topic is used to publish the movie data.

    Example

    Configuration of a new Kafka dataset.
  8. Name your dataset, Collette kafka topic for example.
  9. Click Validate to save your dataset.
  10. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  11. Give the pipeline a meaningful name.

    Example

    From Test to Kafka - send to Kafka topic
  12. Click ADD SOURCE and select your source dataset, action movies, in the panel that opens.
  13. Click add processor and add a Split processor to the pipeline to split the records that contain both actor first and last names. The configuration panel opens.
  14. Give a meaningful name to the processor.

    Example

    split actor names
  15. Configure the processor:
    1. Select Split text in parts in the Function name list, as you want to split the values of the name records.
    2. Select .detail.starring in the Fields to process list, as you want to apply this change to the values of this specific field.
    3. Enter or select 2 in the Parts list, as you want to split these values into two parts.
    4. Select Space in the Separator list, as first names and last names are separated by a space in these records.
  16. Click Save to save your configuration.
  17. (Optional) Look at the preview of the processor to see the data after the split operation.
    In the output data preview, the .detail.starring column is split into two columns, one holding the first name and the other the last name; a sketch of the equivalent logic follows this step.
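
    For readers who prefer to see the transformation as code, here is a minimal Python sketch of what the Split processor does under the settings above. The _split_1/_split_2 suffixes follow the preview described in this scenario; the record values are hypothetical:

      # Split "First Last" on the first space into two parts,
      # mirroring Parts = 2 and Separator = Space.
      record = {"detail": {"starring": "Toni Collette"}}  # hypothetical record

      first, last = record["detail"]["starring"].split(" ", 1)
      record["detail"]["starring_split_1"] = first  # "Toni"
      record["detail"]["starring_split_2"] = last   # "Collette"
      print(record)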
  18. Click add processor and add a Filter processor to the pipeline. The configuration panel opens.
  19. Give a meaningful name to the processor.

    Example

    filter on movies with actor Collette
  20. Configure the processor:
    1. Add a new element and select .detail.starring_split_2 in the Input list, as you want to filter on the last names of the actors listed in the dataset.
    2. Select None in the Optionally select a function to apply list.
    3. Select == in the Operator list.
    4. Enter Collette in the Value field, as you want to keep only the records whose last name equals Collette.
    5. Click Save to save your configuration.
  21. (Optional) Look at the preview of the Filter processor to see your data sample after the filtering operation.

    Example

    In the Output data preview, three records match the criteria.
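
    As with the split step, here is a short Python sketch of the equivalent filter logic, using the hypothetical record shape from the earlier sketch:

      # Keep only the records whose split-out last name equals "Collette",
      # mirroring Operator == and Value Collette on .detail.starring_split_2.
      records = [
          {"detail": {"starring_split_1": "Toni", "starring_split_2": "Collette"}},
          {"detail": {"starring_split_1": "Hugo", "starring_split_2": "Weaving"}},
      ]

      matches = [r for r in records
                 if r["detail"]["starring_split_2"] == "Collette"]
      print(matches)  # only the first record remains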
  22. Click the ADD DESTINATION item on the pipeline to open the panel that allows you to select the Apache Kafka topic in which your output data will be loaded, Collette kafka topic.
  23. In the Configuration tab of the destination, Round-Robin is the default Partition Type used when publishing an event, but you can specify a partition key if your use case calls for it; the sketch below illustrates the difference.
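
    Outside of Talend, the keyed versus round-robin behavior can be illustrated with the kafka-python producer, under the same localhost:9092 broker assumption as above (the record shape is hypothetical):

      # pip install kafka-python
      import json
      from kafka import KafkaProducer

      producer = KafkaProducer(
          bootstrap_servers="localhost:9092",  # assumption: use your broker address
          value_serializer=lambda v: json.dumps(v).encode("utf-8"),
      )

      event = {"title": "Example Movie",
               "detail": {"starring_split_2": "Collette"}}  # hypothetical record

      # Without a key, the client picks the partition itself,
      # spreading events across partitions.
      producer.send("collette_movies_json", value=event)

      # With a key, every event carrying the same key is routed
      # to the same partition.
      producer.send("collette_movies_json", value=event, key=b"Collette")

      producer.flush()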
  24. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  25. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.

Results

Your pipeline is being executed; the movie data from your test file has been processed, and the output flow is sent to the collette_movies_json topic you defined.

What to do next

Once the data is published, you can consume the content of the topic in another pipeline and use it as a source:

A new pipeline where the source is the collette kafka topic from the previous destination pipeline.
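
If you want to verify the published events outside of Talend, here is a minimal kafka-python consumer sketch, under the same localhost:9092 broker assumption as above:

  # pip install kafka-python
  import json
  from kafka import KafkaConsumer

  consumer = KafkaConsumer(
      "collette_movies_json",
      bootstrap_servers="localhost:9092",  # assumption: use your broker address
      auto_offset_reset="earliest",        # read the topic from the beginning
      value_deserializer=lambda m: json.loads(m.decode("utf-8")),
  )

  # Print each movie record published by the pipeline.
  for message in consumer:
      print(message.value)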
