Reading streaming messages from a Google Pub/Sub topic
This scenario is designed to help you set up and use connectors in a pipeline. Adapt it to your environment and use case.
About this task
This scenario processes streaming JSON message data about books published to a Google Pub/Sub topic.
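To make the scenario concrete, here is a sketch of what one such JSON book message might look like. The field names (title, author, price, currency) are illustrative assumptions; the actual schema depends on the messages you publish to your topic.

```python
import json

# Hypothetical book-price message, as the pipeline might receive it
# from the Pub/Sub topic. The field names are assumptions for this
# example; use whatever schema your publishers actually emit.
message = json.dumps({
    "title": "The Pragmatic Programmer",
    "author": "Andrew Hunt",
    "price": 39.99,
    "currency": "USD",
})

# Pub/Sub delivers message payloads as bytes; decoding back to a dict
# is what the pipeline's JSON data format setting does for you.
record = json.loads(message)
print(record["price"])  # 39.99
```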
Procedure
Click Connections > Add connection.
In the panel that opens, select the type of connection you want to create.
Example
Google Pub/Sub
Select your engine in the Engine list.
Information note: It is recommended to use the Remote Engine Gen2 rather than the Cloud Engine for Design for advanced data processing.
If no Remote Engine Gen2 has been created from Talend Management Console, or if it exists but appears as unavailable (meaning it is not up and running), you will not be able to select a Connection type in the list or to save the new connection.
The list of available connection types depends on the engine you have selected.
Select the type of connection you want to create. Here, select Google Pub/Sub.
Fill in the connection properties to access your Google project as described in Google Pub/Sub properties, including your project name and JSON credentials, check the connection, and click Add dataset.
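Before pasting the contents of a downloaded service-account key file into the connection form, it can help to sanity-check that the file contains the expected fields. The sketch below checks for a few of the standard fields found in Google service-account JSON keys; it is a convenience illustration, not part of the Talend procedure.

```python
import json

# Standard fields present in a Google service-account JSON key file.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def missing_credential_fields(raw_json: str) -> set:
    """Return the set of required fields missing from the key file."""
    key = json.loads(raw_json)
    return REQUIRED_FIELDS - key.keys()

# An incomplete key file: the check reports what is missing.
sample = '{"type": "service_account", "project_id": "my-project"}'
print(missing_credential_fields(sample))  # reports the missing fields
```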
In the Add a new dataset panel, name your dataset book prices.
Select Google Pub/Sub in the connection list.
Fill in the required properties to access the messages in your Pub/Sub topic (topic name, subscription name, data format) and click View sample to see a preview of your dataset sample.
Click Validate to save your dataset.
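The dataset asks for both a topic name and a subscription name because, in Pub/Sub, messages are published to a topic and each subscription attached to that topic receives its own copy. The toy in-memory model below illustrates this relationship; it is not the real Pub/Sub client API.

```python
# Toy in-memory model (NOT the google-cloud-pubsub library) showing
# why a dataset needs both a topic name and a subscription name.
class Topic:
    def __init__(self, name):
        self.name = name
        self.subscriptions = {}  # subscription name -> pending messages

    def subscribe(self, sub_name):
        self.subscriptions[sub_name] = []

    def publish(self, data: bytes):
        # Every subscription gets its own copy of the message.
        for queue in self.subscriptions.values():
            queue.append(data)

    def pull(self, sub_name, max_messages=10):
        # Pulling drains messages from that one subscription only.
        queue = self.subscriptions[sub_name]
        batch = queue[:max_messages]
        self.subscriptions[sub_name] = queue[max_messages:]
        return batch

topic = Topic("book-prices")
topic.subscribe("pipeline-sub")
topic.publish(b'{"title": "Dune", "price": 9.99}')
print(topic.pull("pipeline-sub"))  # [b'{"title": "Dune", "price": 9.99}']
```

A second subscription on the same topic would receive the same messages independently, which is why two pipelines can consume one topic without interfering with each other.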
Do the same to add a Test connection and dataset that will be used as a destination in your pipeline.
Click Add pipeline on the Pipelines page. Your new pipeline opens.
Click ADD SOURCE to open the panel allowing you to select your source data, here the JSON messages published to Pub/Sub.
Select your dataset and click Select to add it to the pipeline.
Rename it if needed.
Add a Window processor to the pipeline. The configuration panel opens.
Give a meaningful name to the processor.
Example
5sec window
In the Configuration tab:
Enable the Use Window session toggle.
Type in 5000 as the window duration (in milliseconds) to capture data every 5 seconds.
Click Save to save your configuration.
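Conceptually, a 5000 ms window groups incoming records into 5-second buckets by timestamp. The sketch below is a simplified model of fixed-duration windowing to build intuition for the setting above; the exact semantics of the Window processor (including session behavior) are governed by its configuration in the UI.

```python
from collections import defaultdict

# Simplified sketch of fixed (tumbling) windowing: each record is
# assigned to the 5000 ms bucket its timestamp falls into, mirroring
# the 5-second window duration configured above.
WINDOW_MS = 5000

def window_records(records):
    """Group (timestamp_ms, payload) pairs into 5-second windows."""
    windows = defaultdict(list)
    for ts, payload in records:
        window_start = (ts // WINDOW_MS) * WINDOW_MS
        windows[window_start].append(payload)
    return dict(windows)

records = [(1000, "a"), (4200, "b"), (6100, "c"), (12500, "d")]
print(window_records(records))
# {0: ['a', 'b'], 5000: ['c'], 10000: ['d']}
```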
Click ADD DESTINATION and select the test dataset that will hold your reorganized data.
Rename it if needed.
In the Configuration tab, enable the Log records to STDOUT toggle, as you want the records written to the output logs.
Click Save to save your configuration.
On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
Results
Your pipeline is being executed. The messages published to the Pub/Sub topic are retrieved every 5 seconds and can be seen in the output logs. You can refresh the Metrics view in the Pipeline Details panel to see the number of records being incrementally updated.