Reading streaming messages from a Google Pub/Sub topic
This scenario is designed to help you set up and use connectors in a pipeline. Adapt it to your environment and use case.
About this task
This scenario processes streaming JSON message data about books published to a Google Pub/Sub topic.
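To make the scenario concrete, here is a sketch of what one such JSON book message might look like. The field names (title, author, price, currency) are illustrative assumptions; the actual schema depends on the messages you publish to your topic.

```python
import json

# Hypothetical book-price message, as the pipeline might receive it
# from the Pub/Sub topic. The field names are assumptions for this
# example; use whatever schema your publishers actually emit.
message = json.dumps({
    "title": "The Pragmatic Programmer",
    "author": "Andrew Hunt",
    "price": 39.99,
    "currency": "USD",
})

# Pub/Sub delivers message payloads as bytes; decoding back to a dict
# is what the pipeline's JSON data format setting does for you.
record = json.loads(message)
print(record["price"])  # 39.99
```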
Procedure
Click Connections > Add connection.
In the panel that opens, select the type of connection you want to create.
Example
Google Pub/Sub
Select your engine in the Engine list.
Information note: It is recommended to use the Remote Engine Gen2 rather than the Cloud Engine for Design for advanced data processing.
If no Remote Engine Gen2 has been created from Talend Management Console, or if it exists but appears as unavailable (meaning it is not up and running), you will not be able to select a Connection type in the list or to save the new connection.
The list of available connection types depends on the engine you have selected.
Select the type of connection you want to create. Here, select Google Pub/Sub.
Fill in the connection properties to access your Google project as described in Google Pub/Sub properties, including your project name and JSON credentials, check the connection, and click Add dataset.
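Before pasting the contents of a downloaded service-account key file into the connection form, it can help to sanity-check that the file contains the expected fields. The sketch below checks for a few of the standard fields found in Google service-account JSON keys; it is a convenience illustration, not part of the Talend procedure.

```python
import json

# Standard fields present in a Google service-account JSON key file.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def missing_credential_fields(raw_json: str) -> set:
    """Return the set of required fields missing from the key file."""
    key = json.loads(raw_json)
    return REQUIRED_FIELDS - key.keys()

# An incomplete key file: the check reports what is missing.
sample = '{"type": "service_account", "project_id": "my-project"}'
print(missing_credential_fields(sample))  # reports the missing fields
```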
In the Add a new dataset panel, name your dataset book prices.
Select Google Pub/Sub in the connection list.
Fill in the required properties to access the messages in your Pub/Sub topic (topic name, subscription name, data format) and click View sample to see a preview of your dataset sample.
Click Validate to save your dataset.
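The dataset asks for both a topic name and a subscription name because, in Pub/Sub, messages are published to a topic and each subscription attached to that topic receives its own copy. The toy in-memory model below illustrates this relationship; it is not the real Pub/Sub client API.

```python
# Toy in-memory model (NOT the google-cloud-pubsub library) showing
# why a dataset needs both a topic name and a subscription name.
class Topic:
    def __init__(self, name):
        self.name = name
        self.subscriptions = {}  # subscription name -> pending messages

    def subscribe(self, sub_name):
        self.subscriptions[sub_name] = []

    def publish(self, data: bytes):
        # Every subscription gets its own copy of the message.
        for queue in self.subscriptions.values():
            queue.append(data)

    def pull(self, sub_name, max_messages=10):
        # Pulling drains messages from that one subscription only.
        queue = self.subscriptions[sub_name]
        batch = queue[:max_messages]
        self.subscriptions[sub_name] = queue[max_messages:]
        return batch

topic = Topic("book-prices")
topic.subscribe("pipeline-sub")
topic.publish(b'{"title": "Dune", "price": 9.99}')
print(topic.pull("pipeline-sub"))  # [b'{"title": "Dune", "price": 9.99}']
```

A second subscription on the same topic would receive the same messages independently, which is why two pipelines can consume one topic without interfering with each other.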
Do the same to add a Test connection and dataset that will be used as a destination in your pipeline.
Click Add pipeline on the Pipelines page. Your new pipeline opens.
Click ADD SOURCE to open the panel allowing you to select your source data, here the JSON messages published to Pub/Sub.
Select your dataset and click Select to add it to the pipeline.
Rename it if needed.
Add a Window processor to the pipeline. The configuration panel opens.
Give a meaningful name to the processor.
Example
5sec window
In the Configuration tab:
Enable the Use Window session toggle.
Type in 5000 as the window duration (in milliseconds) to capture data every 5 seconds.
Click Save to save your configuration.
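Conceptually, a 5000 ms window groups incoming records into 5-second buckets by timestamp. The sketch below is a simplified model of fixed-duration windowing to build intuition for the setting above; the exact semantics of the Window processor (including session behavior) are governed by its configuration in the UI.

```python
from collections import defaultdict

# Simplified sketch of fixed (tumbling) windowing: each record is
# assigned to the 5000 ms bucket its timestamp falls into, mirroring
# the 5-second window duration configured above.
WINDOW_MS = 5000

def window_records(records):
    """Group (timestamp_ms, payload) pairs into 5-second windows."""
    windows = defaultdict(list)
    for ts, payload in records:
        window_start = (ts // WINDOW_MS) * WINDOW_MS
        windows[window_start].append(payload)
    return dict(windows)

records = [(1000, "a"), (4200, "b"), (6100, "c"), (12500, "d")]
print(window_records(records))
# {0: ['a', 'b'], 5000: ['c'], 10000: ['d']}
```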
Click ADD DESTINATION and select the test dataset that will hold your reorganized data.
Rename it if needed.
In the Configuration tab, enable the Log records to STDOUT toggle, as you want the records written to the output logs.
Click Save to save your configuration.
On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
Results
Your pipeline is being executed. The messages published to the Pub/Sub topic are retrieved every 5 seconds and can be seen in the output logs. You can refresh the Metrics view in the Pipeline Details panel to see the number of records being incrementally updated.