Overview
The following topic provides an overview of the Google Cloud Pub/Sub endpoint.
Topic handling and message ordering
In a task configured with the Google Cloud Pub/Sub endpoint, Replicate acts as the Publisher. Depending on the endpoint settings, messages are published either to a separate topic for each table or to a single topic. In both cases, Replicate creates a default subscription for each topic that it creates, unless a subscription already exists. The default subscription name is the topic name with the -sub suffix appended (for example, mytopic-sub). Additionally, the ability to drop a topic (and its associated subscription) is only relevant when messages are published to separate topics, and only if the task settings are configured as follows:
- Full Load setting: If target table already exists: DROP and CREATE table
- Apply Changes setting: When source table is dropped: DROP target table
The endpoint provides a Keep message changes in order option, which ensures that messages are published in the order in which they occurred. When this option is enabled and Replicate creates a default subscription, Replicate also turns on the Order messages with an ordering key option for that subscription, so that consumers receive the messages in the correct order. However, as mentioned above, Replicate only creates a default subscription if no subscription exists. Customers who create their own topics and subscriptions must enable the Order messages with an ordering key option themselves (and make sure that the endpoint's Keep message changes in order option is enabled) in order to receive the messages in the correct order. For more information, see Ordering messages.
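For customers who create their own subscriptions, the following minimal sketch shows what a subscription definition that mirrors the Replicate default might look like. It only builds the request fields and does not call Google Cloud. The function name default_subscription_request and the project and topic IDs are hypothetical. The field names (name, topic, enable_message_ordering) are assumed to correspond to the Pub/Sub Subscription resource, where enable_message_ordering is the API counterpart of the Order messages with an ordering key option.

```python
def default_subscription_request(project_id, topic_id, ordered=True):
    """Build subscription fields that mirror the Replicate default.

    Hypothetical helper for illustration only. The subscription name is
    the topic name with the "-sub" suffix, as described above.
    """
    return {
        # Fully qualified subscription name: <topic>-sub
        "name": f"projects/{project_id}/subscriptions/{topic_id}-sub",
        # Fully qualified name of the topic to subscribe to
        "topic": f"projects/{project_id}/topics/{topic_id}",
        # Assumed API counterpart of "Order messages with an ordering key"
        "enable_message_ordering": ordered,
    }


if __name__ == "__main__":
    # "my-project" and "mytopic" are placeholder IDs.
    print(default_subscription_request("my-project", "mytopic"))
```

A dictionary of this shape could then be passed to a Pub/Sub client when creating the subscription. Verify the exact call against the client library version in use.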
Transaction processing from a consumer perspective
When configuring the Google Cloud Pub/Sub endpoint, users can choose whether messages are published to a separate topic for each table or to a single topic.
During a task's CDC stage, committed changes that are detected by the Qlik Replicate source endpoint are grouped by transaction, sorted internally in chronological order, and then propagated to the relevant topics.
Each CDC message has both a Transaction ID and a change sequence. As the change sequence is a monotonically increasing number, sorting events by change sequence always yields chronological order. Grouping the sorted events by Transaction ID then results in transactions that contain chronologically sorted changes.
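The sort-then-group step described above can be sketched as follows. This is an illustration only, not Replicate's internal implementation. The dictionary keys transaction_id and change_sequence are hypothetical stand-ins for the Transaction ID and change sequence carried by each message, and the change sequence is assumed to be directly comparable (plain integers here).

```python
def group_by_transaction(events):
    """Sort events chronologically, then group them by transaction.

    Each event is a dict with hypothetical keys "transaction_id" and
    "change_sequence". Returns a dict that maps each transaction ID to
    its list of events in chronological order.
    """
    # The change sequence grows monotonically, so sorting by it
    # yields chronological order.
    ordered = sorted(events, key=lambda e: e["change_sequence"])

    # Dicts preserve insertion order, so transactions appear in the
    # order of their first change.
    transactions = {}
    for event in ordered:
        transactions.setdefault(event["transaction_id"], []).append(event)
    return transactions


if __name__ == "__main__":
    sample = [
        {"transaction_id": "T2", "change_sequence": 3},
        {"transaction_id": "T1", "change_sequence": 1},
        {"transaction_id": "T2", "change_sequence": 4},
        {"transaction_id": "T1", "change_sequence": 2},
    ]
    for tx_id, changes in group_by_transaction(sample).items():
        print(tx_id, [c["change_sequence"] for c in changes])
```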
However, as Google Cloud Pub/Sub is a messaging infrastructure, applying changes is not feasible. The Google Cloud Pub/Sub endpoint, therefore, takes a different approach, which is to report all transactional events as messages.
If maintaining transaction consistency is important for the consumer implementation, then, although the Transaction ID exists in all data messages, the challenge is to gather the messages in a way that makes it possible to identify a whole transaction. An additional challenge is receiving the transactions in the order in which they were originally committed, which can be even harder if transactions are spread across multiple topics.
The simplest way of achieving the above goal is to configure Replicate to publish to a specific topic (in the endpoint settings). All data messages then end up in a single topic, which guarantees ordered delivery both of transactions and of changes within a transaction. The consuming application can then consume the messages, accumulating each transaction in an intermediate memory buffer, and mark the previous transaction as completed when a new Transaction ID is detected.
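The buffering approach described above can be sketched as a small generator. This is a minimal illustration under stated assumptions: messages arrive in order from a single topic, each message is a dict, and the key transaction_id is a hypothetical stand-in for the Transaction ID. The function name iter_transactions is also hypothetical.

```python
def iter_transactions(messages):
    """Yield (transaction_id, changes) for each completed transaction.

    Assumes ordered delivery from a single topic. A transaction is
    treated as complete when a message with a different Transaction ID
    arrives.
    """
    current_id = None
    buffer = []
    for message in messages:
        tx_id = message["transaction_id"]
        if buffer and tx_id != current_id:
            # A new Transaction ID was detected: the previous
            # transaction is complete.
            yield current_id, buffer
            buffer = []
        current_id = tx_id
        buffer.append(message)
    if buffer:
        # End of a finite input. With a continuous stream, the last
        # transaction would stay buffered until the next ID arrives.
        yield current_id, buffer


if __name__ == "__main__":
    stream = [
        {"transaction_id": "T1", "data": "a"},
        {"transaction_id": "T1", "data": "b"},
        {"transaction_id": "T2", "data": "c"},
    ]
    for tx_id, changes in iter_transactions(stream):
        print(tx_id, [c["data"] for c in changes])
```

Because completion is only detected when the next transaction starts, a real consumer may also want a timeout for flushing the final buffered transaction. That is a design choice beyond what this topic describes.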