Monitoring an individual data task
You can monitor the status and progress of your data tasks by selecting Monitor from the drop-down menu in the top left of the data task window.
You can also create monitor views to monitor several data tasks. For more information, see Monitoring and operating your data tasks.
Monitoring landing and replication tasks
Monitoring of landing and replication tasks is similar in that both the initial load and CDC can be monitored, but there are some notable differences due to their contrasting use cases.
Data pipeline use case: Landing tasks
All landing tasks must start with a full load of the source data to the target. Once the initial full load completes, the target data is updated with changes to the source data. This can either be done using Reload and compare or Change data capture (CDC) according to the task definition.
For more information on landing tasks, see Landing data from data sources.
Replication use case: "Replicate data" tasks
"Replicate data" tasks usually start with a full load of the source data to the target. This is required when replicating from SaaS applications, but is optional when replicating from databases. When replicating from databases, if the source data already exists on the target and you only wish to apply the source changes to the target (or store them for applying later), then the replication mode can be Apply changes, Store changes or both. Both of these replication modes are shown in CDC monitoring.
For more information on "Replicate data" tasks, see the following topics:
Replicating data with a Standard, Premium, or Enterprise subscription
Replicating data with a Qlik Talend Cloud Starter subscription
Replication use case: "Land data in data lake" tasks
"Land data in data lake" tasks are similar to landing tasks in that they must start with a full load. Once the initial full load completes, the target data is updated with changes to the source data. This can either be done using Reload or Change data capture (CDC). Despite their similarity to landing tasks, "Land data in data lake" tasks are considered replication tasks as they consist of source-to-target replication only. They do not offer the possibility of manipulating the data further downstream (for example, using transformations and data marts), which is available in a data pipeline.
Information note: The steps for creating a separate "Land data in data lake" task are not relevant with a Qlik Talend Cloud Starter subscription. With a Qlik Talend Cloud Starter subscription, replication to cloud storage targets is done via a standard "Replicate data" task.

For more information on "Land data in data lake" tasks, see Landing data in a data lake with a Standard, Premium, or Enterprise subscription.
Monitoring details
You can view the following details for the data task in Full load status:
- Queued - the number of tables currently queued.
- Loading - the number of tables currently being loaded.
- Completed - the number of tables completed.
- Error - the number of tables in error.
You can view the following details for each table in the data task:
- Name - The name of the target table.
- State - Table state will be either Queued, Loading, Completed, or Error.
- Started - The time that loading started.
- Ended - The time that loading ended.
- Duration - Duration of the load, in the format hh:mm:ss.
- Records - The number of records that were replicated during the load.
- Cached changes - The number of cached changes.
- Message - Displays an error message if the load was not processed successfully.
Change data capture (CDC) monitoring details
You can view the following CDC details for the data task to monitor change processing in CDC status:
- Incoming changes - the number of changes present at the source and waiting to be processed. You can view how many are accumulated and how many are being applied.
- Processed changes - the number of changes that have been processed and applied in the last 24 hours.
- Throughput - average target throughput in kilobytes/second. This indicates how fast the change records are loaded to the target endpoint.
- Latency - current latency of the data asset (hh:mm:ss). This duration represents the time from when the change is available in the source until the change is applied and available in the target or landing asset.
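The Throughput and Latency figures above can be approximated from timestamps and byte counts. A minimal sketch, assuming you have those values from your own monitoring; the function names and inputs are illustrative, not part of the product:

```python
from datetime import datetime

def cdc_latency(source_change_time: datetime, target_apply_time: datetime) -> str:
    """Latency: the time from a change being available in the source
    until it is applied and available in the target, as hh:mm:ss."""
    total = int((target_apply_time - source_change_time).total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

def throughput_kb_per_sec(bytes_loaded: int, seconds: float) -> float:
    """Average target throughput in kilobytes/second."""
    return (bytes_loaded / 1024) / seconds

# Example: a change committed at 10:00:00 and applied at 10:00:42
committed = datetime(2024, 1, 1, 10, 0, 0)
applied = datetime(2024, 1, 1, 10, 0, 42)
print(cdc_latency(committed, applied))          # 00:00:42
print(throughput_kb_per_sec(10240, 10))         # 1.0
```

This mirrors the definitions in the list above: latency is measured end to end per change, while throughput is an average over the whole load.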
You can view the following details for each table in the data task:
- Name - The name of the target table in the landing asset.
- State - Table state will be either Accumulating changes or Error.
- Last processed - The date and time when the last changes were made to the table.
- Inserts - The number of insert operations.
- Updates - The number of update operations.
  Information note: Updates are handled as inserts for SaaS application sources.
- Deletes - The number of delete operations.
- DDL operations - The number of DDL operations.
  Information note: Available for "Replicate data" tasks only.
- Message - Displays an error message if changes to the table fail and are not processed.
If you are landing data from an on-premises source and chose Full load mode, the tables will be automatically reloaded when the landing asset is run.
If you are landing data from an on-premises source and chose Full load and CDC mode, the tables will be continuously updated with new data after the initial full load.
Reloading selected tables
You can manually reload selected tables from the source. This is useful when you want to recover individual tables that are in error. Reloading tables does not affect the CDC timeline, which is only reset if you use Recreate tables. Metadata changes are not propagated when reloading tables.
To reload selected tables, select the tables in the lower half of Monitor and click Reload tables.
You need the same permissions that are required to run the data task, that is, Owner or Can operate role.
Reload tables is available after the first run of the data task. If the update method is Reload and compare, Reload tables is not available when the data task is running.
Downstream storage data tasks will be synced the next time they run. If the storage task has history enabled, it will be maintained.
If it is not possible to recover by reloading tables, the next step is to repair the data task.
Reloading all tables to the target
You can reload all tables to the target if you experience CDC issues that cannot be resolved by reloading specific tables. Examples of issues are missing events, issues caused by source database reorganization, or failure when reading source database events.
1. Stop the data task and all tasks that consume it.
2. Open the data task and select the Monitor tab.
3. Click ..., and then Reload target.
   This will reload all tables to the target using Drop-Create, and will restart all change data capture from now.
- Storage tasks that consume the landing data task will be reloaded via compare and apply at their next run to get in sync. Existing history will be kept. Type 2 history will be updated to reflect changes after the reload and compare process is executed.
  The timestamp for the from date in the type 2 history will reflect the reload date, and not necessarily the date the change occurred in the source.
- Storage live views will not be reliable during the reload target operation, and until the storage is in sync. Storage will be fully synced when:
  - All tables are reloaded using compare and apply
  - One cycle of changes is performed for each table
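The sync condition above can be expressed as a simple predicate: storage is fully synced only when every table has been reloaded via compare and apply and has also been through one cycle of changes. A hedged sketch, with illustrative data structures rather than the product's internals:

```python
def storage_fully_synced(tables_reloaded: dict[str, bool],
                         change_cycle_done: dict[str, bool]) -> bool:
    """After a Reload target, storage is fully synced only when every
    table has been reloaded using compare and apply AND one cycle of
    changes has been performed for each table."""
    return all(tables_reloaded.values()) and all(change_cycle_done.values())

# One table reloaded but its change cycle has not yet run: not synced
print(storage_fully_synced({"orders": True}, {"orders": False}))  # False
# Both conditions met for all tables: synced
print(storage_fully_synced({"orders": True}, {"orders": True}))   # True
```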
Monitoring storage, transform, and data mart tasks
You can monitor the status and progress of a Storage, Transform, or Data mart task.
- When the first load is running you can view progress in Full load status.
- While changes are processed you can view status and progress in Current batch of changes.
- When changes have been processed you can view status and progress in Last batch of changes.
In the lower half of Monitor you can view status and progress for each dataset.
You can also view detailed information on SQL statement level.
Viewing status and progress
You can view the following details for each dataset or change:
- State - This shows the current state of this dataset or change.
  - Completed - the load or change has completed successfully.
  - Loading - the table or change is being processed.
  - Queued - the table or change is waiting to be processed.
  - Error - there was an error when processing the table or change.
- Started - The time that loading or change processing started.
- Ended - The time that loading or change processing ended.
- Duration - Duration of loading or change processing, in the format hh:mm:ss.
- Processed records - The number of records processed in the load or change.
- Throughput (records/second) - Throughput is not updated until the load is finished.
- Message - Displays an error message if the load or change was not processed successfully.
The datasets will be continuously updated with new data as the landing area is updated by the replication task. Each batch relates to records from a certain time span. You can see the time span of the most recent batch in Last batch of changes.
Data from all source transactions up to the time shown in Data task is updated to is available for consumption from this data task. This information is available for a data task once all tables have been loaded and the first set of changes has been applied. If you selected to generate live views, you can also view when live views are updated.
If there is a batch of changes before the initial load is completed, Data task is updated to will not be updated until the initial load is completed and the first batch of changes is applied. For example, assume that you are loading a data asset that contains an order dataset with 1 million orders and an order details dataset with 10 million order details. The datasets take 10 and 20 minutes to perform a full load, respectively. The order dataset is loaded first, followed by the order details dataset. While the order dataset was loading, a new order was inserted. So when the order details are loaded, they may contain details of the new order, which does not yet exist in the order dataset. The order and order details datasets will only be in sync and fully updated to the same time after the first batch of changes is applied.
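The gating behavior described above can be modeled as a small function: the Data task is updated to timestamp stays empty until every table has finished its full load and the first batch of changes has been applied. A sketch under those assumptions; the data structures are illustrative, not the product's internals:

```python
from datetime import datetime
from typing import Optional

def data_task_updated_to(full_load_done: dict[str, bool],
                         applied_batch_end: Optional[datetime]) -> Optional[datetime]:
    """Return the 'Data task is updated to' time, or None while it is
    not yet meaningful: all tables must have completed their full load,
    and at least one batch of changes must have been applied."""
    if not all(full_load_done.values()):
        return None  # full load is still running for some table
    if applied_batch_end is None:
        return None  # the first batch of changes has not been applied yet
    return applied_batch_end

# Order details is still loading, so there is no timestamp yet
print(data_task_updated_to({"orders": True, "order_details": False}, None))  # None
# Both tables loaded and the first batch applied up to 10:05
print(data_task_updated_to({"orders": True, "order_details": True},
                           datetime(2024, 1, 1, 10, 5)))
```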
Viewing detailed information
You can view detailed information on SQL statement level.
1. Select Full load, Current batch of changes, or Last batch of changes in the drop-down in the lower part of Monitor.
2. Select the datasets to monitor in detail.
3. Click Monitor details.
   Monitor details is displayed, and you can view the commands that are executed for each step of the load or change process. You can click a command to view the full SQL statements that were executed.
4. Click Export to CSV to export a text file with full SQL statements for all listed commands.
Data task is updated to for views
The Data task is updated to field shows the time to which the oldest view is updated.
- Standard views
  Data task is updated to shows the time to which the oldest standard view is updated.
  For example, assume a task has two tables, Orders and Order details. Orders is updated to 10:01 with records from 10:00 and 10:01, and Order details has records from 10:00 only. In this case the data task is updated to 10:00. This should not be confused with the start and end times of the data task load, which could be 10:02 to 10:03.
- Live views (Storage data tasks)
  Data task is updated to shows the time to which the oldest live view is updated.
  For example, assume a task has an Orders table. Orders in landing is updated to 10:01 with records from 10:00 and 10:01, but Orders in storage is updated with records from 10:00 only. In this case live views of Orders are updated to 10:01, and standard views are updated to 10:00.
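In both cases the field follows the same rule: it is the minimum of the per-view update times. A small sketch of that rule, reusing the table names from the examples above; the dictionary layout is illustrative:

```python
from datetime import datetime

def updated_to(view_times: dict[str, datetime]) -> datetime:
    """'Data task is updated to' is the time to which the *oldest*
    view is updated, i.e. the minimum over all views."""
    return min(view_times.values())

standard_views = {
    "Orders": datetime(2024, 1, 1, 10, 1),         # updated to 10:01
    "Order details": datetime(2024, 1, 1, 10, 0),  # updated to 10:00
}
print(updated_to(standard_views).strftime("%H:%M"))  # 10:00
```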