Loading only new table rows from JDBC sources
For JDBC loads, customers may need to load only those rows introduced into the source database table since the last load into Qlik Catalog. A new entity property, src.table.new.records.filter.column, allows a customer to specify a table column that is used to identify new rows. Qlik Catalog will automatically build a WHERE-clause and append it to the SQL SELECT statement specified in property src.file.glob.
Logically, this column should be an ever-increasing numeric ID column, found in many tables. A timestamp column like creation_ttz could be used if it was guaranteed to always increase and be unique—duplicate timestamps would be problematic.
Qlik Catalog uses the filter column to build a distribution database profile table query to find the maximum value of the column—once built, the query looks like:
SELECT metric_value FROM "podium_core_func"."x_water_mark_profile"
WHERE field_name = 'myก:กid' AND profile_metric = 'PROFILER_MAX_OBSERVED_VALUE'
The resulting value is used to automatically append a WHERE clause to the src.file.glob value which is used to retrieve data from the source table— like in this example:
SELECT "myก:กid","textdata" FROM "podium_core"."X_WATER_MARK" WHERE "myก:กid" > 2
Each time a load is executed this process is repeated; thereby ensuring only new rows are read from the source database table. To ensure consistent load behavior, the property src.table.new.records.filter.column should be set immediately after the entity is created and no later than before the second load is run.
There are several ways to configure this feature in order to best serve your specific use case. You may want the default behavior where only the new rows are profiled and available to Qlik Sense apps.
By default, each load is placed in a new partition (where NEW is selected when initiating a load). When querying using Query in the Discover module, or when viewing Profile data, only the newly discovered records are shown and profiled. A query issued is similar to the record selection that occurs during Publish to Qlik Sense. Therefore, only the newly discovered records are available to Qlik Sense apps.
There are two other logical options:
- Query sees all records, Profile data does not: Edit the entity and change it from "Snapshot" to "Incremental" (and then perform a load to apply) – this changes how the distribution database view is built upon each load. A query issued in the Discover module will now see all the records across all the loads, as will a Qlik Sense app. However, Profile data will still only include the most recent new records.
Query and Profile data capture all loaded data: A final option is to always append new load data into one single partition (select APPEND when initiating all loads after the first). Queries issued in the Discover module, as well as Profile data, will now span all loaded data, as each batch of new rows is placed into a single pre-existing partition.