
Registering data that is already on the data platform

You can register data that already exists on the data platform to curate and transform it, and to create data marts. This lets you use data that was onboarded with tools other than Qlik Cloud Data Integration, for example Qlik Replicate or Stitch.

When you register data, two data tasks are created.

  • Registered data

    Registering the data creates views that prepare the data for generating datasets.

  • Storage

    This involves generating and storing datasets based on the registered data.

    For more information, see Storing datasets.

When you have registered the data, you can use the generated datasets in several ways.

  • You can use the datasets in an analytics app.

  • You can create transformations.

  • You can create a data mart.

Register data

You can register data that exists in the cloud data warehouse defined in the data project. The generated datasets will be stored in the same cloud data warehouse.

For more information about data projects, see Creating a data pipeline.

  1. Click Add new and then Register data.

  2. Add Name and Description for the data task.

    Click Next.

  3. Select data to register.

    For more information, see Selecting data to include.

    Click Next.

    Settings is displayed.

  4. Select how the data is updated in Update method.

    • Use Incremental using high watermark to process data changes incrementally using a high watermark pattern. This is the recommended method if the data is replicated by Qlik Replicate (with Full load and Store changes enabled) or by Stitch.

      For more information, see Update method.

    • Use Compare with current storage when the data was only loaded once, or if it is updated using full reloads.

  5. Preview the two data tasks that are created in Summary, and rename them if you prefer.

    Tip noteThe names are used when naming database schemas in the storage data task. As a schema can only be associated with one task, consider using names that are unique to avoid conflicts with data tasks in other data projects using the same data platform.
  6. Select if you want to open the registered data task, or return to the data project.

    When you are ready, click Finish.

The two data tasks are now created. To start replicating data you need to:

  • Prepare the registered data task.

    Click Prepare in the data task.

    When artifacts have been created, the data task status is Registered.

  • Prepare and run the storage data task.

    For more information, see Storing datasets

Selecting data to include

When you select data to include, you can select specific tables or views, or use selection rules to include or exclude groups of tables.

Use % as a wildcard to define selection criteria for schemas and tables.

  • %.% defines all tables in all schemas.

  • Public.% defines all tables in the schema Public.
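The wildcard matching above can be illustrated with a small sketch. This is not the product's matcher, just an assumed interpretation where % matches any sequence of characters in the schema or table part of a qualified name:

```python
import re

def matches_selection(pattern: str, qualified_name: str) -> bool:
    """Check whether a schema.table name matches a selection pattern.

    % is a wildcard for any sequence of characters, as in the selection
    criteria; everything else is matched literally (case-insensitively
    here, which is an assumption of this sketch).
    """
    schema_pat, _, table_pat = pattern.partition(".")
    schema, _, table = qualified_name.partition(".")

    def to_regex(p: str) -> str:
        # Escape literal parts and turn each % into ".*"
        return "^" + ".*".join(re.escape(part) for part in p.split("%")) + "$"

    return bool(re.match(to_regex(schema_pat), schema, re.IGNORECASE)
                and re.match(to_regex(table_pat), table, re.IGNORECASE))

# %.% matches every table in every schema
assert matches_selection("%.%", "Public.orders")
# Public.% matches only tables in the Public schema
assert matches_selection("Public.%", "Public.orders")
assert not matches_selection("Public.%", "Sales.orders")
```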

The selection criteria give you a preview based on your selections.

You can now either:

  • Create a rule to include or exclude a group of tables based on the selection criteria.

    Click Add rule from selection criteria to create a rule, and select either Include or Exclude.

    You can see the rule under Selection rules.

  • Select one or more datasets, and click Add selected datasets.

    You can see the added datasets under Explicitly selected datasets.

Selection rules only apply to the current set of tables and views, not to tables and views that are added in the future.

Registered data settings

You can set properties for the registered data task.

  • Click Settings.

General

  • Database

    Database to use in the target.

  • Data task schema

    You can change the name of the data task schema.

  • Prefix for all tables and views

    You can set a prefix for all tables and views created with this task.

    Information noteYou must use a unique prefix when you want to use a database schema in several data tasks.

Update method

Change detection

  • Use Compare with current storage when the data was only loaded once, or if it is updated using full reloads.

  • Use Incremental using high watermark to process data changes incrementally using the high watermark method.

    This option requires that all tables have a primary key defined. You can define a primary key manually in the Datasets view for tables that are missing a primary key.
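The high watermark pattern described above can be sketched as follows. This is an illustrative simplification, not the product's implementation: each run processes only change rows whose watermark value is higher than the highest value seen in the previous run (the column name header__change_seq below matches the Qlik Replicate preset):

```python
def incremental_batch(rows, watermark_column, last_watermark):
    """Return rows newer than the stored high watermark, plus the new watermark.

    `rows` are change records as dicts; only rows whose watermark value is
    strictly greater than `last_watermark` are processed, so each run picks
    up where the previous one stopped.
    """
    new_rows = [r for r in rows if last_watermark is None
                or r[watermark_column] > last_watermark]
    new_watermark = max((r[watermark_column] for r in new_rows),
                        default=last_watermark)
    return new_rows, new_watermark

changes = [
    {"id": 1, "header__change_seq": "0001"},
    {"id": 2, "header__change_seq": "0002"},
    {"id": 3, "header__change_seq": "0003"},
]

# First run processes everything; the second run only what arrived after.
batch1, wm = incremental_batch(changes, "header__change_seq", None)
changes.append({"id": 4, "header__change_seq": "0004"})
batch2, wm = incremental_batch(changes, "header__change_seq", wm)
assert [r["id"] for r in batch2] == [4]
```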

Incremental load settings

These settings are available when Incremental using high watermark is selected.

  • If the data is replicated by a Qlik Replicate task with Full load and store changes, set Incremental load settings to Qlik Replicate settings.

  • If the data is replicated by a Stitch data pipeline, and your source tables have a primary key defined, set Incremental load settings to Stitch default settings.

  • Otherwise, set Incremental load settings to Custom and define the settings yourself.

Settings for each Incremental load settings option:

  • Change tables

    • Custom: If the changes are in the same table, select Changes are within the same table. If not, deselect Changes are within the same table and specify a change table pattern in Change table pattern.

    • Qlik Replicate settings: The change table pattern is ${SOURCE_TABLE_NAME}__ct.

    • Stitch default settings: Changes are within the same table is selected.

  • Watermark column

    • Custom: Set the name of the watermark column in Name.

    • Qlik Replicate settings: header__change_seq

    • Stitch default settings: _SDC_BATCHED_AT

  • "From date" column

    • Custom: You can indicate the "From date" by the batch start time, or by using a selected column. If you select Selected "From date" column, you must define a "From date" pattern.

    • Qlik Replicate settings: header__timestamp

    • Stitch default settings: _SDC_BATCHED_AT

    For the Qlik Replicate and Stitch presets, you can change this to indicate the "From date" by the batch start time, or by selecting a different column.

  • Soft deletions

    • Custom and Stitch default settings: You can include soft deletions in changes by selecting Changes include soft deletions and defining an indication expression. The indication expression should evaluate to True if the change is a soft delete. Example: ${is_deleted} = 1

    • Qlik Replicate settings: ${header__change_oper} = 'D'

  • Before image

    • Custom: You can filter out before image records in change tables by selecting Before image and defining an indication expression. The indication expression should evaluate to True if the row contains the image before the update. Example: ${header__change_oper} = 'B'

    • Qlik Replicate settings: ${header__change_oper} = 'B'

    • Stitch default settings: There are no before image records in the data.
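Indication expressions such as ${is_deleted} = 1 or ${header__change_oper} = 'D' are evaluated per change row. The sketch below shows one way such an expression could be interpreted; it is an illustrative evaluator for the simple equality form shown above, not the product's expression engine:

```python
import re

def evaluate_indication(expression: str, row: dict) -> bool:
    """Evaluate a simple `${column} = value` indication expression.

    Values may be single-quoted strings or integers. Returns True when the
    row's column equals the expected value, e.g. marking a soft delete.
    """
    m = re.fullmatch(r"\s*\$\{(\w+)\}\s*=\s*('([^']*)'|\d+)\s*", expression)
    if not m:
        raise ValueError(f"unsupported expression: {expression}")
    column, raw, quoted = m.group(1), m.group(2), m.group(3)
    expected = quoted if quoted is not None else int(raw)
    return row.get(column) == expected

# Soft-delete check for Qlik Replicate change records
assert evaluate_indication("${header__change_oper} = 'D'",
                           {"header__change_oper": "D"})
# Before-image rows can be filtered out the same way
assert not evaluate_indication("${header__change_oper} = 'B'",
                               {"header__change_oper": "U"})
assert evaluate_indication("${is_deleted} = 1", {"is_deleted": 1})
```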

Recommended Qlik Replicate configuration

These Qlik Replicate task settings are recommended when registering data that is replicated using a Qlik Replicate task storing changes.

  • The Qlik Replicate task should be configured with options Full Load and Store Changes.

  • In Store Changes Settings > Change Tables, make sure that the following change table columns are included, using their default names:

    • [header__]change_seq

    • [header__]change_oper

    • [header__]timestamp

  • In Store Changes Settings > Change Tables, set On UPDATE to Store after image only.

    This reduces the space for each update as the before image is not included. Use this option if you do not plan to use the before image.

  • In Store Changes Settings > Change Tables, set Suffix to the default value __ct.

  • Do not apply the following global transformations:

    • Rename Change Table

    • Rename Change Table Schema

  • If a primary key in a source table can be updated, enable the DELETE and INSERT when updating a primary key column option in Change Processing Tuning.

    The history of the old record will not be preserved in the new record.

    Information noteThis option is supported from Qlik Replicate November 2022.

Operations on the registered data task

You can perform the following operations on a registered data task from the task menu.

  • Open

    This opens the data task. You can view the table structure and details about the data task.

  • Edit

    You can edit the name and the description of the task, and add tags.

  • Delete

    You can delete the data task.

    The source data is not deleted.

  • Sync datasets

    This syncs design changes that cannot be automatically adjusted.

  • Recreate tables

    This recreates the datasets from the source.

    You must also recreate all downstream data tasks that consume this data task.

  • Store data

    You can create a storage data task that uses data from this registered data task.

History considerations when setting a "From date" column

If historical data is enabled in a downstream task, and you use a "From date" column, backdating is not supported. This means that if a change batch contains an older version of a record that does not exist in storage, the change batch must also include all newer versions of the record. If the newer versions are not included, they will be deleted.

In these examples, storage contains these records from the start:

From date    Name    City
2/Oct/2023   Joe     New York
3/Oct/2023   Joe     London

Example 1:  

If you insert the following change batch:

From date    Name    City
4/Oct/2023   Joe     Paris

The result in storage is, as expected:

From date    Name    City
2/Oct/2023   Joe     New York
3/Oct/2023   Joe     London
4/Oct/2023   Joe     Paris

Example 2:  

But if you insert the following older record in a change batch:

From date    Name    City
1/Oct/2023   Joe     Berlin

This results in the newer records being removed in storage:

From date    Name    City
1/Oct/2023   Joe     Berlin

Example 3:  

To maintain the history, the change batch must include the newer records:

From date    Name    City
1/Oct/2023   Joe     Berlin
2/Oct/2023   Joe     New York
3/Oct/2023   Joe     London

This will ensure the history is maintained in storage as well:

From date    Name    City
1/Oct/2023   Joe     Berlin
2/Oct/2023   Joe     New York
3/Oct/2023   Joe     London
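The behavior in the examples above can be reproduced with a small merge sketch. This is a simplified model of the semantics, not the product's implementation: stored versions are kept only if they are older than the earliest "From date" in the change batch, so a backdated batch replaces all newer history unless it carries the newer versions itself:

```python
from datetime import date

def apply_change_batch(storage, batch):
    """Merge a change batch into stored history for one record key.

    Versions already in storage survive only if they predate the earliest
    "From date" in the batch; everything from that point onward is taken
    from the batch itself. This reproduces the backdating behavior above.
    """
    batch_start = min(row["from_date"] for row in batch)
    kept = [row for row in storage if row["from_date"] < batch_start]
    return sorted(kept + batch, key=lambda row: row["from_date"])

storage = [
    {"from_date": date(2023, 10, 2), "name": "Joe", "city": "New York"},
    {"from_date": date(2023, 10, 3), "name": "Joe", "city": "London"},
]

# Example 2: inserting only the older 1/Oct record drops the newer history
result = apply_change_batch(storage, [
    {"from_date": date(2023, 10, 1), "name": "Joe", "city": "Berlin"},
])
assert [r["city"] for r in result] == ["Berlin"]
```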

Considerations

  • Do not use the history option in the Stitch replication. Use the options to keep historical data in Qlik Cloud Data Integration.

Data capacity considerations

  • If a registered table has no primary key, a full reload will be executed for every run. This will be counted towards your monthly registered data capacity quota. This is because the storage will need to compare all records to find changes.

  • Data capacity for registered data is counted in the storage. This means a delete in the registered data is translated to an insert or update of the storage (a soft delete) and counted in the data capacity.

  • Soft deletes, inserts, and updates will be counted twice towards data capacity if a table from registered data is used in two storage data tasks.
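The first bullet can be illustrated with a sketch of full-comparison change detection. This is an assumed model, not the actual storage logic: without a primary key, every record must be read and compared on every run to tell inserts from deletes, which is why such tables count fully against the registered data capacity each time:

```python
import hashlib
import json

def full_compare(stored_rows, reloaded_rows):
    """Detect changes by comparing every record, as when a registered
    table has no primary key and each run is a full reload.

    Rows are fingerprinted whole; anything present only in the reload is
    an insert, anything present only in storage is a (soft) delete.
    """
    def fingerprint(row):
        # Stable hash of the entire row, since there is no key to match on
        return hashlib.sha256(
            json.dumps(row, sort_keys=True).encode()).hexdigest()

    stored = {fingerprint(r): r for r in stored_rows}
    reloaded = {fingerprint(r): r for r in reloaded_rows}
    inserts = [r for h, r in reloaded.items() if h not in stored]
    deletes = [r for h, r in stored.items() if h not in reloaded]
    return inserts, deletes

old = [{"name": "Joe", "city": "New York"}]
new = [{"name": "Joe", "city": "Paris"}]
inserts, deletes = full_compare(old, new)
assert inserts == [{"name": "Joe", "city": "Paris"}]
assert deletes == [{"name": "Joe", "city": "New York"}]
```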
