
Landing data from data sources

The first step of transferring data when onboarding is landing the data. This involves transferring the data continuously from the on-premises data source to a landing area.

You can land data from a number of data sources through source connections.

The landing area is defined when you create the data project.

  • Qlik Cloud (via Amazon S3)

    When you land data to Qlik Cloud (via Amazon S3), you can use it to generate QVD tables ready for analytics in Qlik Cloud.

  • Cloud data warehouse

    When you land data to a cloud data warehouse, such as Snowflake or Azure Synapse Analytics, you can store tables in the same cloud data warehouse.

Information note: Data tasks operate in the context of their owner. For more information about required roles and permissions, see Data space roles and permissions.

Create and configure a landing data task

This section describes how to create a landing data task. The quickest way to create a data pipeline is to onboard data, which creates a landing data task and a storage data task that are ready to prepare and run. For more information, see Onboarding data.

  1. Click Add new in Qlik Cloud Data Integration home, and select Land data.
  2. In the Land data dialog, enter a name and a description of the data task.

    Select Open to open the landing data task when it is created.

    Click Create.

  3. Click Select source data.

  4. Select a data connection to the source data and click Next.

    You can use the filters in the left panel to filter the list of connections by source type, space, and owner.

    If you don't have a data connection to the source data yet, you need to create one first, by clicking Add connection.

    For more information about setting up a connection to the supported sources, see Connecting to data sources.

    Information note: After you have selected tables in the next step, it is not possible to change the source data connection from an on-premises data source to a cloud data source, or vice versa. You can only change the connection to another data source of the same type.
  5. Select tables and views to include in the data asset. The selection dialog is different depending on which type of source you have connected to.

    When you are done selecting tables, click Save.

    Datasets is displayed.

  6. You can change settings for the landing. This is not required.

    • Click Settings.

    For more information about settings, see Landing settings.

  7. You can now preview the structure and metadata of the selected data asset tables. This includes all explicitly listed tables, and tables that match the selection rules.

    If you want to add more tables from the data source, click Select source data.

  8. You can perform basic transformations on the datasets, such as filtering data, or adding columns. This is not required.

    For more information, see Managing datasets.

  9. When you have added the transformations that you want, you can validate the datasets by clicking Validate datasets. If the validation finds errors, fix the errors before proceeding.

    For more information, see Validating and adjusting the datasets.

  10. When you are ready, click Prepare to catalog the data task and prepare it for execution.

    You can follow the progress under Preparation progress in the lower part of the screen.

  11. When the data task is prepared, and you are ready to start replicating data, click Run.

The replication should now start, and you can see the progress in Monitor. For more information, see Monitoring an individual data task.

Selecting data from a database

You can select specific tables or views, or use selection rules to include or exclude groups of tables.

Information note: If the selection includes views, CDC is not supported.

Use % as a wildcard to define selection criteria for schemas and tables.

  • %.% defines all tables in all schemas.

  • Public.% defines all tables in the schema Public.
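
As a rough illustration of how such criteria could be evaluated, the following Python sketch treats % like the SQL LIKE wildcard (any run of characters, including none) and matches the schema part and the table part separately. The function names are hypothetical, and case-sensitive matching is an assumption. This is not the product's implementation.

```python
import re


def _to_regex(part: str) -> "re.Pattern[str]":
    # Assumption: % stands for any run of characters, like SQL LIKE.
    return re.compile("".join(".*" if ch == "%" else re.escape(ch) for ch in part))


def matches_criteria(criteria: str, schema: str, table: str) -> bool:
    """Return True if schema.table matches a 'schema.table' criteria string."""
    schema_part, table_part = criteria.split(".", 1)
    return bool(
        _to_regex(schema_part).fullmatch(schema)
        and _to_regex(table_part).fullmatch(table)
    )


# Hypothetical source tables, given as (schema, table) pairs.
tables = [("Public", "orders"), ("Public", "customers"), ("Sales", "orders")]
print([t for t in tables if matches_criteria("%.%", *t)])       # all three tables
print([t for t in tables if matches_criteria("Public.%", *t)])  # only the Public tables
```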

Selection criteria gives you a preview based on your selections.

You can now either:

  • Create a rule to include or exclude a group of tables based on the selection criteria.

    Click Add rule from selection criteria to create a rule, and select either Include or Exclude.

    You can see the rule under Selection rules.

  • Select one or more datasets, and click Add selected datasets.

    You can see the added datasets under Explicitly selected datasets.

Selection rules only apply to the current set of tables and views, not to tables and views that are added in the future.

Running a landing task with Change data capture (CDC)

You can run the landing task when it is prepared. This starts the replication which transfers data from the on-premises data source to the landing area.

  • Click Run to start landing data.

The replication should now start, and the data asset will have status Running. First, the full data source is copied, then changes are tracked. This means that changes are continuously tracked and transferred when discovered. This keeps the landing data in the landing area up to date.

In Qlik Cloud Data Integration home you can view status, date and time of when the landing data is updated, and the number of tables in error. You can also open the data asset and select the Tables tab to view basic metadata information for the tables.

You can monitor progress in detail by opening the Monitor tab. For more information, see Monitoring an individual data task.

When all tables are loaded and the first set of changes is processed, the Data is updated to field on the data asset card indicates that source changes up to that time are available in the data task.

Reloading tables

You can reload data from the source.

Reloading single tables

You can reload specific tables manually without interfering with change data capture. This is useful when there are CDC issues with one or more tables.

  1. Open the landing data task and select the Monitor tab.

  2. Select the tables that you want to reload.

  3. Click Reload tables.

Information note: This option becomes available after the landing task has run at least once. If you click it while the landing task is not running, the tables will be reloaded the next time the task runs.

If you cannot resolve the issues by reloading tables, or if they affect the entire task, you can reload all tables to the target instead. This will restart change data capture.

Reloading all tables to the target

You can reload all tables to the target if you experience CDC issues that cannot be resolved by reloading specific tables. Examples of issues are missing events, issues caused by source database reorganization, or failure when reading source database events.

Information note: This operation is only available for tasks that use the update method Change data capture (CDC) and have run at least once.

  1. Stop the data task and all tasks that consume it.
  2. Open the data task and select the Monitor tab.

  3. Click ..., and then Reload target.

This will reload all tables to the target using Drop-Create, and will restart change data capture from the current point in time.

  • Storage tasks that consume the landing data task will be reloaded via compare and apply at their next run to get in sync. Existing history will be kept. Type 2 history will be updated to reflect changes after the reload and compare process is executed.

    The timestamp for the from date in the type 2 history will reflect the reload date, and not necessarily the date the change occurred in the source.

  • Storage live views will not be reliable during the reload target operation, and until the storage is in sync. Storage will be fully synced when:

    • All tables are reloaded using compare and apply,

    • One cycle of changes is performed for each table.

Information note: Metadata changes are not supported. If there are metadata changes in the source, they are propagated to landing when reloading data, but they will not be handled properly. This may cause the consuming storage to fail.

Running a landing data task with Reload and compare

You can copy data using the landing data task when it is prepared.

  • Click Run to start the full load.

Data will now start being copied, and the data task will have status Running. When the full data source is copied, the status is Completed.

In Qlik Cloud Data Integration home you can view status, date and time of when the landing data is updated, and the number of tables in error. You can also open the data asset and select the Tables tab to view basic metadata information for the tables.

You can monitor progress in detail by opening the Monitor tab. For more information, see Monitoring an individual data task.

When all tables are loaded, the Data is updated to field on the data task card indicates that source changes up to that time are available in the data asset. However, some tables of the data task can be updated to a later time, depending on when they started loading. This means that data consistency is not guaranteed. For example, if the load started at 08:00 and took 4 hours, Data is updated to will show 08:00 when the load is completed. However, a table that started reloading at 11:30 will include source changes that occurred between 08:00 and 11:30.

The Data is updated to field reflects only tables that loaded successfully. It says nothing about tables whose reloads failed. In cloud targets, the field will be empty if a reload completed with all tables in error.
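
The example above can be worked through in a short Python sketch. It assumes, based only on the description here, that the field shows the time the load started and stays empty when every table ends in error. The function names, table names, and dates are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def data_is_updated_to(load_start: datetime, table_ok: dict[str, bool]) -> Optional[datetime]:
    # Assumption: the field shows the load start time, and stays empty
    # (None here) when every table ended in error.
    return load_start if any(table_ok.values()) else None


def extra_window(load_start: datetime, table_start: datetime) -> timedelta:
    # Source changes from this window are in the table, but are not
    # implied by the "Data is updated to" value.
    return table_start - load_start


load_start = datetime(2024, 1, 1, 8, 0)
late_table_start = datetime(2024, 1, 1, 11, 30)

print(data_is_updated_to(load_start, {"orders": True, "customers": False}))
print(extra_window(load_start, late_table_start))  # 3:30:00
```

The 3.5 hour window is why tables in the same data task may not be consistent with each other.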

Reloading data when using Reload and compare

When you use Reload and compare as update method, you need to reload data to keep it up-to-date with the data source.

  • Click Reload to perform a manual reload of all tables.

  • Set up a scheduled reload.

Reloading single tables

You can reload specific tables manually. This is useful when there are issues with one or more tables.

  1. Open the landing data task and select the Monitor tab.

  2. Select the tables that you want to reload.

  3. Click Reload tables.

Information note:
  • This option will be available after the landing task has run at least once, and only when the task is not running.
  • Metadata changes are not supported. If there are metadata changes in the source, they are propagated to landing when reloading data, but they will not be handled properly. This may cause the consuming storage to fail.

Scheduling a Reload and compare landing data task

You can schedule periodic reloads for the landing data task if you have the Can operate role in the space of the data task. The data task status must be at least Prepared for the schedule to be active.

  • Click ... on a data task and select Scheduling.

    You can set a time based schedule.

Information note: If a data task is still reloading when a scheduled reload is about to start, the scheduled reload is skipped until the next scheduled reload event.
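
To make the skip behavior concrete, the following Python sketch simulates a schedule in which each reload takes longer than the schedule interval. It is a simplified model, not the scheduler itself: the function name is hypothetical, every reload is assumed to take the same fixed time, and a reload that ends exactly at the next scheduled time is assumed not to block it.

```python
from datetime import datetime, timedelta


def plan_runs(scheduled: list[datetime], duration: timedelta) -> list[tuple[datetime, str]]:
    """Mark each scheduled reload as 'run' or 'skipped'."""
    plan = []
    busy_until = None  # end time of the reload currently in progress
    for start in sorted(scheduled):
        if busy_until is not None and start < busy_until:
            plan.append((start, "skipped"))  # previous reload is still running
        else:
            plan.append((start, "run"))
            busy_until = start + duration
    return plan


# Hourly schedule, with each reload assumed to take 90 minutes.
hourly = [datetime(2024, 1, 1, h) for h in (8, 9, 10, 11)]
for when, outcome in plan_runs(hourly, timedelta(minutes=90)):
    print(when.strftime("%H:%M"), outcome)
```

With these inputs, the 08:00 and 10:00 reloads run, and the 09:00 and 11:00 reloads are skipped.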

Setting load priority for datasets

You can control the load order of datasets in your data task by assigning a load priority to each dataset. This can be useful, for example, if you want to load smaller datasets before large datasets.

  1. Click Load priority.

  2. Select a load priority for each dataset.

    The default load priority is Normal. Datasets will be loaded in the following order of priority:

    • Highest

    • Higher

    • High

    • Normal

    • Low

    • Lower

    • Lowest

    Datasets with the same priority are loaded in no particular order.

  3. Click OK.

Information note: Datasets from SaaS application sources may contain dependencies in load order. Consider this when setting the load priority.
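
The resulting load order can be sketched in Python as a sort on priority rank. The dataset names and the load_order function are hypothetical. Datasets without an explicit setting fall back to Normal, as described above.

```python
PRIORITY_ORDER = ["Highest", "Higher", "High", "Normal", "Low", "Lower", "Lowest"]
RANK = {name: position for position, name in enumerate(PRIORITY_ORDER)}


def load_order(datasets: list[str], priorities: dict[str, str]) -> list[str]:
    """Order datasets by load priority; unset datasets default to Normal."""
    return sorted(datasets, key=lambda name: RANK[priorities.get(name, "Normal")])


datasets = ["fact_sales", "dim_country", "dim_customer", "audit_log"]
priorities = {"dim_country": "Highest", "dim_customer": "High", "audit_log": "Lowest"}

print(load_order(datasets, priorities))
# ['dim_country', 'dim_customer', 'fact_sales', 'audit_log']
```

Python's sort keeps datasets with equal priority in their input order. That is an artifact of this sketch, because the data task loads datasets of the same priority in no particular order.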

Operations on the landing data task

You can perform the following operations on a landing data task from the task menu.

  • Open

    This opens the landing data task. You can view the table structure and details about the data task.

  • Edit

    You can edit the name and the description of the task.

  • Delete

    You can delete the data task.

    The following objects are not deleted, and need to be deleted manually:

    • The data in the landing area.

  • Run

    You can run the data task to start copying data.

    For more information, see Running a landing task with Change data capture (CDC) and Running a landing data task with Reload and compare.

  • Stop

    You can stop operation of a data task that is running. The landing area is not updated with changed data.

    When you stop a full load data task with a reload schedule, only the current reload is stopped. If the data task status is Stopped and there is an active reload schedule, it will reload again at the next scheduled time. You must turn off the reload schedule in Schedule reload.

  • Reload

    You can perform a manual reload of a data task in Reload and compare update mode.

  • Prepare

    This prepares a task for execution. This includes:

    • Validating that the design is valid.

    • Creating or altering the physical tables and views to match the design.

    • Generating the SQL code for the data task.

    • Creating or altering the catalog entries for the task output datasets.

    You can follow the progress under Preparation progress in the lower part of the screen.

  • Recreate tables

    This recreates the datasets from the source.

    You must also recreate all downstream data tasks that consume this data task.

  • Scheduling

    You can set up a scheduled reload for landing data tasks in Full load mode. You can set a time based schedule that can be customized.

    You can also turn on or off scheduled reloads.

    You must have the Can operate role on the space of the data task to schedule reloads.

  • Store data

    You can create a storage data task that uses data from this landing data task.

Removing columns

If you drop a column which is consumed by a storage data task with history enabled, you need to follow these steps to preserve history and avoid possible data loss.

  1. Stop the landing data task.

  2. Run the storage data task to ensure that all landing data is read.

  3. Drop the column in the landing.

  4. Run the landing data task.

  5. In storage, add the column with a default expression (Null or default value), or drop the column.

Maintenance of the landing area

Automatic cleanup of the landing area is not supported. This can affect performance.
We recommend that you perform manual cleanups of old full load data in the landing area.

  • Qlik Cloud (via Amazon S3)

    If there are several folders of full load data, you can delete all but the most recent folder. You can also delete change data partitions that have been processed.

  • Cloud data warehouse

    You can delete full load and change table records that have been processed.
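
As an illustration of the "all but the most recent folder" rule, the following Python sketch picks cleanup candidates from a set of full load folders. The folder names, the timestamps, and the folders_to_delete function are hypothetical, since the actual folder layout in the landing area is not described here. The sketch only computes a list and deletes nothing.

```python
from datetime import datetime


def folders_to_delete(folders: dict[str, datetime]) -> list[str]:
    """Return every full load folder except the most recently created one."""
    if len(folders) <= 1:
        return []
    newest = max(folders, key=folders.get)
    return sorted(name for name in folders if name != newest)


# Hypothetical folder names mapped to their creation times.
folders = {
    "full_load_1": datetime(2024, 1, 1),
    "full_load_2": datetime(2024, 2, 1),
    "full_load_3": datetime(2024, 3, 1),
}
print(folders_to_delete(folders))  # ['full_load_1', 'full_load_2']
```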

Limitations

  • Replicating varchar data longer than 8000 bytes, or Nvarchar longer than 4000 bytes, is not supported.
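
A quick way to spot affected columns before landing is to compare declared column lengths against these limits. The Python sketch below assumes that you already have column metadata as (name, data type, maximum length in bytes) tuples. The function name and the sample columns are hypothetical.

```python
# Limits in bytes, taken from the limitation above.
LIMITS_BYTES = {"varchar": 8000, "nvarchar": 4000}


def unsupported_columns(columns: list[tuple[str, str, int]]) -> list[str]:
    """Return names of columns whose declared byte length exceeds the limit."""
    flagged = []
    for name, data_type, max_bytes in columns:
        limit = LIMITS_BYTES.get(data_type.lower())
        if limit is not None and max_bytes > limit:
            flagged.append(name)
    return flagged


columns = [
    ("notes", "varchar", 12000),
    ("title", "Nvarchar", 4000),
    ("body", "nvarchar", 4001),
    ("id", "int", 4),
]
print(unsupported_columns(columns))  # ['notes', 'body']
```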
