
Storing datasets

You can store datasets using a storage data task. The storage data task consumes the data that a landing data task has landed in the cloud landing area. You can use the resulting tables in an analytics app, for example.

  • You can design a storage data task when the status of the landing data task is at least Ready to prepare.

  • You can prepare a storage data task when the status of the landing data task is at least Ready to run.

The storage data task uses the same mode of operation (Full load or Full load & CDC) as the landing data task that it consumes. The two modes differ in configuration properties as well as in monitoring and control options. If the landing data task to a cloud target uses full load only, the storage data task creates views on the landing tables instead of generating physical tables.
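
As a rough illustration of the difference, the following sketch uses made-up schema and table names (land_sales, store_sales, orders); it is not the SQL that the task actually generates:

    -- Full load only: the storage task can expose the landed table as a view.
    CREATE VIEW store_sales.orders AS
    SELECT * FROM land_sales.orders;

    -- Full load & CDC: a physical table is created instead, so that change
    -- records can later be merged into it.
    CREATE TABLE store_sales.orders AS
    SELECT * FROM land_sales.orders;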

Information note: Data tasks operate in the context of their owner. For more information about required roles and permissions, see Data space roles and permissions.

Creating a storage data task

You can create a storage data task in three ways:

  • Click ... on a landing data task and select Store data to create a storage data task based on that landing data task.

  • Click Add new and then Store data. In this case, you need to specify which landing data task to use.

  • When you onboard data, a storage data task is created. It is connected to the landing data task that is created at the same time.

    For more information, see Onboarding data.

When you have created the storage data task:

  1. Open the storage data task by clicking ... and selecting Open.
    The storage data task is opened, and you can preview the output datasets based on the tables from the landing data task.

  2. Make all required changes to the included datasets, such as transformations, filtering data, or adding columns.

    For more information, see Managing datasets.

  3. When you have added the transformations that you want, you can validate the datasets by clicking Validate datasets. If the validation finds errors, fix the errors before proceeding.

    For more information, see Validating and adjusting the datasets.

  4. Create a data model.

    Click Model to set the relationships between the included datasets.

    For more information, see Creating a data model.

  5. Click Prepare to prepare the data task and all required artifacts. This can take a little while.

    You can follow the progress under Preparation progress in the lower part of the screen.

  6. When the status displays Ready to run, you can run the data task.

    Click Run.

    The data task will now start creating datasets to store the data.

Keeping historical data

You can keep type 2 historical change data to let you easily recreate data as it looked at a specific point in time. This creates a full historical data store (HDS).

  • Type 2 slowly changing dimensions are supported.

  • When a changed record is merged, a new record is created to store the changed data, and the old record is left intact.

  • New HDS records are automatically time-stamped, to let you create trend analysis and other time-oriented analytic data marts.
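
As a minimal sketch of that record-level behavior, assume an illustrative current table and Prior table keyed on customer_id; the names, columns, and SQL below are not what the product generates:

    -- Keep the outgoing version: copy it, intact, into the Prior table and close it.
    INSERT INTO store_sales_internal.customers__prior
    SELECT customer_id, city, valid_from, CURRENT_TIMESTAMP AS valid_to
    FROM store_sales_internal.customers
    WHERE customer_id = 42;

    -- Store the changed data as a new, time-stamped record in the current table.
    UPDATE store_sales_internal.customers
    SET city = 'Hamburg', valid_from = CURRENT_TIMESTAMP
    WHERE customer_id = 42;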

You can enable historical data by clicking:

  • Replication with both current data and history of previous data in Settings when you onboard data.

  • Keep historical change records and change record archive in the Settings dialog of a storage task.

Information note: Historical data is not available when using Qlik Cloud as the data platform.

HDS data is stored in the Prior table in the internal data schema. You can use the history views and live history views in the external data schema to view historical data.

  • The history view merges data from the Current table and the Prior table. This view includes all changes that are merged.

  • The live history view merges data from the Current table, the Prior table, and the Changes table. This view also includes all changes that are not yet merged.
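
A minimal sketch of how such views could be composed, using the same illustrative names as above (the product's generated SQL, schemas, and the location of the Changes table will differ):

    -- History view: every version that has been merged, from Current and Prior.
    CREATE VIEW store_sales.customers__history AS
    SELECT customer_id, city, valid_from, CAST(NULL AS TIMESTAMP) AS valid_to
    FROM store_sales_internal.customers              -- current versions
    UNION ALL
    SELECT customer_id, city, valid_from, valid_to
    FROM store_sales_internal.customers__prior;      -- earlier, merged versions

    -- Live history view: additionally exposes changes that are not yet merged.
    CREATE VIEW store_sales.customers__history_live AS
    SELECT customer_id, city, valid_from, valid_to
    FROM store_sales.customers__history
    UNION ALL
    SELECT customer_id, city, changed_at AS valid_from, CAST(NULL AS TIMESTAMP) AS valid_to
    FROM store_sales_internal.customers__changes;    -- changes not yet merged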

For more information, see Dataset architecture in a cloud data warehouse.

Scheduling a storage task

You can schedule a storage task to be updated periodically.

  • If the input landing data task uses Full load & CDC, you can only set a time-based schedule.

  • If the input landing data task uses Full load, you can either set a time-based schedule, or set the task to run when the input landing data task has completed running.

    Information note: When you run a time-based schedule with an input landing data task that uses Full load, consider that every completed table in landing is available while the landing task is still running. This allows you to run landing and storage concurrently, which can improve the total load time.

Click ... on a data task and select Scheduling to create a schedule. The default scheduling setting is inherited from the settings in the data project. For more information about default settings, see Storage default values. You always need to set Scheduling to On to enable the schedule.

Time based schedules

You can use a time-based schedule to run the storage data task regardless of the type of landing data task.

  • Select At specific time in Run the data task.

You can set an hourly, daily, weekly or monthly schedule.

Event based schedules

  • Select On successful completion of any input data task in Run the data task.

The storage task will run every time the input landing data task has completed successfully.

Information note: This option is not available when the input landing data task uses Full load & CDC, as the landing runs continuously.

Monitoring a storage task

You can monitor the status and progress of a storage task by clicking Monitor.

For more information, see Monitoring an individual data task.

Troubleshooting a storage data task

When there are issues with one or more tables in a storage data task, you may need to reload or recreate the data. There are a few options available. Consider them in the following order:

  1. You can reload the dataset in landing. Reloading the dataset in landing triggers the compare process in storage and corrects the data while retaining type 2 history. This option should also be considered when:

    • The full load was performed a long time ago, and there is a large number of changes.

    • Full load and change table records that have already been processed have been deleted as part of maintenance of the landing area.

    For more information, see Landing data from data sources.

  2. You can reload data in the storage data task.

    If historical data is enabled, a reload in storage may cause a loss of historical data. If this is an issue, consider reloading the landing from source instead.

    For more information, see Reloading data.

  3. You can recreate tables. This recreates the datasets from the source.

    This should be considered as the last option, as you must also recreate all downstream data tasks that consume this data task.

    • Click ..., and then click Recreate tables.

Reloading data

You can perform a manual reload of tables. This is useful when there are issues with one or more tables.

  1. Open the data task and select the Monitor tab.

  2. Select the tables that you want to reload.

  3. Click Reload tables.

The reload will happen the next time the task is run, and is performed by the following steps (see the sketch after the list):

  1. Truncating the tables.

  2. Loading the landing data to the tables.

  3. Loading the changes accumulated from the reload time.
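
Conceptually, the three steps above resemble the following sketch. The table and column names (orders, order_id, status, changed_at) and the reload timestamp are made up for the example, and the task's actual processing also handles deletes and the ordering of changes:

    -- 1. Truncate the storage table.
    TRUNCATE TABLE store_sales.orders;

    -- 2. Load the landed data back into it.
    INSERT INTO store_sales.orders
    SELECT * FROM land_sales.orders;

    -- 3. Apply the changes accumulated since the reload time.
    MERGE INTO store_sales.orders AS t
    USING (
        SELECT * FROM land_sales.orders__changes
        WHERE changed_at > '2024-06-01 00:00:00'     -- the reload time
    ) AS c
    ON t.order_id = c.order_id
    WHEN MATCHED THEN UPDATE SET status = c.status
    WHEN NOT MATCHED THEN INSERT (order_id, status) VALUES (c.order_id, c.status);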

Downstream tasks will be reloaded to apply changes, and to avoid backdating. This is performed by:

  1. Comparing with the full load and applying the changes.

  2. Applying changes from after the reload.

You can cancel the reload for tables that are pending reload by clicking Cancel reload. This will not affect tables that are already reloaded, and reloads that are currently running will be completed.

There are some cases when it is recommended to reload the dataset in landing instead:

  • If historical data is enabled, a reload in storage may cause a loss of historical data. Reloading the dataset in landing will trigger the compare process in storage and correct data retaining type 2 history.

  • When the full load was performed a long time ago, and there is a large number of changes.

  • Full load and change table records that have already been processed have been deleted as part of maintenance of the landing area.

Information note: Reloading tables is not supported in data projects with Qlik Cloud as the target data platform.

Storage settings

You can set properties for the storage data task when the data platform is a cloud data warehouse. If you use Qlik Cloud as the data platform, see Storage settings for data projects with Qlik Cloud as data platform.

  • Click Settings.

Warning note: If the task has already been run, changing any setting other than the Runtime settings requires that you recreate the datasets.

General settings

  • Database

    Database to use in the data source.

  • Data task schema

    You can change the name of the storage data task schema. The default name is the name of the storage task.

  • Internal schema

    You can change the name of the internal storage data task schema. The default name is the name of the storage task with _internal appended.

  • Prefix for all tables and views

    You can set a prefix for all tables and views created with this task.

    Information note: You must use a unique prefix when you want to use a database schema in several data tasks.
  • History

    You can keep historical change data to let you easily recreate data as it looked at a specific point in time. You can use history views and live history views to see historical data. Select Keep historical records and archive of change records to enable historical change data.

  • When comparing storage with landing, you can choose how to manage records that do not exist in the landing (the Mark as deleted option is illustrated in the sketch after this list).

    • Mark as deleted

      This will perform a soft delete of records that do not exist in the landing.

    • Keep

      This will keep all records that do not exist in the landing.

    Information note: Datasets in a storage data task must have a primary key. If they do not, an initial load is performed on the storage data task each time the landing data is reloaded.
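
As a rough sketch of the Mark as deleted option above, assuming an illustrative soft-delete indicator column (the product's actual column and table names differ):

    -- Soft delete: flag storage records whose key no longer exists in landing.
    UPDATE store_sales.orders
    SET is_deleted = TRUE
    WHERE order_id NOT IN (SELECT order_id FROM land_sales.orders);

With the Keep option, no such update is performed and the records remain unchanged.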

Views settings

  • Live views

    Use live views to read the tables with the least latency.

    For more information about live views, see Using live views.

    Information note: Live views are less efficient than standard views and require more resources, as the applied data needs to be recalculated.
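
As a rough sketch of the difference, using the same illustrative names as in the earlier examples (deduplication of keys and the ordering of changes are omitted for brevity; this is not the product's generated SQL):

    -- Standard view: reads only data already applied to the storage table.
    CREATE VIEW store_sales.orders AS
    SELECT order_id, status FROM store_sales_internal.orders;

    -- Live view: also folds in changes that have not been applied yet, which are
    -- recalculated on every query: lower latency, but more work per read.
    CREATE VIEW store_sales.orders__live AS
    SELECT order_id, status FROM store_sales_internal.orders
    UNION ALL
    SELECT order_id, status FROM store_sales_internal.orders__changes;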

Runtime settings

  • Parallel execution

    You can set the maximum number of data connections for full loads to a number from 1 to 5.

  • Warehouse

    The name of the cloud data warehouse. This setting is only applicable for Snowflake.

Storage settings for data projects with Qlik Cloud as data platform

You can set which folder to use for storage when the data platform is Qlik Cloud.

  1. Click Settings.

  2. Select which folder to use in the storage.

  3. Click OK when you are ready.

Operations on the storage data task

You can perform the following operations on a storage data task from the task menu.

  • Open

    This opens the storage data task. You can view the table structure and details about the data task, and monitor the status of the full load and batches of changes.

  • Edit

    You can edit the name and the description of the task, and add tags.

  • Delete

    You can delete the data task.

  • Prepare

    This prepares a task for execution. This includes:

    • Validating the design.

    • Creating or altering the physical tables and views to match the design.

    • Generating the SQL code for the data task.

    • Creating or altering the catalog entries for the task output datasets.

    You can follow the progress under Preparation progress in the lower part of the screen.

  • Validate datasets

    This validates all datasets that are included in the data task.

    Expand Validate and adjust to see all validation errors and design changes.

  • Recreate tables

    This recreates the datasets from the source.

    You must also recreate all downstream data tasks that consume this data task.

  • Stop

    You can stop operation of the data task. The data task will not continue to update the tables.

    Information note: This option is available when the data task is running.
  • Resume

    You can resume the operation of a data task from the point that it was stopped.

    Information note: This option is available when the data task is stopped.
  • Transform data

    Create reusable row-level transformations based on rules and custom SQL. This creates a Transform data task.

    For more information, see Transforming data.

  • Create data mart

    Create a data mart to leverage your data tasks. This creates a Data mart data task.

    For more information, see Creating and managing data marts.

Limitations

  • If the data task contains datasets and you change any parameters in the connection, for example the username, database, or schema, it is assumed that the data exists in the new location. If this is not the case, you can either:

    • Move the data in the source to the new location.

    • Create a new data task with the same settings.
