Controlling data storage tasks

Once the Storage Zone tables have been created and the task statements have been generated, you can run the Storage Zone task. The Storage Zone task extracts data from the Landing Zone tables and loads it into the Storage Zone tables.

Limitations and considerations

  • When tasks are reloaded in Replicate November 2020, Compose will also reload the task (i.e. perform a Full Load) and only then apply the changes. Depending on the number of tables involved, this may take some time as two reloads will be performed (one in Replicate and the other in Compose).
  • A storage directory can be used by only one Compose project.

  • Data storage tasks are optimized to run on relatively large batches of data. It is recommended to specify a partition length in excess of one hour. Although specifying a partition length of less than one hour may improve latency, creating many partitions on the target may also impact (target) performance (especially in systems with large volumes of changes).
  • Change Processing creates a new file on every write. This may cause many files to amass and degrade performance. Therefore, it is recommended to monitor the storage directory and periodically consolidate small files into larger ones and move/delete files that are no longer required.
  • Storage directories and subdirectories are managed by Compose; you should not delete files or write to them unless approved by Qlik Support or explicitly recommended in this guide.

  • When using a Hive-based compute platform, for optimal performance, it is recommended to allocate a dedicated queue to Compose tasks only.
  • When using a Hive-based compute platform, to see the delta of data changes in the storage tables, you need to run the following commands so that Hive can read the subdirectories:
    set hive.supports.subdirectories=true;
    set hive.input.dir.recursive=true;
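
The recommendation above to monitor the storage directory for accumulating small files could be approached with a read-only scan. The following is a minimal sketch, not part of Compose: the function name `scan_small_files`, the `SmallFileReport` type, and the 1 MB threshold are all hypothetical, and it assumes the storage directory is reachable as a local or mounted file system path. It only reads file sizes and never writes to or deletes from the directory.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class SmallFileReport:
    total_files: int   # all files found under the root
    small_files: int   # files below the size threshold
    small_bytes: int   # combined size of the small files


def scan_small_files(root: str, small_threshold_bytes: int = 1024 * 1024) -> SmallFileReport:
    """Walk `root` read-only and count files smaller than the threshold.

    The 1 MB default is an illustrative assumption, not a Compose setting.
    """
    total = small = small_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            total += 1
            if size < small_threshold_bytes:
                small += 1
                small_bytes += size
    return SmallFileReport(total, small, small_bytes)


if __name__ == "__main__":
    # Demonstration on a throwaway directory; a real check would point at
    # the (mounted) storage directory instead of this temporary path.
    with tempfile.TemporaryDirectory() as demo_root:
        for i in range(3):
            with open(os.path.join(demo_root, f"change_{i}.dat"), "wb") as fh:
                fh.write(b"x" * 100)
        print(scan_small_files(demo_root))
```

A report like this only indicates when consolidation may be worth considering. Any actual consolidation, move, or deletion should follow the guidance above about not modifying the storage directories without approval.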

Running a task

Storage Zone tasks can be run manually, scheduled to run periodically, or run as part of a workflow. The section below describes how to run a Storage Zone task manually. For information on scheduling Storage Zone tasks or including them in a workflow, see Controlling and monitoring tasks and workflows.

Information note

If a Replicate source table that contains data:

  • Was not originally selected in the Replicate Full Load and Apply Changes task (i.e. was added later).

    -OR-

  • Was selected in a Replicate Full Load and Apply Changes task, but was not selected in the mappings of the Compose Full Load and Change Processing data storage tasks, and the tasks have already been run.

In either of the above scenarios, to get the data that was added later, you need to:

  1. Duplicate the Compose Full Load and Change Processing tasks associated with that table.
  2. Run the duplicated Full Load task.
  3. Run the duplicated Change Processing task.

Note that after running these tasks, duplicate records may exist in the Storage Zone, but they will be removed when reading from the Storage Zone views.

To run a Storage Zone task:

  1. Click the Manage button in the bottom left of the Storage Zone panel.

    The Manage Storage Tasks window opens.

  2. If multiple tasks have been defined, in the left pane, select the task that you want to run.
  3. Click the Run toolbar button. The window switches to Monitor view and the following status bars are displayed:

    • Completed - Shows the tables that have already been loaded into Hive
    • Loading - Shows the tables currently being loaded into Hive
    • Queued - Shows the tables waiting to be loaded into Hive
    • Error - Shows the tables that could not be loaded into Hive due to an error. Click the Show Details link below the bar to see more information about the statement(s) that resulted in the error.
    • Canceled - The number of canceled tables (tables that were not processed due to the task being aborted) does not appear as a separate status bar. To view the number of canceled tables, click the Select All link above the status bars.
    To see more information about tables in a particular status, click the relevant status bar. A list of tables in the selected status will be shown.

    When the status icon indicates that the task has completed, close the Manage Storage Tasks window.

    You can stop the task at any time by clicking the Abort toolbar button. This may be necessary if you need to urgently edit the task settings due to some unforeseen development. After editing the task settings, simply click the Run button again to restart the task.

    You can also access the task log files by clicking the View Log button.

    Information note

    Aborting a task may leave the Storage Zone tables in an inconsistent state. Consistency will be restored the next time the task is run.
