Ingesting data from an external source, wherever it originates (RDBMS, HDFS, a cloud storage service such as Amazon S3, Azure ADLS, or Azure WASB, Hive, a local server, etc.), involves two key steps to onboard data into Qlik Catalog:
- Defining the source/entity (metadata)
- Ingesting (data)
Once sources have been defined and metadata is in place, data can be loaded from the source. To load data into an entity, navigate to the entity, highlight its row, and select Load from the More dropdown.
A data load modal opens in which the user can edit the date and timestamp fields assigned to the load. This marker is important because it becomes the data load partition ID. If it is not changed, the default ID is the timestamp at the time of the load (down to the second). Radio buttons provide a choice of load types: New (default), Append, and Overwrite. Select OK.
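The default partition ID described above can be sketched as follows. This is a minimal illustration, not Qlik Catalog's implementation: the function names are hypothetical, and the `yyyyMMddHHmmss` string format is an assumption chosen only to show second-level resolution.

```python
from datetime import datetime
from typing import Optional


def format_partition_id(moment: datetime) -> str:
    # ASSUMPTION: a yyyyMMddHHmmss layout; the source only states that the
    # default ID is the load timestamp "down to the second".
    return moment.strftime("%Y%m%d%H%M%S")


def choose_partition_id(load_time: datetime,
                        user_value: Optional[datetime] = None) -> str:
    # If the user edited the date/timestamp fields in the modal, that value
    # becomes the partition ID; otherwise the time of the load is used.
    chosen = user_value if user_value is not None else load_time
    return format_partition_id(chosen)


if __name__ == "__main__":
    print(choose_partition_id(datetime.now()))
```

Two loads started within the same second would produce the same default ID under this sketch, which is one reason a user might prefer to set the value explicitly.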
This initiates the data load. Depending on the amount of data being ingested and the speed of the connection to the source system, the load may take several minutes. If the data is several gigabytes or larger, the load may take significantly longer. The status of the load is shown in Job Status: a RUNNING status appears until the load has FINISHED. To refresh the logs and monitor the load status, select the load row and select Reload Logs from the Bulk Action dropdown. When you first arrive at the load screen and data is being loaded for the first time, click the Refresh button to initiate the load.
INITIALIZED loads: Jobs that are queued but have not yet started running are in an INITIALIZED state. For QVD loads, where QVD entities are initialized but have not yet started loading, this status may linger longer than is typical for non-QVD entities. A maximum of five QVD data loads are allowed in INITIALIZED state at a time.
Note the behavior of INITIALIZED loads when Tomcat is restarted: loads that are in INITIALIZED state when Tomcat is restarted remain in INITIALIZED state and do not convert to RUNNING state after the restart, but will FAIL after a mandatory two-hour waiting interval. In contrast, jobs that are in a RUNNING state when Tomcat restarts are killed and FAIL immediately.
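The restart behavior above can be summarized as a small state model. This is a sketch for reasoning about the rules only: the function and constant names are hypothetical, and it assumes the two-hour interval is measured from the Tomcat restart, which the text does not state explicitly.

```python
from datetime import datetime, timedelta

INITIALIZED_TIMEOUT = timedelta(hours=2)   # mandatory waiting interval
MAX_INITIALIZED_QVD_LOADS = 5              # limit on queued QVD loads


def can_initialize_qvd_load(currently_initialized: int) -> bool:
    # At most five QVD loads may sit in INITIALIZED state at a time.
    return currently_initialized < MAX_INITIALIZED_QVD_LOADS


def status_after_restart(status_at_restart: str,
                         restart_time: datetime,
                         now: datetime) -> str:
    # RUNNING jobs are killed by the restart and fail immediately.
    if status_at_restart == "RUNNING":
        return "FAILED"
    # INITIALIZED jobs never move to RUNNING after a restart; they fail
    # once the waiting interval has elapsed.
    # ASSUMPTION: the interval is counted from the restart time.
    if status_at_restart == "INITIALIZED":
        if now - restart_time >= INITIALIZED_TIMEOUT:
            return "FAILED"
        return "INITIALIZED"
    # FINISHED and FAILED are already terminal.
    return status_at_restart
```

For example, a load that was INITIALIZED at restart still reports INITIALIZED ninety minutes later and FAILED once two hours have passed.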
Select Refresh Load Logs to refresh the load status, or set an Auto Refresh interval from the dropdown options:
- No Auto Refresh [default]
- 15s
- 30s
- 60s
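Auto Refresh amounts to polling the job status at a fixed interval until the load reaches a terminal state. The loop below is a generic sketch of that pattern: `fetch_status` is a hypothetical callable supplied by the caller, not a Qlik Catalog API, and the `sleep` parameter exists only so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional

TERMINAL_STATES = {"FINISHED", "FAILED"}


def wait_for_load(fetch_status: Callable[[], str],
                  interval_seconds: float = 15,
                  max_polls: Optional[int] = None,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the status is FINISHED or FAILED, or max_polls is reached."""
    polls = 0
    while True:
        status = fetch_status()
        polls += 1
        if status in TERMINAL_STATES:
            return status
        if max_polls is not None and polls >= max_polls:
            # Give up and report the last status seen (e.g. RUNNING).
            return status
        sleep(interval_seconds)
```

Passing `interval_seconds` of 15, 30, or 60 mirrors the dropdown options. Setting `max_polls` guards against waiting indefinitely on a load that lingers in INITIALIZED.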
Upon completion, Job Status shows FINISHED or FAILED and the log shows the results of the load. When Job Status is FINISHED, the records provide totals of Good Record Count, Bad Record Count, Ugly Record Count, and Filtered Record Count.
Select the round expand or collapse icons to the right of the record counts to view messages explaining why records may have been flagged as Bad or Ugly.
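The four record counts of a FINISHED load can be held in a simple structure when reviewing results outside the UI. This is an illustrative sketch only: the class and field names are hypothetical, and the review rule (any Bad or Ugly records warrant a look at the messages) is an assumption drawn from the description above.

```python
from dataclasses import dataclass


@dataclass
class LoadCounts:
    good: int
    bad: int
    ugly: int
    filtered: int

    def needs_review(self) -> bool:
        # ASSUMPTION: any Bad or Ugly record is worth inspecting via the
        # expand icon messages; Filtered records are not treated as errors.
        return self.bad > 0 or self.ugly > 0


if __name__ == "__main__":
    counts = LoadCounts(good=1000, bad=3, ugly=0, filtered=12)
    print(counts.needs_review())
```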
When Job Status is FAILED, select View Properties from the action dropdown to open Data Load Information.
The Load Log contains details about why the load failed.