Landing streaming data to Qlik Open Lakehouse
You can land data from a streaming source in Amazon S3, ready for the Streaming Transform task to convert it into the Iceberg open table format.
Landing streaming data to Qlik Open Lakehouse requires a pre-configured Amazon S3 bucket. Qlik Open Lakehouse is specifically optimized for high-volume data sources and is compatible with all Qlik-supported streaming data sources. For more information on supported streaming sources, see Connecting to data streams.
Raw data lands in Avro format in S3 and the Streaming Transform task converts the data to Iceberg format. The Iceberg specification enables data to be queried from any engine that natively supports Trino SQL, for example Amazon Athena, Ahana, or Starburst Enterprise. Optionally, tables can be mirrored to your cloud data warehouse where they can be queried without duplicating data.
Preparations
-
Make sure that you have set up Qlik Open Lakehouse. This includes creating a network integration, a lakehouse cluster, and source and target connections. For more information, see Setting up Qlik Open Lakehouse.
-
To mirror data to your cloud data warehouse, you must first create a Qlik Open Lakehouse project to ingest your data and store it using the Iceberg open table format. You can add a Mirror data task after the Streaming Transform task. For more information, see Mirroring data to a cloud data warehouse.
Creating a Streaming landing task
To create a Streaming landing task, first create the project:
-
Create a project, and select Data pipeline in Use case.
-
Select Qlik Open Lakehouse in Data platform and establish a connection to the data catalog.
-
Set up a storage area in Landing target connection.
-
Click Create to create the project.
When you onboard data or create a landing task in the project, a Streaming landing task is created instead of a Landing task. Streaming landing tasks operate and behave similarly to a Landing task, except that they land data to cloud storage from streaming sources. For more information, see Connecting to data streams.
All files are landed in Avro format. When the landed data is updated, the Streaming Transform task consumes it and updates the external tables.
Viewing task information
Click on the menu bar to view task information, such as:
-
Owner
-
Space
-
Data platform
-
Project ID
-
Data task runtime ID
Operations
The following operations are available in a streaming landing task:
-
Dropping a column
Select the column and click Remove.
This will add a transformation rule that removes the column from newly loaded data after the task is prepared and run. You can restore the column for new records by deleting the transformation rule.
-
Hashing a column, for example to mask sensitive information.
Select Hash in the column.
This will generate a SHA-256 hash of the input column after concatenating it with the Hash salt string. The Hash salt string is a project setting, available in Qlik Open Lakehouse projects.
The data type is changed to String when a column is hashed. If you also want to keep the non-hashed data for privileged users, perform the hash later in a Transform task instead.
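Conceptually, the hashing behaves like the following Python sketch. The exact concatenation order, character encoding, and digest representation used by Qlik are assumptions here; only "SHA-256 of the column value combined with the Hash salt string" is stated above.

```python
import hashlib

def hash_column_value(value: str, hash_salt: str) -> str:
    """Sketch of salted column hashing: SHA-256 of the column value
    concatenated with the project-level Hash salt string.
    Concatenation order and UTF-8 encoding are assumptions."""
    return hashlib.sha256((value + hash_salt).encode("utf-8")).hexdigest()

# Example: masking an email address with a hypothetical project salt.
masked = hash_column_value("jane.doe@example.com", "my-project-salt")
print(masked)  # a 64-character hexadecimal string
```

Because the output is a fixed-length hex digest, the hashed column is stored as a String regardless of the input column's original data type.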
-
Filtering data
For more information, see Filtering a dataset.
-
Renaming a dataset
Click on the dataset and select Rename.
Settings
For more information about task settings, see Streaming lake landing settings.