Landing streaming data to Qlik Open Lakehouse
You can land data from a streaming source in Amazon S3, ready for the Streaming Transform task to convert it into the Iceberg open table format.
Landing streaming data to Qlik Open Lakehouse requires a pre-configured Amazon S3 bucket. Qlik Open Lakehouse is specifically optimized for high-volume data sources and is compatible with all Qlik-supported streaming data sources. For more information on supported streaming sources, see Connecting to data streams.
Raw data lands in Avro format in S3 and the Streaming Transform task converts the data to Iceberg format. The Iceberg specification enables data to be queried from any engine that natively supports Trino SQL, for example Amazon Athena, Ahana, or Starburst Enterprise. Optionally, tables can be mirrored to your cloud data warehouse where they can be queried without duplicating data.
Preparations
-
Make sure that you have set up Qlik Open Lakehouse. This includes creating a network integration, a lakehouse cluster, and source and target connections. For more information, see Setting up Qlik Open Lakehouse.
-
To mirror data to your cloud data warehouse, you must first create a Qlik Open Lakehouse project to ingest your data and store it using the Iceberg open table format. You can add a Mirror data task after the Streaming Transform task. For more information, see Mirroring data to a cloud data warehouse.
Creating a Streaming landing task
To create a Streaming landing task, first create the project:
-
Create a project, and select Data pipeline in Use case.
-
Select Qlik Open Lakehouse in Data platform and establish a connection to the data catalog.
-
Set up a storage area in Landing target connection.
-
Click Create to create the project.
When you onboard data or create a landing task in the project, a Streaming landing task is created instead of a Landing task. Streaming landing tasks operate and behave similarly to a Landing task, except that they land data to cloud storage from streaming sources. For more information, see Connecting to data streams.
All files are landed in Avro format. When the landed data is updated, the Streaming Transform task consumes it and updates the external tables.
Viewing task information
Click on the menu bar to view task information, such as:
-
Owner
-
Space
-
Data platform
-
Project ID
-
Data task runtime ID
Operations
The following operations are available in a streaming landing task:
-
Dropping a column
Select the column and click Remove.
This will add a transformation rule that removes the column from newly loaded data after the task is prepared and run. You can restore the column for new records by deleting the transformation rule.
-
Hashing a column, for example to mask sensitive information.
Select Hash in the column.
This will generate a SHA-256 hash of the input column after concatenating it with the Hash salt string. The Hash salt string is a project setting, available in Qlik Open Lakehouse projects.
The data type is changed to String when a column is hashed. If you also want to keep the non-hashed data for privileged users, perform the hash later in a Transform task instead.
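Conceptually, the hashing behaves like the following Python sketch. The exact concatenation order, character encoding, and digest representation used by Qlik are assumptions here; only "SHA-256 of the column value combined with the Hash salt string" is stated above.

```python
import hashlib

def hash_column_value(value: str, hash_salt: str) -> str:
    """Sketch of salted column hashing: SHA-256 of the column value
    concatenated with the project-level Hash salt string.
    Concatenation order and UTF-8 encoding are assumptions."""
    return hashlib.sha256((value + hash_salt).encode("utf-8")).hexdigest()

# Example: masking an email address with a hypothetical project salt.
masked = hash_column_value("jane.doe@example.com", "my-project-salt")
print(masked)  # a 64-character hexadecimal string
```

Because the output is a fixed-length hex digest, the hashed column is stored as a String regardless of the input column's original data type.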
-
Filtering data
For more information, see Filtering a dataset.
-
Renaming a dataset
Click on the dataset and select Rename.
Settings
For more information about task settings, see Streaming lake landing settings.