Creating a Qlik Open Lakehouse project
Create a Qlik Open Lakehouse pipeline project to ingest data from any source and store it in the Iceberg open table format.
Prerequisites
To create a Qlik Open Lakehouse project, you need:
- A network integration to enable Qlik to provision and manage compute resources on your behalf.
- A lakehouse cluster configured to run the data storage task within your Iceberg project.
- A connection to a data catalog to use as the data target for your project, or the necessary details so you can create a new connection.
Supported tasks
The following tasks are supported in a Qlik Open Lakehouse project.
CDC and SaaS application sources
- Lake landing data task
  Land data in CSV format in S3, from any Qlik-supported source, including high-volume data streams.
  For more information, see Landing data to Qlik Open Lakehouse.
- Storage data task
  The Storage data task consumes data landed in the cloud by the Lake landing task. The task writes data into Iceberg tables for efficient storage and querying.
  For more information, see Storing datasets.
- Mirror data task
  Mirror Iceberg tables from your storage task to your cloud data warehouse. Users can query the data via external tables without migrating it to your cloud data warehouse.
Streaming sources
- Streaming landing data task
  Land data in Avro format in S3, from any Qlik-supported streaming source.
  For more information, see Landing streaming data to Qlik Open Lakehouse.
- Streaming Transform data task
  The Streaming Transform data task consumes the events landed in the cloud by the Streaming landing task. The task writes data into Iceberg tables for efficient storage and querying, and supports transformations.
  For more information, see Storing streaming datasets.
- Mirror data task
  Mirror Iceberg tables from your streaming storage task to your cloud data warehouse. Users can query the data via external tables without migrating it to your cloud data warehouse.
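Because the storage tasks above write open Iceberg tables, any Iceberg-compatible engine can read them directly from the data catalog. The following is a minimal sketch using PyIceberg with a REST catalog; the endpoint, bucket, token, and table names are placeholder assumptions, not real Qlik values.

```python
# Sketch: reading Iceberg tables written by a storage task, via PyIceberg.
# All connection values are illustrative placeholders, not real Qlik endpoints.

def catalog_properties(uri: str, warehouse: str, token: str) -> dict:
    """Assemble connection properties for a PyIceberg REST catalog."""
    return {
        "type": "rest",          # REST catalog protocol
        "uri": uri,              # catalog endpoint (hypothetical)
        "warehouse": warehouse,  # S3 location backing the lakehouse (hypothetical)
        "token": token,          # bearer token for the catalog
    }

def row_count(props: dict, table_name: str) -> int:
    """Load an Iceberg table from the catalog and count its rows.

    Requires the pyiceberg package and a reachable catalog.
    """
    from pyiceberg.catalog import load_catalog  # lazy import: needs pyiceberg

    catalog = load_catalog("lakehouse", **props)
    table = catalog.load_table(table_name)  # e.g. "sales.orders" (hypothetical)
    return table.scan().to_arrow().num_rows
```

For example, `row_count(catalog_properties("https://catalog.example.com", "s3://my-lakehouse-bucket", "<token>"), "sales.orders")` would scan the table without going through the cloud data warehouse, which is what makes the Mirror data task's no-copy access possible.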
Example of creating a Qlik Open Lakehouse project
The following example creates a Qlik Open Lakehouse pipeline project, onboards data from a CDC source, and stores it in Iceberg format tables. This example creates a simple pipeline that you could expand by onboarding more data sources. You could add a Mirror data task to mirror your tables in your data warehouse without duplicating data, or use this project as the source for a project that requires transformations in your cloud data warehouse.
To create a Qlik Open Lakehouse project, do the following:
- In Data Integration home, click Create pipeline, and configure it:
  - Name: Enter the name for the project.
  - Space: Select the space the project will belong to.
  - Description: Optionally, enter a description for the project.
- For Use case, select Data pipeline.
- Configure the Data platform:
  - Data platform: Select Qlik Open Lakehouse from the list.
  - Data catalog connection: In the list, select an existing connection or click Create new to add a new data catalog connection.
  - Landing target connection: Select the S3 bucket for landing the data or click Create new to add a new bucket location.
  - Storage compute cluster: Select the lakehouse cluster that will run the storage task.
- Create the project.
- Follow the steps in the onboarding data wizard. For more information, see Onboarding data, which provides instructions for CDC and streaming sources.
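Once the onboarding wizard completes and the storage task has run, one way to verify the result is to list the Iceberg tables the project created in the data catalog. A minimal sketch, assuming a PyIceberg-readable catalog; the catalog name and any namespaces it returns depend on your own configuration.

```python
# Sketch: confirming that a storage task created its Iceberg tables
# by listing every table in every namespace of the data catalog.
# The catalog name and its configuration are assumptions for illustration.

def qualified_names(namespace: str, tables: list[str]) -> list[str]:
    """Build fully qualified identifiers such as 'sales.orders'."""
    return [f"{namespace}.{t}" for t in tables]

def list_project_tables(catalog_name: str) -> list[str]:
    """List all tables in a catalog (requires pyiceberg and catalog access)."""
    from pyiceberg.catalog import load_catalog  # lazy import: needs pyiceberg

    # load_catalog reads connection settings from .pyiceberg.yaml or env vars
    catalog = load_catalog(catalog_name)
    names: list[str] = []
    for ns in catalog.list_namespaces():
        tables = [ident[-1] for ident in catalog.list_tables(ns)]
        names.extend(qualified_names(".".join(ns), tables))
    return names
```

Seeing your onboarded datasets in this listing confirms the storage task wrote them; the same identifiers are what a Mirror data task exposes as external tables in your cloud data warehouse.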