Adding data lake projects
Adding a project is the first task you need to perform to work with Qlik Compose.
There are two types of project:
- Data Lake - for ingesting data from multiple sources and moving it to a storage system for analytics.
- Data Warehouse - for ingesting data from multiple sources and creating analytics-ready data marts.
This topic guides you through the steps required to set up a Data Lake project. For instructions on setting up a Data Warehouse project, see Adding data warehouse projects.
You can set up as many projects as you need, although your Compose license determines whether you can run tasks.
To prevent unpredictable behavior, each project must be defined with a dedicated Storage Zone.
To add a new Data Lake project:
- Click the New Project toolbar button.
  The New Project wizard opens.
- In the Project Name tab, specify the following and then click Next:
  - Name: The project name.
    Warning note: Project names cannot contain the following characters: /\,&#%$@=^*+"'`~?<>:;[]{} or any non-printable characters (below 0x20). The project name can contain a single dot, but the dot cannot be the first or last character.
  - Environment Type: Optionally, change the default environment type.
  - Environment Title: Optionally, specify an environment title.
  For information about the environment settings, see Environment tab.
  Warning note: The following names are reserved system names and cannot be used as project names: CON, PRN, AUX, CLOCK$, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.
- Select Data Lake as your project type.
- Choose whether to create your Storage as an Operational Data Store (ODS) or as an Operational and Historical Data Store (ODS + HDS). Choose Operational Data Store to create an ODS from the source data, or Operational and Historical Data Store to create an ODS from the source data while maintaining previous versions of updated records. Once you have made your choice, click Next.
- In the Deployment tab, select where you want your Data Lake to be deployed. Then click Next.
  Information note: Your choice determines which storage system options are available in the Storage tab.
- In the Storage tab, select a storage system. If you select File System, choose a file format. Then click Next.
  Warning note:
  - Renaming a column in Parquet or Avro format will cause the loss of all data in that column.
  - Parquet and Avro formats do not allow spaces in Primary Key column names. If your project is set up to ingest tables from Replicate, you can define a global transformation in Replicate to remove spaces from Primary Key column names.
- In the Compute tab, select a compute platform and then click Finish to exit the wizard.
  Information note: Before configuring connectivity, make sure to install the relevant driver for your compute platform. See Prerequisites for more information.
- The project panels will be displayed.
- Add a Storage Zone (Data Lake) and at least one source database as described in Defining a Storage Zone and Defining Landing Zones respectively.
- Select the source tables as described in Selecting source tables and managing metadata.
- Create the storage tables as described in Creating and Managing Storage Zone Tasks.
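If you generate project names in a script before entering them in the wizard, the naming rules from the Project Name step can be pre-checked. The following Python sketch is not part of Compose. The function name `project_name_problems` is hypothetical, and it rests on two assumptions: "a single dot" is read as "at most one dot", and reserved names are matched case-insensitively. It does not check any limits (such as length) that the rules above do not mention.

```python
from typing import List

# Characters listed as not allowed in project names.
FORBIDDEN_CHARS = set('/\\,&#%$@=^*+"\'`~?<>:;[]{}')

# Reserved system names listed in the warning note.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "CLOCK$", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def project_name_problems(name: str) -> List[str]:
    """Return a list of rule violations; an empty list means no rule was broken."""
    problems = []

    bad_chars = sorted({c for c in name if c in FORBIDDEN_CHARS})
    if bad_chars:
        problems.append("forbidden characters: " + " ".join(bad_chars))

    # Non-printable characters are those below 0x20.
    if any(ord(c) < 0x20 for c in name):
        problems.append("contains non-printable characters")

    # Assumption: "a single dot" means at most one dot in the name.
    if name.count(".") > 1:
        problems.append("more than one dot")
    if name.startswith(".") or name.endswith("."):
        problems.append("dot is the first or last character")

    # Assumption: reserved names are compared case-insensitively.
    if name.upper() in RESERVED_NAMES:
        problems.append("reserved system name")

    return problems


if __name__ == "__main__":
    for candidate in ["Sales_Lake", "sales.lake", "con", ".lake", "a/b"]:
        print(repr(candidate), project_name_problems(candidate))
```

The function returns the violations instead of raising an error, so a calling script can report all the problems with a name at once.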