A data pipeline is an end-to-end process that moves data from source to target, including any required transformations. A pipeline can be as simple as a straight mirroring of data from source to target, or as complex as a complete enterprise data warehouse solution with multiple data marts serving a diverse range of requirements.
The first step in creating a data pipeline in Qlik Cloud Data Integration is to create a project. A project defines the source and onboarding of the data. For some use cases, this is sufficient for a complete pipeline.
The initial focus of a data project is onboarding the data. This involves transferring the data continuously from the on-premises or cloud data source and generating datasets in a read-optimized format. Onboarding involves two steps, landing and storing:
Landing the data involves transferring the data continuously from the data source to a landing area, using a Landing data task.
Storing data involves generating datasets based on the landing data, using a Storage data task.
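The two onboarding steps can be pictured with a minimal sketch. This is not the Qlik Cloud API: the `LandingTask` and `StorageTask` classes, the `onboard` function, and the "keep the latest row per key" rule are all hypothetical stand-ins chosen only to show that storing builds on what landing has already delivered.

```python
from dataclasses import dataclass, field


@dataclass
class LandingTask:
    """Hypothetical stand-in for a Landing data task."""
    landing_area: list = field(default_factory=list)

    def run(self, source_rows):
        # Copy each change record to the landing area unchanged.
        self.landing_area.extend(source_rows)
        return self.landing_area


@dataclass
class StorageTask:
    """Hypothetical stand-in for a Storage data task."""
    datasets: dict = field(default_factory=dict)

    def run(self, landing_area):
        # Assumption for illustration: the stored dataset keeps the
        # latest landed record for each key.
        for row in landing_area:
            self.datasets[row["id"]] = row
        return self.datasets


def onboard(source_rows):
    """Run landing first, then build the stored dataset from the landed data."""
    landed = LandingTask().run(source_rows)
    return StorageTask().run(landed)


changes = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Bo"},
    {"id": 1, "name": "Ada L."},  # later change to the same key
]
result = onboard(changes)
```

In this sketch the landing area still holds all three change records, while the stored dataset holds one current row per key.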
A data integration project is how we build, run, and monitor data pipelines.
| Concept | Relationship to project | Description |
| --- | --- | --- |
| Data tasks | Component | A data task is a fit-for-purpose collection of tables or files and an associated operation on them. It is the main unit of work within a data project. Examples of data tasks include transform and data mart. |
| Data spaces | Dependency | Data spaces are governed areas of your Qlik Cloud tenant that are used to manage projects and their data assets. Access to a data space is determined by membership of the space. Access to projects and their data assets inside a data space is determined by the roles assigned to members of the space. This means that a user must first be a member of the data space, and second, have the required roles to create, manage, or monitor data assets and resources in the data space. Members with the roles to consume data assets can also use data assets from a data space when building apps in personal, shared, and managed spaces. |
| Data Gateway | Dependency | The landing data asset uses Data Gateway to associate a replication task with it, and to control and perform basic monitoring of that task. |
| Data connection | Dependency | The storage data asset uses a data connection to connect to AWS S3 buckets or cloud data warehouses, either to read from the staging area or to write to the customer-managed storage area. |
| Registered data | Component | Registered data is similar to a data task, but it does not perform any actions against the data directly. It is designed to expose data landed outside of Qlik Cloud to the data project, so that it can then be used in the data pipeline. |
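The two-part access rule described for data spaces (membership first, then roles) can be sketched as a simple check. This is only an illustration: the `can_act` function, the dictionary layout, and the role names `can_manage` and `can_view` are hypothetical and are not taken from the Qlik Cloud API.

```python
def can_act(user, space, required_role):
    """Return True only if the user is a member of the space
    and holds the required role in it."""
    roles = space["members"].get(user)
    if roles is None:
        # Not a member: roles are never consulted.
        return False
    return required_role in roles


# Hypothetical data space with two members and illustrative role names.
space = {
    "members": {
        "alice": {"can_manage", "can_view"},
        "bob": {"can_view"},
    }
}

alice_can_manage = can_act("alice", space, "can_manage")
bob_can_manage = can_act("bob", space, "can_manage")
carol_can_view = can_act("carol", space, "can_view")
```

Here `alice` passes both checks, `bob` is a member but lacks the role, and `carol` fails the membership check before roles are considered.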