Data pipeline
A data pipeline is an end-to-end process to move data from source to target, including any required transformations. A pipeline can be as simple as a straight mirroring of data from source to target, or as complex as a complete enterprise data warehouse solution including multiple data marts serving a diverse range of requirements.
A data pipeline takes your source data, often from multiple operational systems, and moves the data to where you need it. It transforms, merges, and validates the data into a consumption-ready format.
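As a rough illustration of the transform, merge, and validate steps described above, the following minimal sketch runs them over two small in-memory data sets. All names and sample records here (`CRM_CUSTOMERS`, `BILLING_ORDERS`, `run_pipeline`, and so on) are hypothetical and do not correspond to any Qlik or Talend API; a real pipeline would read from operational systems and write to a target.

```python
# Hypothetical records standing in for two operational source systems.
CRM_CUSTOMERS = [
    {"id": 1, "name": " Ada Lovelace "},
    {"id": 2, "name": "Alan Turing"},
]
BILLING_ORDERS = [
    {"customer_id": 1, "amount": "120.50"},
    {"customer_id": 1, "amount": "30.00"},
    {"customer_id": 2, "amount": "75.25"},
    {"customer_id": 3, "amount": "10.00"},  # no matching customer in the CRM
]


def transform(customers):
    """Clean the customer names and index them by customer id."""
    return {c["id"]: c["name"].strip() for c in customers}


def merge(names, orders):
    """Combine both sources: total order amount per customer, with the name."""
    totals = {}
    for order in orders:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + float(order["amount"])
    return [
        {"id": cid, "name": names.get(cid), "total": total}
        for cid, total in sorted(totals.items())
    ]


def validate(rows):
    """Split rows into consumption-ready rows and rejected rows."""
    valid = [r for r in rows if r["name"] is not None]
    rejected = [r for r in rows if r["name"] is None]
    return valid, rejected


def run_pipeline():
    """Run the steps end to end and return (valid, rejected) rows."""
    names = transform(CRM_CUSTOMERS)
    merged = merge(names, BILLING_ORDERS)
    return validate(merged)


if __name__ == "__main__":
    valid_rows, rejected_rows = run_pipeline()
    print("valid:", valid_rows)
    print("rejected:", rejected_rows)
```

In this sketch the order for customer 3 is rejected because no matching customer exists in the other source, which is the kind of check a validation step might perform before data is made available for consumption.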
Data pipelines, ETL & other approaches
A data pipeline is a high-level concept. The actual implementation of the data pipeline will be based on one or more of the following techniques:
ETL: Extract, Transform & Load. ETL is one of the oldest techniques in data movement and has been widely used since the 1970s. Data is extracted from the source system, then processing and transformation occur within the ETL tool before the transformed data is loaded into the target system. Talend Studio is an example of a popular tool that supports developing ETL pipelines.
ELT: Extract, Load & Transform. ELT is an alternative technique to ETL, where the data is loaded into the destination system before being transformed. SQL pushed down from the ELT tool performs the actual transformations; the ELT tool generates this SQL, so the process is usually transparent to the user. Because the transformations run inside the destination system, ELT is generally seen as performing better, but it is limited by the capabilities of that system. Qlik Talend Data Integration is largely based on ELT, but can use some ETL techniques where necessary.
Event Streams and Message Queues: Both are variants of the message broker architecture and deal with data that is in motion rather than read from a static data source. Operations on these technologies cannot rely on having access to the full data set and will often process a subset instead. These technologies are often used in combination with traditional databases and integrated into an ETL/ELT-based solution. Talend Studio is an example of a tool that supports transformations against these technologies.
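The difference between ETL and ELT comes down to where the transformation runs. The following minimal sketch contrasts the two, using Python's built-in `sqlite3` module with an in-memory database standing in for the target system. The table names, sample rows, and function names are assumptions made for illustration only; this is not how Talend Studio or Qlik Talend Data Integration is implemented.

```python
import sqlite3

# Hypothetical source rows: (name, price as text, region).
SOURCE_ROWS = [
    ("widget", "19.75", "eu"),
    ("gadget", "5.00", "us"),
    ("gizmo", "12.50", "eu"),
]


def run_etl(conn):
    """ETL: transform in the pipeline process, then load the finished rows."""
    transformed = [
        (name.upper(), float(price), region.upper())
        for name, price, region in SOURCE_ROWS
    ]
    conn.execute("CREATE TABLE products_etl (name TEXT, price REAL, region TEXT)")
    conn.executemany("INSERT INTO products_etl VALUES (?, ?, ?)", transformed)


def run_elt(conn):
    """ELT: load the raw rows first, then transform with SQL in the target."""
    conn.execute("CREATE TABLE products_raw (name TEXT, price TEXT, region TEXT)")
    conn.executemany("INSERT INTO products_raw VALUES (?, ?, ?)", SOURCE_ROWS)
    # The transformation is "pushed down": the destination database executes it.
    conn.execute(
        """
        CREATE TABLE products_elt AS
        SELECT UPPER(name) AS name,
               CAST(price AS REAL) AS price,
               UPPER(region) AS region
        FROM products_raw
        """
    )


def compare():
    """Run both approaches against one in-memory database and return the results."""
    conn = sqlite3.connect(":memory:")
    try:
        run_etl(conn)
        run_elt(conn)
        etl = conn.execute(
            "SELECT name, price, region FROM products_etl ORDER BY name"
        ).fetchall()
        elt = conn.execute(
            "SELECT name, price, region FROM products_elt ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return etl, elt


if __name__ == "__main__":
    etl_rows, elt_rows = compare()
    print("same result:", etl_rows == elt_rows)
```

Both functions produce the same target rows. In `run_elt`, however, the transformation is limited to what the destination's SQL dialect can express, whereas `run_etl` can use anything available in the pipeline process.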
Qlik understands that no single type of data pipeline meets every need, and our focus in developing Qlik Talend Cloud is to provide solutions that meet our customers' needs.