An overview of the Pivotal Greenplum target
The Qlik Replicate database for Pivotal Greenplum is an operational data warehousing solution for Big Data analytics workloads. For data integration, Qlik Replicate uses Pivotal Greenplum’s Scatter/Gather Streaming technology, which is designed to handle large volumes of data.
The Qlik Replicate Pivotal Greenplum database makes it possible to load data from heterogeneous data sources and keep that data up to date. It does this by capturing changes at the source and streaming them to the Pivotal Greenplum data warehouse, with very low impact on the source data.
The Qlik Replicate Pivotal Greenplum database provides full automation for:
- Schema generation and data type mapping
- Full load of source database tables
- Incremental load of changes made to source tables
- Application of DDL changes made to the source tables
- Synchronization between full load and CDC processes
Manual control is also available if needed.
The Qlik Replicate Pivotal Greenplum database integrates with the Pivotal Greenplum database in two ways:
- Pivotal Greenplum ODBC API. This is used for metadata management. Through the Pivotal Greenplum ODBC API, Qlik Replicate tests the database connection, gets the table list and the table schema, builds procedures that create external tables to process a file, and invokes the procedures that load the destination table or apply changes from the external table. During schema generation, data types can be mapped, such as Pivotal Greenplum to Postgres. Primary keys are generated, and distribution clauses are generated based on the primary key.
- Pivotal Greenplum Parallel File Distribution Server (gpfdist). This utility is used with read-only external tables for fast, parallel data loading into a Pivotal Greenplum data warehouse. gpfdist uses maximum parallelism while reading from external tables.
Qlik Replicate works closely with gpfdist to take advantage of its optimized fast, parallel loading facilities. Qlik Replicate uses the Pivotal Greenplum Parallel File Distribution Server to support both full load and incremental load activities.
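The two integration paths described above can be illustrated with a minimal Python sketch that only builds the kind of SQL statements involved: a target table whose distribution clause follows the primary key, a read-only external table served by gpfdist, and a load from the external table into the target. This is not the SQL that Qlik Replicate actually generates. The function names, table names, host, port, and file name are all hypothetical, and the statements assume standard Greenplum `CREATE TABLE ... DISTRIBUTED BY` and `CREATE READABLE EXTERNAL TABLE ... LOCATION ('gpfdist://...')` syntax.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def target_table_ddl(table, columns, primary_key):
    """Build a CREATE TABLE statement whose distribution key is the primary key.

    columns is a list of (column_name, data_type) pairs; primary_key is a
    list of column names. All names here are illustrative assumptions.
    """
    cols = ", ".join(f"{quote_ident(n)} {t}" for n, t in columns)
    keys = ", ".join(quote_ident(k) for k in primary_key)
    return (
        f"CREATE TABLE {quote_ident(table)} ({cols}, PRIMARY KEY ({keys})) "
        f"DISTRIBUTED BY ({keys});"
    )


def external_table_ddl(ext_table, like_table, host, port, file_name):
    """Build a read-only external table definition that reads a CSV file from gpfdist."""
    # Escape single quotes so the URL is a valid SQL string literal.
    location = f"gpfdist://{host}:{port}/{file_name}".replace("'", "''")
    return (
        f"CREATE READABLE EXTERNAL TABLE {quote_ident(ext_table)} "
        f"(LIKE {quote_ident(like_table)}) "
        f"LOCATION ('{location}') FORMAT 'CSV';"
    )


def load_statement(target, ext_table):
    """Build the statement that loads the target table from the external table."""
    return f"INSERT INTO {quote_ident(target)} SELECT * FROM {quote_ident(ext_table)};"


if __name__ == "__main__":
    # Hypothetical table, host, port, and file name, for illustration only.
    print(target_table_ddl("orders", [("id", "integer"), ("total", "numeric")], ["id"]))
    print(external_table_ddl("ext_orders", "orders", "etl-host", 8080, "orders.csv"))
    print(load_statement("orders", "ext_orders"))
```

In a real deployment the statements would be executed over the ODBC connection while a gpfdist process serves the data file; the sketch stops at building the SQL text so that it can run without a database.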
See Qlik Replicate Pivotal Greenplum endpoint architecture overview for a description of the system architecture used with the Pivotal Greenplum database.