Replicating data

You can replicate data from supported data sources to any supported target.

The data replication process consists of the following operations:

Retrieving the data from the data source
Transforming the data (optional)
Loading the data to the target
Keeping the data up-to-date with real-time change capture
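
The four operations above can be pictured with a small in-memory model. This is only a conceptual sketch, not the product's implementation: the function names (`full_load`, `apply_change`), the row layout, and the `id` key column are all assumptions made for illustration.

```python
# Conceptual sketch only: rows are plain dicts and the "target" is a dict
# keyed by a hypothetical "id" column. None of these names come from the product.

def full_load(source_rows, transform=None):
    """Retrieve rows, optionally transform them, and load them into a new target."""
    target = {}
    for row in source_rows:
        if transform is not None:
            row = transform(row)
        target[row["id"]] = row
    return target


def apply_change(target, change, transform=None):
    """Keep the target up to date by applying one captured change."""
    row = change["row"]
    if change["op"] == "delete":
        target.pop(row["id"], None)
    else:  # "insert" and "update" are handled the same way in this sketch
        if transform is not None:
            row = transform(row)
        target[row["id"]] = row


def upper_name(row):
    """Example of an optional transformation: upper-case the name column."""
    return {**row, "name": row["name"].upper()}


source = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
target = full_load(source, upper_name)

# Changes captured after the initial load.
apply_change(target, {"op": "update", "row": {"id": 2, "name": "hopper"}}, upper_name)
apply_change(target, {"op": "delete", "row": {"id": 1}})

print(target)
```

In this model the transformation is applied both during the initial load and to each captured change, so the target stays consistent with the transformed view of the source.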

For instructions on how to set up a connection to your data source, see Connecting to data sources.

For instructions on how to set up a connection to your target database, see Connecting to targets.

To set up a replication task:

Click the Add new button in the top right and then select Create data project from the drop-down menu.
In the New data project dialog, do the following:
1. Provide a Name for your project.
2. Select the Space in which you want the project to be created.
3. Optionally, provide a Description.
4. Select Replication as the Use case.
5. Optionally, clear the Open check box if you want to create an empty project without configuring any settings.
6. Click Create.
  
  One of the following will occur:
  - If the Open check box in the New data project dialog was selected (the default), the project will open.
  - If you cleared the Open check box in the New data project dialog, the project will be added to your list of projects. You can open the project later by selecting Open from the project's menu.
After the project opens, click Replicate data.

The Replicate data wizard opens.
In the General tab, specify a name and description for the replication task. Then click Next.
In the Select source connection tab, select a connection to the source data. You can optionally edit the connection settings by selecting Edit from the menu in the Actions column.

If you have not yet created a connection to your data source, you need to create one by clicking Create connection in the top right of the tab.

You can filter the list of connections using the filters on the left. Connections can be filtered according to source type, gateway, space, and owner. The All filters button above the connections list shows the number of current filters. You can use this button to close or open the Filters panel on the left. Currently active filters are also shown above the list of available data connections.

You can also sort the list by selecting Last modified, Last created, or Alphabetical from the drop-down list on the right. Click the arrow to the right of the list to change the sorting order.

After you have selected a data source connection, optionally click Test connection in the top right of the tab (recommended), and then click Next.
In the Select datasets tab, select tables and/or views to include in the replication task. You can also use wildcards and create selection rules as described in Selecting data from a database.
In the Select target connection tab, select the target from the list of available connections and then click Next. In terms of functionality, the tab is the same as the Select source connection tab described earlier.
In the Settings tab, optionally change the following settings and then click Next.

Replication settings:

Information note: When replicating from SaaS application sources, the Full load replication mode is enabled by default and cannot be disabled.
- Full load: Loads the data from the selected source tables to the target platform and creates the target tables if necessary. The full load occurs automatically when the task is started, but can also be performed manually should the need arise.
- Apply changes: Keeps the target tables up-to-date with any changes made to the source tables.
- Store changes: Stores the changes to the source tables in Change Tables (one per source table).
  
  For more information, see Store changes.
Information note: ALTER TABLE DDL operations are not currently supported. Other DDL operations such as DROP TABLE and TRUNCATE TABLE are supported.

Apply changes mode

Information note: When replicating to data warehouse targets, you cannot select which Apply changes mode to use. Changes will always be applied in Batch optimized mode for maximum efficiency.

Changes are applied to the target tables using one of the following methods:
- Batch optimized: This is the default. When this option is selected, changes are applied in batches. A preprocessing action occurs to group the transactions into batches in the most efficient way.
- Transactional: Select this option to apply each transaction individually, in the order it is committed. In this case, strict referential integrity is ensured for all tables.
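
The difference between the two modes can be illustrated with a toy model. This is a sketch under stated assumptions: the actual preprocessing used to group transactions is not described here, so the fixed `batch_size` grouping, the "last change per key wins" rule, and all function names are hypothetical.

```python
# Toy model: a transaction is a list of (key, value) changes in commit order;
# a value of None stands for a delete. All names here are illustrative only.

def _apply(target, key, value):
    if value is None:
        target.pop(key, None)
    else:
        target[key] = value


def apply_transactional(target, transactions):
    """Apply each transaction individually, in commit order."""
    steps = 0
    for txn in transactions:
        for key, value in txn:
            _apply(target, key, value)
        steps += 1  # one apply step per transaction
    return steps


def apply_batch_optimized(target, transactions, batch_size):
    """Group transactions into batches and apply the net change of each batch."""
    steps = 0
    for start in range(0, len(transactions), batch_size):
        net = {}
        for txn in transactions[start:start + batch_size]:
            for key, value in txn:
                net[key] = value  # a later change to the same key overrides earlier ones
        for key, value in net.items():
            _apply(target, key, value)
        steps += 1  # one apply step per batch
    return steps


transactions = [
    [("a", 1), ("b", 2)],
    [("a", 3)],
    [("b", None)],
    [("c", 4)],
]

t_target, b_target = {}, {}
t_steps = apply_transactional(t_target, transactions)
b_steps = apply_batch_optimized(b_target, transactions, batch_size=2)

print(t_target, t_steps)
print(b_target, b_steps)
```

Both runs end with the same target contents, but the batch variant takes fewer apply steps and never exposes the intermediate states (for example, `a` holding the value 1). In this toy model, that is the trade-off between efficiency and applying every transaction exactly as it was committed.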

Connection to staging area

When replicating to the data warehouses listed below, you need to set a staging area. Data is processed and prepared in the staging area before being transferred to the warehouse.
Either select an existing staging area or click Create new to define a new staging area and follow the instructions in Connecting to cloud storage.

To edit the connection settings, click Edit. To test the connection (recommended), click Test connection.

For information on which staging areas are supported with which data warehouses, see the Supported as a staging area column in Target platform use cases and supported versions.
In the Summary tab, a visual representation of the data pipeline is displayed. Choose whether to Open the <name> task or Do nothing, and then click Create.

Depending on your choice, either the task will be opened or a list of projects will be displayed.
If you chose to open the task, the Datasets tab will show the structure and metadata of the selected data asset tables. This includes all explicitly listed tables as well as tables that match the selection rules.

If you want to add more tables from the data source, click Select source data.
Optionally, change the task settings as described in Data replication task settings.
You can perform transformations on the datasets, filter data, or add columns.

For more information, see Managing datasets.
When you have added the transformations that you want, you can validate the datasets by clicking Validate datasets. If the validation fails, resolve the errors before proceeding.

For more information, see Validating and adjusting the datasets.
When you are ready, click Prepare to catalog the data task and prepare it for execution.
When the data task has been prepared, click Run.

For information on recovering tasks and other methods of running tasks, see Advanced run options.
The replication task should now start, and you can see the progress in Monitor. For more information, see Monitoring an individual data task.

Setting load priority for datasets

You can control the load order of datasets in your data task by assigning a load priority to each dataset. This can be useful, for example, if you want to load smaller datasets before large datasets.

Click Load priority.
Select a load priority for each dataset.

The default load priority is Normal. Datasets will be loaded in the following order of priority:
- Highest
- Higher
- High
- Normal
- Low
- Lower
- Lowest
Datasets with the same priority are loaded in no particular order.
Click OK.

Datasets from SaaS application sources may contain dependencies in load order. Consider this when setting the load priority.
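
The priority ordering described in this section can be sketched as a simple sort. This is an illustration only: the dataset names and the `load_order` helper are hypothetical, and datasets with no assigned priority are treated as Normal, matching the default described above.

```python
# Priority names in the order they are loaded, highest first.
PRIORITY_ORDER = ["Highest", "Higher", "High", "Normal", "Low", "Lower", "Lowest"]
RANK = {name: position for position, name in enumerate(PRIORITY_ORDER)}


def load_order(datasets, assigned):
    """Return dataset names sorted by load priority (default: Normal).

    Python's sort is stable, so ties keep their input order here; the
    product makes no such guarantee for datasets with the same priority.
    """
    return sorted(datasets, key=lambda name: RANK[assigned.get(name, "Normal")])


# Hypothetical datasets: a small lookup table first, a large log table last.
datasets = ["orders", "customers", "audit_log", "countries"]
assigned = {"countries": "Highest", "audit_log": "Lowest", "customers": "High"}

order = load_order(datasets, assigned)
print(order)
```

Here `orders` has no explicit priority, so it is loaded at Normal: after `customers` (High) and before `audit_log` (Lowest).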

Reloading data

You can perform a manual reload of tables. This is useful when there are issues with one or more tables.

Select the tables to reload, and click Reload.

You can cancel the reload for tables that are pending reload by clicking Cancel reload. This will not affect tables that are already reloaded.
