Managing datasets
You can manage the datasets included in Landing, Storage, Transform, Data mart, and Replication data tasks to create transformations, filter the data, and add columns.
The included datasets are listed under Datasets in the Design view. You can select which columns to display with the column picker.
Datasets in the Design view of a data task


Transformation rules and explicit transformations
You can perform both global and explicit transformations.
Transformation rules
You can perform global transformations by creating a transformation rule. Use % as a wildcard in the rule scope to apply the rule to all matching datasets.
-
Click Rules, and then Add rule to create a new transformation rule.
For more information, see Creating rules to transform datasets.
Transformation rules are indicated by a dark purple corner on the affected attribute.
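To illustrate how a scope with % as a wildcard selects datasets, here is a minimal Python sketch. It is not the product's implementation: the function names, the case-sensitive matching, and the assumption that % matches any run of characters (including none) are all illustrative.

```python
import re

def scope_to_regex(scope: str) -> re.Pattern:
    """Translate a scope pattern into a regex (assumption: % matches any run of characters)."""
    # Escape the literal parts so characters such as "." are matched literally.
    parts = (re.escape(part) for part in scope.split("%"))
    return re.compile(".*".join(parts))

def matching_datasets(scope: str, datasets: list[str]) -> list[str]:
    """Return the dataset names that the scope applies to (hypothetical helper)."""
    pattern = scope_to_regex(scope)
    return [name for name in datasets if pattern.fullmatch(name)]

datasets = ["orders", "order_items", "customers"]
print(matching_datasets("order%", datasets))  # ['orders', 'order_items']
print(matching_datasets("%", datasets))       # ['orders', 'order_items', 'customers']
```

A scope of % alone matches every dataset, which is what makes a rule global.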
Explicit transformations
Explicit transformations are created:
-
When you use Edit to change a column attribute.
-
When you use Rename on a dataset.
-
When you add a column.
Explicit transformations override global transformations, and are indicated by a light purple corner on the affected attribute.
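The precedence between the two kinds of transformation can be sketched as follows. This is a hypothetical model, not the product's logic: the function name and the use of None for "no transformation" are assumptions made for illustration.

```python
def resolve_attribute(source_value, rule_value=None, explicit_value=None):
    """Return the effective attribute value and where it came from.

    Assumption: None means that no transformation of that kind applies.
    """
    if explicit_value is not None:
        return explicit_value, "explicit"   # explicit transformations win
    if rule_value is not None:
        return rule_value, "rule"           # otherwise a global rule applies
    return source_value, "source"           # otherwise the value is unchanged

print(resolve_attribute("cust_id", rule_value="CUST_ID", explicit_value="customer_id"))
print(resolve_attribute("cust_id", rule_value="CUST_ID"))
print(resolve_attribute("cust_id"))
```

The second element of the result corresponds to the corner indicator: "explicit" for light purple, "rule" for dark purple.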
Filtering a dataset
You can filter data to create a subset of rows, if required.
-
Click Filter.
For more information, see Filtering a dataset.
Renaming a dataset
You can rename a dataset.
-
Open the menu on a dataset, and then click Rename.
Adding columns
You can add columns with row-level transformations, if required.
-
Click Add column.
For more information, see Adding columns to a dataset.
Editing a column
You can edit the following column properties by selecting a column and clicking Edit:
-
Name
-
Key
Set a column to be a primary key. You can also set keys by selecting or deselecting in the Key column.
-
Nullable
-
Data type
Set the data type of the column. For some data types, you can set an additional property, for example, Length.
Removing columns
You can remove one or more columns from a dataset.
-
Select the columns to remove, and click Remove.
If you want to see removed columns, click Show removed columns. Removed columns are indicated with strike-through text. You can retrieve a removed column by selecting it, and clicking Revert.
Reverting explicit changes to columns
You can revert all explicit changes to one or more columns.
-
Select the columns to revert changes to, and click Revert.
Changes from global transformation rules will not be reverted.
If you revert an added column, it will be removed.
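The revert behavior described in this section and the previous one can be modeled roughly as below. The Column class and its fields are hypothetical and exist only to make the three outcomes concrete; changes from global rules are deliberately not stored on the column, so reverting leaves them untouched.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    name: str
    added: bool = False      # True if the column was added in the data task
    removed: bool = False    # True if the column was removed in the data task
    explicit: dict = field(default_factory=dict)  # explicit attribute changes

def revert(columns: list[Column], selected: set[str]) -> list[Column]:
    """Revert explicit changes for the selected columns (illustrative model)."""
    result = []
    for col in columns:
        if col.name in selected:
            if col.added:
                continue             # reverting an added column removes it
            col = Column(col.name)   # clears explicit changes and the removed flag
        result.append(col)
    return result

columns = [
    Column("id", explicit={"data_type": "STRING"}),
    Column("full_name", added=True),
    Column("email", removed=True),
]
after = revert(columns, {"id", "full_name", "email"})
print([c.name for c in after])  # ['id', 'email']
```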
Dataset settings
You can change settings for the dataset. The default setting is to inherit the setting of the data asset, but you can also change a setting to be explicitly On or Off.
-
Open the menu on a dataset, and then click Settings.
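The inherit/On/Off behavior is a three-state setting. A minimal sketch, assuming hypothetical names (Setting, effective_setting) and a boolean for the data asset's value:

```python
from enum import Enum

class Setting(Enum):
    INHERIT = "inherit"  # default: follow the data asset
    ON = "on"
    OFF = "off"

def effective_setting(dataset_setting: Setting, asset_value: bool) -> bool:
    """Resolve the value that applies to a dataset (illustrative only)."""
    if dataset_setting is Setting.INHERIT:
        return asset_value
    return dataset_setting is Setting.ON

print(effective_setting(Setting.INHERIT, asset_value=True))  # True
print(effective_setting(Setting.OFF, asset_value=True))      # False
```

With this model, changing the data asset's setting only affects datasets that are still set to inherit.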
Validating and adjusting the datasets
You can validate all datasets that are included in the data task.
Expand Validate and adjust to see all validation errors and design changes.
Validating the datasets
-
Click Validate datasets to validate the datasets.
Validation includes checking that:
-
All tables have a primary key.
-
There are no missing attributes.
-
There are no duplicate table or column names.
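Two of these checks can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the actual validator: the table structure (dictionaries with "name", "columns", and "key") is invented, names are compared case-sensitively, and the missing-attributes check is left out because its exact definition is not given here.

```python
from collections import Counter

def validate(tables: list[dict]) -> list[str]:
    """Return a list of validation error messages (hypothetical format)."""
    errors = []
    # Check for duplicate table names.
    for name, count in Counter(t["name"] for t in tables).items():
        if count > 1:
            errors.append(f"duplicate table name: {name}")
    for table in tables:
        columns = table["columns"]
        # Check that at least one column is marked as a key.
        if not any(col.get("key") for col in columns):
            errors.append(f"{table['name']}: no primary key")
        # Check for duplicate column names within the table.
        for col, count in Counter(c["name"] for c in columns).items():
            if count > 1:
                errors.append(f"{table['name']}: duplicate column name: {col}")
    return errors

tables = [
    {"name": "orders", "columns": [{"name": "id", "key": True}, {"name": "id"}]},
    {"name": "customers", "columns": [{"name": "email"}]},
]
for error in validate(tables):
    print(error)
```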
You will also get a list of design changes compared to the source:
-
Added tables and columns
-
Dropped tables and columns
-
Renamed tables and columns
-
Changed primary keys and data types
-
Fix the validation errors, and then validate the datasets again.
-
Most design changes can be adjusted automatically, except changed primary keys or data types. In these cases, you need to sync the datasets.
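As a rough illustration of how design changes for one table could be classified, the sketch below compares two column mappings. Everything about it is an assumption: the mapping shape (column name to a (data type, is key) pair), the names previous and current, and the function itself. Renames are not detected and would show up as one dropped and one added column.

```python
def design_changes(previous: dict, current: dict) -> dict:
    """Compare two {column: (data_type, is_key)} mappings (illustrative only)."""
    shared = previous.keys() & current.keys()
    changes = {
        "added": sorted(current.keys() - previous.keys()),
        "dropped": sorted(previous.keys() - current.keys()),
        "changed_type": sorted(c for c in shared if previous[c][0] != current[c][0]),
        "changed_key": sorted(c for c in shared if previous[c][1] != current[c][1]),
    }
    # Changed primary keys or data types cannot be adjusted automatically.
    changes["auto_adjustable"] = not (changes["changed_type"] or changes["changed_key"])
    return changes

previous = {"id": ("INT", True), "name": ("STRING", False), "zip": ("INT", False), "fax": ("STRING", False)}
current = {"id": ("INT", True), "name": ("STRING", True), "zip": ("STRING", False), "email": ("STRING", False)}
print(design_changes(previous, current))
```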
Preparing the datasets
You can prepare datasets to adjust design changes with no data loss if possible. If there are design changes that cannot be adjusted without data loss, you will get the option to recreate tables from source with data loss.
This requires stopping the task.
-
Open the menu, and then click Prepare.
When the datasets are prepared, validate the datasets before restarting the storage task.
Recreating datasets
You can recreate the datasets from the source. When you recreate a dataset, there will be data loss in the data asset. As long as you have the source data, you can reload it from the source.
This requires stopping the task.
-
Open the menu, and then click Recreate.
Limitations
-
In Google BigQuery, deleting or renaming a column recreates the table, which leads to data loss.