Creating rules to transform datasets
You can create re-usable transformation rules to perform global transformation on datasets in Landing, Storage, Transform, and Data mart data tasks.
Select Datasets, Click Transformation rules, and then Add transformation rule to create a new transformation rule.
-
Select the transformation type and click Next.
You can perform the following transformations:
-
Rename datasets
-
Rename columns
-
Add columns
-
Drop columns
-
Convert data types
-
Replace column values
-
-
Select the transformation scope and click Next.
-
Select which transformation action to perform and click Next.
-
Add a name and a description for the rule and click Finish.
The rule will be applied when you run the data task. If you add more than one rule, the rules are executed in the same order they are listed.
Transformation scope
When several transformations apply to the same dataset or columns, the scope is always on the original source names, while the action relates to the value after the previous rule was applied. For example, if you have these rules:
-
Rename all datasets that start with Abc_ (Abc_%) to change prefix to ABC_.
-
Add suffix _zzz to datasets that start with ABC_ (ABC_%).
Applying the rules to these datasets would provide these results. Note that after the second rule, the _zzz suffix is not added to ABC_customers as the scope is always on the original name (Abc_customers).
Original dataset name | Dataset name after rule 1 | Dataset name after rule 2 |
---|---|---|
Abc_customers |
ABC_customers |
ABC_customers |
ABC_Suppliers |
ABC_Suppliers |
ABC_Suppliers_zzz |
Renaming datasets
-
Select Rename dataset, and then click Next.
-
Set the transformation scope, that is, which datasets to rename. You can use % as a wild card character to select several datasets from one or more data assets.
Click Next.
-
Set the transformation action. You can perform the following actions:
-
Rename dataset to a fixed name.
-
Add a prefix or suffix.
-
Remove a prefix or suffix.
-
Replace a prefix or suffix.
-
Change case to lowercase or uppercase.
-
Replace dataset names using an expression.
-
Replace dataset names using a dictionary.
For more information, see Renaming datasets or columns using a dictionary.
Click Next when you are ready.
-
-
Set the name of the rule, and click Finish.
Renaming columns
-
Select Rename column, and then click Next.
-
Set the transformation scope, that is, which columns to rename. You can use % as a wild card character to select several columns from one or more data tasks and datasets.
You can also limit the action to a certain data type. Set data type to UNSPECIFIED to perform the action for all matching columns, regardless of data type.
Click Next.
You can also limit the scope to columns that are keys or nullable.
-
Set the transformation action. You can perform the following actions:
-
Rename column to a fixed name.
-
Add a prefix or suffix.
-
Remove a prefix or suffix.
-
Replace a prefix or suffix.
-
Change column case to lowercase or uppercase.
-
Replace column names using an expression.
-
Replace column names using a dictionary.
For more information, see Renaming datasets or columns using a dictionary.
Click Next when you are ready.
-
-
Set the name of the rule, and click Finish.
Adding columns
-
Select Add column, and then click Next.
-
Set the transformation scope, that is, which dataset to add the column. You can use % as a wild card character to add the column to one or more datasets.
Click Next.
-
Set the details of the new column:
-
Column name.
-
Use the expression builder to set Value for the column.
-
Select Add to primary key to use this column as a primary key.
-
Set the data type in Target data type.
-
If data type is BYTES, STRING, or WSTRING, specify a Length as well.
If data type is NUMERIC, specify Precision and Scale as well.
Click Next when you are ready.
-
-
Set the name of the rule, and click Finish.
Dropping columns
-
Select Drop column, and then click Next.
-
Set the transformation scope, that is, which columns to drop. You can use % as a wild card character to select several columns from one or more data assets and datasets.
You can also limit the action to a certain data type. Set data type to UNSPECIFIED to perform the action for all matching columns, regardless of data type.
Click Next.
You can also limit the scope to columns that are keys or nullable.
-
Set the name of the rule, and click Finish.
Converting data types
-
Select Convert data type, and then click Next.
-
Set the transformation scope, that is, which columns to convert data type. You can use % as a wild card character to select several columns from one or more data assets and datasets.
You can also limit the action to a certain data type. Set data type to UNSPECIFIED to perform the action for all matching columns, regardless of data type.
Click Next.
You can also limit the scope to columns that are keys or nullable.
-
Set the transformation action.
-
Set Target data type to the data type to convert into.
-
If data type is BYTES, STRING, or WSTRING, specify a Length as well.
If data type is NUMERIC, specify Precision and Scale as well.
Click Next when you are ready.
-
-
Set the name of the rule, and click Finish.
See also: Understanding the impact of changing a data type
Replacing column values
-
Select Replace column values, and then click Next.
-
Set the transformation scope, that is, which columns to replace values for. You can use % as a wild card character to select several columns from one or more data assets and datasets.
You can also limit the action to a certain data type. Set data type to UNSPECIFIED to perform the action for all matching columns, regardless of data type.
Click Next.
You can also limit the scope to columns that are keys or nullable.
-
Set the transformation action.
-
Set an expression to define the replacement values in Target value.
-
Set Target data type to the data type for the replaced columns.
-
If data type is BYTES, STRING, or WSTRING, specify a Length as well.
If data type is NUMERIC, specify Precision and Scale as well.
Click Next when you are ready.
-
-
Set the name of the rule, and click Finish.
Renaming datasets or columns using a dictionary
You can use a dictionary to rename datasets or columns. This is useful when you are working with a large number of objects that use cryptic naming conventions. You can add translations in a dictionary editor or import a CSV file containing the translations. You can also export the dictionary to a CSV file and reuse it in another data asset.
Rename by dictionary is available in Transformation actions when you have selected Rename datasets or Rename columns. Click to open the dictionary editor.
-
Click Add translation and fill in Source name and Translated name to add a translation to the dictionary.
When you have created the translations you need, click OK to close the dictionary editor.
Importing translations
You can import translations from a CSV file containing a dictionary. The file should contain one translation per row. Example:
-
Click Append from CSV and select the CSV file to import.
The translations in the imported dictionary are appended to your dictionary.
Exporting translations
You can export your dictionary to a CSV file to be able to reuse it in other data assets.
-
Click Export to CSV.
The dictionary is exported to a CSV file.
Using variables in an expression
The Metadata tab in the expression editor contains the following variables that you can use in an expression in a transformation rule.
Variable | Description |
---|---|
$Q_D_COLUMN_DATA |
The column value in the source table. |
$Q_M_COLUMN_NAME |
The modified column name. |
$Q_M_DATATYPE_LENGTH |
The modified data type length of a column. |
$Q_M_DATATYPE_NAME |
The modified data type of a column. |
$Q_M_DATATYPE_PRECISION |
The modified data type precision of a column. |
$Q_M_DATATYPE_SCALE |
The modified data type scale of a column. |
$Q_M_SCHEMA_NAME |
The name of the source schema. |
$Q_M_SOURCE_COLUMN_NAME |
The name of a column in the source table. |
$Q_M_SOURCE_DATATYPE_LENGTH |
The data type length of a column in the source table. |
$Q_M_SOURCE_DATATYPE_NAME | The data type of a column in the source table. |
$Q_M_SOURCE_DATATYPE_PRECISION |
The data type precision of a column in the source table. |
$Q_M_SOURCE_DATATYPE_SCALE |
The data type scale of a column in the source table. |
$Q_M_SOURCE_TABLE_NAME |
The name of the source table. |
$Q_M_TABLE_NAME |
The modified name of the source table. |
Example:
You can rename all tables that are in the transformation scope to add the schema name as prefix with the following expression:
${Q_M_SCHEMA_NAME}||'.'||${Q_M_SOURCE_TABLE_NAME}.
This would for example rename the table products in the dwprod schema to dwprod.products.
Managing rules
You can manage rules in the Rules pane.
-
Click Rules.
You can enable/disable a rule, delete a rule, and edit a rule.