Adding SQL transformations
You can include SQL-based tables in transformation tasks. A SQL transformation allows you to input a SQL SELECT query into a pipeline to define simple or complex transformations. You can let the SQL assistant generate a query from a text prompt using generative AI. You can also use macros to adapt the query depending on whether an initial load or an incremental load is performed.
For more information about creating a transformation task, see Transforming data.
Adding a SQL transformation
To add a SQL transformation in a Transform data task:
- In Transform, select the datasets to include in the query, and click Add SQL transformation.
  Set the name of the transformation in Name. For more information about the other settings, see Settings.
  Click Add when you are ready to create the SQL transformation. Edit SQL transformation is displayed.
- Write the query to return the SQL-based table in the query pane.
  For more information, see Editing the SQL query.
- Click Run query.
  This will fetch metadata and a preview of the data.
- Review the parameters in Parameters.
  For more information, see Parameters.
- Set a primary key in Metadata.
  For more information, see Metadata.
- When you are satisfied with the results of the query, click Save and close.
  The dataset is now added to the list of targets, and you can preview it.
Editing the SQL query
Type the query to return the SQL-based table in the query pane. The query can only read from the source datasets that you selected, but you can add more source datasets with Add.
- You can create a statement to load a dataset from Objects. Click the menu button on the dataset, and then Generate select statement in editor.
- You can copy the name of a dataset to the clipboard by clicking the menu button and then Copy.
- Click > to move a dataset or column name to the editor.
- Click the search icon to search and replace text in the editor.
If incremental load is used, you can use Macros to adapt the query depending on whether an initial load or an incremental load is performed. For more information, see Using macros to adapt the query to load type.
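For example, a query can join two of the selected source datasets. This is a minimal sketch: the ORDERS and CUSTOMERS datasets and their column names are hypothetical, and ${ORDERS} and ${CUSTOMERS} are parameters as described in Parameters.

    SELECT o."ORDERID", o."ORDERDATE", c."CUSTOMERNAME"
    FROM ${ORDERS} o
    JOIN ${CUSTOMERS} c ON c."CUSTOMERID" = o."CUSTOMERID"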
Keyboard shortcuts
- Ctrl+F: Search and replace
- Ctrl+/: Comment/uncomment the current line
- Ctrl+Enter: Run the query
- Ctrl+Z: Undo
- Ctrl+Y: Redo
Previewing results
You can preview the results using a sample of the data in Results.
You must be assigned the following roles to be able to preview data:
- Can view data in the space where the connection resides.
- Can view in the space where the project resides.
You can set how many data rows to include in the sample with Number of rows.
- Click Last executed query to see the run time and duration of the last executed query.
Filtering the results
You can filter the results by one or more columns.
- Click the column menu on the first column to filter by, and then Filter.
- Select which operator to use and which value to filter by.
- You can add more column filters by clicking Add filter.
The filter will only affect the existing sample data.
Sorting the results
You can sort the data sample by a specific column.
- Click the column menu on the column to sort by, and then Sort ascending or Sort descending.
Sorting will only affect the existing sample data. If you used a filter to only include orders from 2024 and then invert the sort order, the sample data will still only contain orders from 2024.
Managing displayed columns
You can select which columns to display in the results. The selection is not saved when you save the transformation.
- Hide a column by clicking the column menu on the column and then Hide column.
- Manage display of all columns by clicking the column menu on any column and then Displayed columns.
Parameters
Click Parameters to change the parameters in the SELECT query. The parameters should be mapped to the source or target objects.
Parameters are automatically mapped when:
- You use Generate select statement in editor in Objects.
- You click > to move a dataset or column name to the editor.
- A parameter name matches a table name from the tables in Objects.
You can map to:
- History type 2 structures (_history)
- Current structures
- Target objects
You can only map target objects when a parameter is defined inside an incremental macro.
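For example, a query can reference a parameter that is mapped to the current structure of a source. This is a sketch; ORDERS is a hypothetical source dataset name.

    SELECT "ORDERID", "STATUS"
    FROM ${ORDERS}

The same parameter could instead be mapped to the corresponding history type 2 structure (ORDERS_history), or, when it is defined inside an incremental macro, to a target object.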
Metadata
You can edit the metadata settings in Metadata.
- Set primary keys in the Key column.
- Set if the column is nullable in the Nullable column.
- Select a column and click Edit to set column name, key, nullable, and data type.
Running the query
You can run the query at any time during development by clicking Run query. The first time you run a query, metadata will be retrieved. When the query is run, you can go to Metadata and set a primary key, which is required for the transformation to be complete.
Using macros to adapt the query to load type
If incremental load is used, you can use Macros to adapt the query depending on whether an initial load or an incremental load is performed.
Select the macro and add the query inside the respective clause.
- Q_RUN_INCREMENTAL adds a clause where you can place a query that is only run when an incremental load is performed.
- Q_RUN_INITIAL_LOAD adds a clause where you can place a query that is only run when an initial load is performed.
You do not need to specify the complete query with each macro. You can adjust or add code based on your use case.
Use case: Incremental filter using date range
In this use case, a set of orders data needs to be transformed with custom SQL. Since this is a large dataset, you want to include incremental filtering to ensure that the incremental load only considers orders from the past 12 hours.
This is the base query:
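A minimal sketch of what such a base query could look like, assuming a source table named ORDERS; all table and column names here are illustrative:

    SELECT "ORDERID", "CUSTOMERID", "ORDERDATE", "TOTALAMOUNT"
    FROM "ORDERS"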
This is the filter to apply during the incremental run, where you only consider orders that were added the past 12 hours:
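Assuming Snowflake-style date functions, the filter could look like this:

    WHERE "ORDERDATE" >= DATEADD(HOUR, -12, CURRENT_TIMESTAMP())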
This is the complete query, using the filter macro and with table names replaced with the ${TABLE} notation:
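Continuing the sketch; the exact macro syntax is inserted by the editor when you select the macro, so the form shown here is an assumption:

    SELECT "ORDERID", "CUSTOMERID", "ORDERDATE", "TOTALAMOUNT"
    FROM ${ORDERS}
    ${Q_RUN_INCREMENTAL(WHERE "ORDERDATE" >= DATEADD(HOUR, -12, CURRENT_TIMESTAMP()))}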
This results in an initial load query:
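In this sketch, the macro clause is dropped on the initial load:

    SELECT "ORDERID", "CUSTOMERID", "ORDERDATE", "TOTALAMOUNT"
    FROM "ORDERS"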
And an incremental load query:
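On an incremental load, the clause inside Q_RUN_INCREMENTAL is included:

    SELECT "ORDERID", "CUSTOMERID", "ORDERDATE", "TOTALAMOUNT"
    FROM "ORDERS"
    WHERE "ORDERDATE" >= DATEADD(HOUR, -12, CURRENT_TIMESTAMP())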
Use case: Incremental filter using data in the target
In this use case, a set of orders data needs to be transformed with custom SQL. Since orders are a large dataset, you want to include incremental filtering to ensure that the incremental load only considers order details records that are newer than the data in the target table. You also want to mark if a row was processed by the initial load or an incremental load.
This is the base query:
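A minimal sketch, assuming a source table named ORDERDETAILS with a LAST_UPDATED column; all names are illustrative:

    SELECT "ORDERID", "PRODUCTID", "QUANTITY", "LAST_UPDATED"
    FROM "ORDERDETAILS"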
This is the filter to apply during the incremental run. “ORDERS_DERIVED” is the name of the target dataset.
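A sketch of such a filter, comparing against the newest row already loaded into the target:

    WHERE "LAST_UPDATED" > (SELECT MAX("LAST_UPDATED") FROM "ORDERS_DERIVED")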
This is the complete query, using the filter macro, and with table names replaced with the ${TABLE} notation. LOADED_BY is set to INIT if the load is initial, and INCR if the load is incremental:
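A sketch of how this could be put together; again, the exact macro syntax is inserted by the editor, so the form shown here is an assumption. Note that the target is referenced inside the incremental macro, which is what allows it to be mapped as a target object:

    SELECT "ORDERID", "PRODUCTID", "QUANTITY", "LAST_UPDATED",
      ${Q_RUN_INITIAL_LOAD('INIT')}${Q_RUN_INCREMENTAL('INCR')} AS "LOADED_BY"
    FROM ${ORDERDETAILS}
    ${Q_RUN_INCREMENTAL(WHERE "LAST_UPDATED" > (SELECT MAX("LAST_UPDATED") FROM ${ORDERS_DERIVED}))}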
This results in an initial load query:
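On the initial load, only the Q_RUN_INITIAL_LOAD content is kept in this sketch:

    SELECT "ORDERID", "PRODUCTID", "QUANTITY", "LAST_UPDATED",
      'INIT' AS "LOADED_BY"
    FROM "ORDERDETAILS"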
And an incremental load query:
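On an incremental load, the Q_RUN_INCREMENTAL content is included instead:

    SELECT "ORDERID", "PRODUCTID", "QUANTITY", "LAST_UPDATED",
      'INCR' AS "LOADED_BY"
    FROM "ORDERDETAILS"
    WHERE "LAST_UPDATED" > (SELECT MAX("LAST_UPDATED") FROM "ORDERS_DERIVED")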
Settings
Click the settings button to edit the settings of the SQL transformation.
Enter a name for the target dataset in Name.
You can also add a longer description in Description.
In Materialization, select whether the transformed output should be materialized. You can choose to inherit the setting from the data task settings.
- On will create tables and will handle associated ELT processing.
- Off will create views that perform transformations on the fly.
In Historical Data Store (Type 2), select if you want to keep historical data. You can choose to inherit the setting from the data task settings. This setting requires that Materialization is on.
Incremental load allows you to adjust the query for an incremental data load by using macros to apply filters or other conditions that reduce the set of data being processed. Incremental load is only available if the data is materialized as tables.
- When Incremental load is On
  The first execution of the task will perform an initial load, inserting all results of the query into your target table. Subsequent executions will execute incremental loads, leveraging the filters or specific conditions you have defined for incremental processing. During incremental load, the task will only process data as an update or an insert; deletes are not managed.
- When Incremental load is Off
  The first execution of the task will perform an initial load, inserting all results of the query into your target table. Subsequent executions will process all results of the query by comparing to your target table, and processing records that are new, changed, or deleted.
Best practice
- When you add a SQL transformation, a primary key is not defined automatically. Add a key to the dataset in the Dataset tab.
- Do not create column headers in the transformed output manually.
- Avoid using SELECT with *, as this may return different columns each time the query is run. If the schema changes, update the query accordingly and perform Describe table again. Then you can edit columns and adjust tables.
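For example, prefer an explicit column list; in this sketch, ${ORDERS} is a hypothetical source parameter:

    -- Avoid: the result shape changes if the source schema changes
    -- SELECT * FROM ${ORDERS}

    -- Prefer: an explicit, stable column list
    SELECT "ORDERID", "CUSTOMERID", "ORDERDATE"
    FROM ${ORDERS}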