Adding SQL transformations
You can include SQL-based tables in transformation tasks. A SQL transformation allows you to input a SQL SELECT query into a pipeline to define simple or complex transformations. You can let the SQL assistant generate a query from a text prompt using generative AI. You can also use macros to adapt the query depending on whether an initial load or an incremental load is performed.
For more information about creating a transformation task, see Transforming data.
Adding a SQL transformation
To add a SQL transformation in a Transform data task:
- In Transform, select the datasets to include in the query, and click Add SQL transformation.
  Set the name of the transformation in Name. For more information about the other settings, see Settings.
  Click Add when you are ready to create the SQL transformation. Edit SQL transformation is displayed.
- Write the query to return the SQL-based table in the query pane.
  For more information, see Editing the SQL query.
- Click Run query.
  This will fetch metadata and a preview of the data.
- Review the parameters in Parameters.
  For more information, see Parameters.
- Set a primary key in Metadata.
  For more information, see Metadata.
- When you are satisfied with the results of the query, click Save and close.
  The dataset is now added to the list of targets, and you can preview it.
Editing the SQL query
Type the query to return the SQL-based table in the query pane. The query can only read from the source datasets that you selected, but you can add more source datasets with Add.
- You can create a statement to load a dataset from Objects. Click the menu button on the dataset, and then Generate select statement in editor.
- You can copy the name of a dataset to the clipboard by clicking the menu button and then Copy.
- Click > to move a dataset or column name to the editor.
- Click the search icon to search and replace text in the editor.
If incremental load is used, you can use Macros to adapt the query depending on whether an initial load or an incremental load is performed. For more information, see Using macros to adapt the query to load type.
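For example, a query can join two of the selected source datasets. This is a minimal sketch: the ORDERS and CUSTOMERS datasets and their column names are hypothetical, and ${ORDERS} and ${CUSTOMERS} are parameters as described in Parameters.

    SELECT o."ORDERID", o."ORDERDATE", c."CUSTOMERNAME"
    FROM ${ORDERS} o
    JOIN ${CUSTOMERS} c ON c."CUSTOMERID" = o."CUSTOMERID"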
Keyboard shortcuts
- Ctrl+F: Search and replace
- Ctrl+/: Comment/uncomment the current line
- Ctrl+Enter: Run the query
- Ctrl+Z: Undo
- Ctrl+Y: Redo
Previewing results
You can preview the results using a sample of the data in Results.
You must be assigned the following roles to be able to preview data:
- Can view data in the space where the connection resides.
- Can view in the space where the project resides.
You can set how many data rows to include in the sample with Number of rows.
- Click Last executed query to see the run time and duration of the last executed query.
Filtering the results
You can filter the results by one or more columns.
- Click the column menu on the first column to filter by, and then Filter.
- Select which operator to use and which value to filter by.
- You can add more column filters by clicking Add filter.
The filter will only affect the existing sample data.
Sorting the results
You can sort the data sample by a specific column.
- Click the column menu on the column to sort by, and then Sort ascending or Sort descending.
Sorting will only affect the existing sample data. If you used a filter to only include orders from 2024 and then invert the sort order, the sample data will still only contain orders from 2024.
Managing displayed columns
You can select which columns to display in the results. The selection is not saved when you save the transformation.
- Hide a column by clicking the column menu on the column and then Hide column.
- Manage display of all columns by clicking the column menu on any column and then Displayed columns.
Parameters
Click Parameters to change the parameters in the SELECT query. The parameters should be mapped to the source or target objects.
Parameters are automatically mapped when:
- You use Generate select statement in editor in Objects.
- You click > to move a dataset or column name to the editor.
- A parameter name matches a table name from the tables in Objects.
You can map to:
- History type 2 structures (_history)
- Current structures
- Target objects
You can only map target objects when a parameter is defined inside an incremental macro.
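For example, a query can reference a parameter that is mapped to the current structure of a source. This is a sketch; ORDERS is a hypothetical source dataset name.

    SELECT "ORDERID", "STATUS"
    FROM ${ORDERS}

The same parameter could instead be mapped to the corresponding history type 2 structure (ORDERS_history), or, when it is defined inside an incremental macro, to a target object.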
Metadata
You can edit the metadata settings in Metadata.
- Set primary keys in the Key column.
- Set if the column is nullable in the Nullable column.
- Select a column and click Edit to set column name, key, nullable, and data type.
Running the query
You can run the query at any time during development by clicking Run query. The first time you run a query, metadata will be retrieved. When the query is run, you can go to Metadata and set a primary key, which is required for the transformation to be complete.
Using macros to adapt the query to load type
If incremental load is used, you can use Macros to adapt the query depending on whether an initial load or an incremental load is performed.
Select the macro and add the query inside the respective clause.
- Q_RUN_INCREMENTAL adds a clause where you can place a query that is only run when an incremental load is performed.
- Q_RUN_INITIAL_LOAD adds a clause where you can place a query that is only run when an initial load is performed.
You do not need to specify the complete query with each macro. You can adjust or add code based on your use case.
Use case: Incremental filter using date range
In this use case, a set of orders data needs to be transformed with custom SQL. Since this is a large dataset, you want to include incremental filtering to ensure that the incremental load only considers orders from the past 12 hours.
This is the base query:
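A minimal sketch of what such a base query could look like, assuming a source table named ORDERS; all table and column names here are illustrative:

    SELECT "ORDERID", "CUSTOMERID", "ORDERDATE", "TOTALAMOUNT"
    FROM "ORDERS"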
This is the filter to apply during the incremental run, where you only consider orders that were added the past 12 hours:
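Assuming Snowflake-style date functions, the filter could look like this:

    WHERE "ORDERDATE" >= DATEADD(HOUR, -12, CURRENT_TIMESTAMP())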
This is the complete query, using the filter macro and with table names replaced with the ${TABLE} notation:
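Continuing the sketch; the exact macro syntax is inserted by the editor when you select the macro, so the form shown here is an assumption:

    SELECT "ORDERID", "CUSTOMERID", "ORDERDATE", "TOTALAMOUNT"
    FROM ${ORDERS}
    ${Q_RUN_INCREMENTAL(WHERE "ORDERDATE" >= DATEADD(HOUR, -12, CURRENT_TIMESTAMP()))}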
This results in an initial load query:
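In this sketch, the macro clause is dropped on the initial load:

    SELECT "ORDERID", "CUSTOMERID", "ORDERDATE", "TOTALAMOUNT"
    FROM "ORDERS"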
And an incremental load query:
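On an incremental load, the clause inside Q_RUN_INCREMENTAL is included:

    SELECT "ORDERID", "CUSTOMERID", "ORDERDATE", "TOTALAMOUNT"
    FROM "ORDERS"
    WHERE "ORDERDATE" >= DATEADD(HOUR, -12, CURRENT_TIMESTAMP())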
Use case: Incremental filter using data in the target
In this use case, a set of orders data needs to be transformed with custom SQL. Since orders are a large dataset, you want to include incremental filtering to ensure that the incremental load only considers order details records that are newer than the data in the target table. You also want to mark if a row was processed by the initial load or an incremental load.
This is the base query:
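A minimal sketch, assuming a source table named ORDERDETAILS with a LAST_UPDATED column; all names are illustrative:

    SELECT "ORDERID", "PRODUCTID", "QUANTITY", "LAST_UPDATED"
    FROM "ORDERDETAILS"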
This is the filter to apply during the incremental run. “ORDERS_DERIVED” is the name of the target dataset.
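A sketch of such a filter, comparing against the newest row already loaded into the target:

    WHERE "LAST_UPDATED" > (SELECT MAX("LAST_UPDATED") FROM "ORDERS_DERIVED")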
This is the complete query, using the filter macro, and with table names replaced with the ${TABLE} notation. LOADED_BY is set to INIT if the load is initial, and INCR if the load is incremental:
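A sketch of how this could be put together; again, the exact macro syntax is inserted by the editor, so the form shown here is an assumption. Note that the target is referenced inside the incremental macro, which is what allows it to be mapped as a target object:

    SELECT "ORDERID", "PRODUCTID", "QUANTITY", "LAST_UPDATED",
      ${Q_RUN_INITIAL_LOAD('INIT')}${Q_RUN_INCREMENTAL('INCR')} AS "LOADED_BY"
    FROM ${ORDERDETAILS}
    ${Q_RUN_INCREMENTAL(WHERE "LAST_UPDATED" > (SELECT MAX("LAST_UPDATED") FROM ${ORDERS_DERIVED}))}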
This results in an initial load query:
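On the initial load, only the Q_RUN_INITIAL_LOAD content is kept in this sketch:

    SELECT "ORDERID", "PRODUCTID", "QUANTITY", "LAST_UPDATED",
      'INIT' AS "LOADED_BY"
    FROM "ORDERDETAILS"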
And an incremental load query:
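On an incremental load, the Q_RUN_INCREMENTAL content is included instead:

    SELECT "ORDERID", "PRODUCTID", "QUANTITY", "LAST_UPDATED",
      'INCR' AS "LOADED_BY"
    FROM "ORDERDETAILS"
    WHERE "LAST_UPDATED" > (SELECT MAX("LAST_UPDATED") FROM "ORDERS_DERIVED")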
Settings
Click the settings button to edit the settings of the SQL transformation.
Enter a name for the target dataset in Name.
You can also add a longer description in Description.
In Materialization, select whether the transformed output should be materialized. You can choose to inherit the setting from the data task settings.
- On will create tables and will handle associated ELT processing.
- Off will create views that perform transformations on the fly.
In Historical Data Store (Type 2), select if you want to keep historical data. You can choose to inherit the setting from the data task settings. This setting requires that Materialization is on.
Incremental load allows you to adjust the query for an incremental data load by using macros to apply filters or other conditions that reduce the set of data being processed. Incremental load is only available if the data is materialized as tables.
- When Incremental load is On
  The first execution of the task will perform an initial load, inserting all results of the query into your target table. Subsequent executions will execute incremental loads, leveraging the filters or specific conditions you have defined for incremental processing. During incremental load, the task will only process data as an update or an insert; deletes are not managed.
- When Incremental load is Off
  The first execution of the task will perform an initial load, inserting all results of the query into your target table. Subsequent executions will process all results of the query by comparing to your target table, and processing records that are new, changed, or deleted.
Best practice
- When you add a SQL transformation, a primary key is not defined automatically. Add a key to the dataset in the Dataset tab.
- Do not create column headers in the transformed output manually.
- Avoid using SELECT with *, as this may return different columns each time the query is run. If the schema changes, update the query accordingly and perform Describe table again. Then you can edit columns and adjust tables.
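For example, prefer an explicit column list; in this sketch, ${ORDERS} is a hypothetical source parameter:

    -- Avoid: the result shape changes if the source schema changes
    -- SELECT * FROM ${ORDERS}

    -- Prefer: an explicit, stable column list
    SELECT "ORDERID", "CUSTOMERID", "ORDERDATE"
    FROM ${ORDERS}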