
Filtering and aggregating table columns directly on the DBMS

The following scenario creates a Job that opens a connection to a MySQL database and:

  • instantiates the schema of a database table, keeping only the columns specified in the column filter,

  • filters a column of the resulting table so that only the rows matching a WHERE clause are kept,

  • groups the filtered data by specific column value(s) and writes the aggregated data to a target database table.

For more technologies supported by Talend, see Talend components.

To filter and aggregate database table columns:

  • Drop the following components from the Palette onto the design workspace: tMysqlConnection, tSQLTemplateFilterColumns, tSQLTemplateFilterRows, tSQLTemplateAggregate, tSQLTemplateCommit, and tSQLTemplateRollback.

  • Connect the first five components using OnComponentOk links.

  • Connect tSQLTemplateAggregate to tSQLTemplateRollback using an OnComponentError link.

  • In the design workspace, select tMysqlConnection and click the Component tab to define the basic settings for tMysqlConnection.

  • In the Basic settings view, set the database connection details manually or select Repository from the Property Type list and select your DB connection if it has already been defined and stored in the Metadata area of the Repository tree view.

For more information about Metadata, see Talend Studio User Guide.

  • In the design workspace, select tSQLTemplateFilterColumns and click the Component tab to define its basic settings.

  • On the Database type list, select the relevant database.

  • On the Component list, select the relevant database connection component if more than one connection is used.

  • Enter the names for the database, source table, and target table in the corresponding fields and click the [...] buttons next to Edit schema to define the data structure in the source and target tables.

Note:

When you define the data structure for the source table, column names automatically appear in the Column list in the Column filters panel.

In this scenario, the source table has five columns: id, First_Name, Last_Name, Address, and id_State.

  • In the Column filters panel, set the column filter by selecting the check boxes of the columns you want to write to the target table.

In this scenario, the tSQLTemplateFilterColumns component instantiates only three columns: id, First_Name, and id_State from the source table.
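Because the components in this scenario push the work down to the DBMS, the column filter roughly amounts to a CREATE TABLE ... AS SELECT over the ticked columns. The sketch below illustrates that idea under stated assumptions: an in-memory SQLite database stands in for MySQL, the table names customers and customers_filtered and the sample rows are hypothetical, and the SQL that the Talend SQL templates actually generate may differ.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for the MySQL database.
conn = sqlite3.connect(":memory:")

# Hypothetical source table with the five columns used in this scenario.
conn.execute(
    "CREATE TABLE customers (id INTEGER, First_Name TEXT, Last_Name TEXT, "
    "Address TEXT, id_State INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Maria", "Lopez", "12 Elm St", 1),
        (2, "John", "Smith", "4 Oak Ave", 2),
        (3, "Paula", "Brown", "9 Pine Rd", 1),
    ],
)

# Columns ticked in the Column filters panel.
selected_columns = ["id", "First_Name", "id_State"]
column_list = ", ".join(selected_columns)

# Write only the selected columns to the (hypothetical) target table.
conn.execute(f"CREATE TABLE customers_filtered AS SELECT {column_list} FROM customers")

# PRAGMA table_info returns one row per column; index 1 is the column name.
cols = [row[1] for row in conn.execute("PRAGMA table_info(customers_filtered)")]
print(cols)  # ['id', 'First_Name', 'id_State']
```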

Note:

In the Component view, you can click the SQL Template tab and add system SQL templates or create your own and use them within your Job to carry out the coded operation. For more information, see tSQLTemplateFilterColumns Standard properties.

  • In the design workspace, select tSQLTemplateFilterRows and click the Component tab to define its basic settings.

  • On the Database type list, select the relevant database.

  • On the Component list, select the relevant database connection component if more than one connection is used.

  • Enter the names for the database, source table, and target table in the corresponding fields and click the [...] buttons next to Edit schema to define the data structure in the source and target tables.

In this scenario, the source table has the three previously instantiated columns: id, First_Name, and id_State, and the target table has the same three-column schema.

  • In the Where condition field, enter a WHERE clause to extract only those records that fulfill the specified criterion.

In this scenario, the tSQLTemplateFilterRows component filters the First_Name column of the source table to extract only the first names that contain the letter "a".
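The page does not show the exact WHERE condition, so the sketch below assumes First_Name LIKE '%a%' as one plausible way to express "contains the letter a". It again uses an in-memory SQLite database as a stand-in for MySQL, with hypothetical table names and rows. Note that LIKE ignores case for ASCII letters in SQLite and under MySQL's case-insensitive collations, so a name such as "Anne" would also match.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers_filtered (id INTEGER, First_Name TEXT, id_State INTEGER)"
)
conn.executemany(
    "INSERT INTO customers_filtered VALUES (?, ?, ?)",
    [(1, "Maria", 1), (2, "John", 2), (3, "Paula", 1), (4, "Eric", 2), (5, "Diana", 2)],
)

# Hypothetical content of the Where condition field.
where_condition = "First_Name LIKE '%a%'"

# Keep the same three-column schema and only the matching rows.
conn.execute(
    "CREATE TABLE customers_a AS SELECT id, First_Name, id_State "
    f"FROM customers_filtered WHERE {where_condition}"
)

names = [r[0] for r in conn.execute("SELECT First_Name FROM customers_a ORDER BY id")]
print(names)  # ['Maria', 'Paula', 'Diana']
```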

  • In the design workspace, select tSQLTemplateAggregate and click the Component tab to define its basic settings.

  • On the Database type list, select the relevant database.

  • On the Component list, select the relevant database connection component if more than one connection is used.

  • Enter the names for the database, source table, and target table in the corresponding fields and click the [...] buttons next to Edit schema to define the data structure in the source and target tables.

The schema for the source table consists of three columns: id, First_Name, and id_State. The schema for the target table consists of two columns: customers_status and customers_number. In this scenario, we want to group customers by their marital status and count the number of customers in each group. To do that, we define the Operations and Group by panels accordingly.

  • In the Operations panel, click the plus button to add one or more lines and then click in the Output column line to select the output column that will hold the counted data.

  • Click in the Function line and select the operation to be carried out.

  • In the Group by panel, click the plus button to add one or more lines and then click in the Output column line to select the output column that will hold the aggregated data.
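As a rough illustration, an Operations line that counts id into customers_number, combined with a Group by line that maps id_State to customers_status, corresponds to a GROUP BY query such as the one below. This is a sketch only: it assumes that id_State holds the marital-status value and that the selected function is a count on id. It uses an in-memory SQLite database in place of MySQL, and the source table name customers_a and its rows are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers_a (id INTEGER, First_Name TEXT, id_State INTEGER)")
conn.executemany(
    "INSERT INTO customers_a VALUES (?, ?, ?)",
    [(1, "Maria", 1), (3, "Paula", 1), (5, "Diana", 2)],
)

# Operations panel: customers_number <- count(id)
# Group by panel:   customers_status <- id_State
conn.execute(
    "CREATE TABLE aggregate_customers AS "
    "SELECT id_State AS customers_status, COUNT(id) AS customers_number "
    "FROM customers_a GROUP BY id_State"
)

rows = conn.execute(
    "SELECT customers_status, customers_number FROM aggregate_customers "
    "ORDER BY customers_status"
).fetchall()
print(rows)  # [(1, 2), (2, 1)]
```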

  • In the design workspace, select tSQLTemplateCommit and click the Component tab to define its basic settings.

  • On the Database type list, select the relevant database.

  • On the Component list, select the relevant database connection component if more than one connection is used.

  • Do the same for tSQLTemplateRollback.

  • Save your Job and press F6 to execute it.

A two-column table, aggregate_customers, is created in the database. It groups customers according to their marital status and counts the number of customers in each group.
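The OnComponentOk chain ending in tSQLTemplateCommit, together with the OnComponentError link to tSQLTemplateRollback, follows the usual all-or-nothing transaction pattern. The sketch below shows that pattern end to end, assuming an in-memory SQLite database in place of MySQL, hypothetical intermediate table names, and a hypothetical WHERE condition. It does not reproduce what the Talend components execute internally. SQLite can roll back CREATE TABLE statements inside a transaction, which keeps the demo self-contained. In MySQL, by contrast, DDL statements such as CREATE TABLE cause an implicit commit.

```python
import sqlite3


def run_job(conn, steps):
    """Run each SQL step in one transaction; commit if all succeed, roll back on the first error."""
    conn.execute("BEGIN")
    try:
        for sql in steps:
            conn.execute(sql)
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # plays the role of OnComponentError -> tSQLTemplateRollback
        return False
    conn.execute("COMMIT")  # plays the role of OnComponentOk -> tSQLTemplateCommit
    return True


def table_exists(conn, name):
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


# isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE customers (id INTEGER, First_Name TEXT, Last_Name TEXT, "
    "Address TEXT, id_State INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Maria", "Lopez", "12 Elm St", 1),
        (2, "John", "Smith", "4 Oak Ave", 2),
        (3, "Paula", "Brown", "9 Pine Rd", 1),
        (4, "Eric", "Jones", "7 Birch Ln", 2),
        (5, "Diana", "Clark", "3 Cedar Ct", 2),
    ],
)

steps = [
    # tSQLTemplateFilterColumns
    "CREATE TABLE customers_filtered AS "
    "SELECT id, First_Name, id_State FROM customers",
    # tSQLTemplateFilterRows (hypothetical WHERE condition)
    "CREATE TABLE customers_a AS SELECT id, First_Name, id_State "
    "FROM customers_filtered WHERE First_Name LIKE '%a%'",
    # tSQLTemplateAggregate
    "CREATE TABLE aggregate_customers AS "
    "SELECT id_State AS customers_status, COUNT(id) AS customers_number "
    "FROM customers_a GROUP BY id_State",
]
ok = run_job(conn, steps)

# A job whose second step fails: the table created by its first step is rolled back.
bad_ok = run_job(conn, [
    "CREATE TABLE tmp_cols AS SELECT id FROM customers",
    "SELECT * FROM no_such_table",
])
print(ok, bad_ok, table_exists(conn, "tmp_cols"))  # True False False
```

Wrapping all steps in a single transaction means a failure in any step leaves none of that run's tables behind, which is the behavior the Rollback branch is meant to provide.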
