
Designing a Job to implement the scenario

The scenario comprises the following four steps:
  • Creation of the Job, configuration of the input file parameters, and reading of the input file.
  • Mapping of data and transformations.
  • Definition of the reference file parameters, relevant mapping using the tMap component, and selection of inner join mode.
  • Redirection of the output into a MySQL table.

Step 1: Job creation, input definition, file reading

Procedure

  1. Launch Talend Studio, and create a local project or import the demo project if you are launching Talend Studio for the first time.
  2. To create the Job, right-click Job Designs in the Repository tree view and select Create Job.
  3. In the dialog box that opens, only the first field (Name) is required. Type in California1 and click Finish.

    An empty Job then opens in the main window, and the Palette of technical components (by default, to the right of Talend Studio) comes up, showing a dozen component families such as Databases, Files, Internet, and Data Quality, which group hundreds of ready-to-use components.

  4. A tFileInputDelimited component is used to read the California_Clients file. This component can be found in the File > Input group of the Palette. Click this component, then click on the left side of the design workspace to place it on the design area.
  5. Define the reading properties for the tFileInputDelimited component (such as file path, column delimiter, encoding, and so on) using the Metadata Manager.
    This tool offers numerous wizards for configuring parameters. It also stores these properties for a one-click re-use in all future Jobs.
  6. As the input file is a delimited flat file, right-click the Metadata folder in the Repository tree view, select File delimited, and then select Create file delimited.

    A wizard dedicated to delimited files opens.

    • At step 1, only the Name field is required: simply type in California_clients and go to the next step.

    • At step 2, select the input file (California_Clients.csv) by clicking the Browse... button. An extract of the file immediately shows in the Preview area at the bottom of the screen, so that you can check its content. Click Next.

      In this example, the California_Clients.csv file is stored under C:/talend/Input.

    • At step 3, define the file parameters: file encoding, line and column delimiters, and so on. As the input file is standard, most default values are fine. The first line of the file is a header containing the column names. To retrieve these names automatically, click Set heading row as column names, click Refresh Preview, and then click Next to go to the last step.

    • At step 4, set each column of the file. The wizard includes algorithms that guess each column's type and length based on the first rows of data in the file. The suggested data description (called schema in Talend Studio) can be modified at any time; in this scenario, it can be used as is.

    The California_clients metadata is created after the above four steps.

  7. Select the tFileInputDelimited component you dropped on the design workspace earlier, then select the Component view at the bottom of the window.
  8. Select the Basic settings vertical tab. This tab holds all the technical properties required for the component to work.
  9. Select Repository as the Property Type in the list. A new Repository field shows; click the [...] button and select the relevant metadata entry from the list: California_clients.

    All parameters of the tFileInputDelimited component get automatically filled out.

    Screenshot of the component basic settings.
  10. Add a tLogRow component (from the Logs & Errors group). To link both components, right-click the input component and select Row > Main. Then click the output component: tLogRow.
  11. Select the Run tab on the bottom panel.
  12. Enable the statistics by selecting the Statistics check box in the Advanced Settings vertical tab of the Run view, then run the Job by clicking Run in the Basic Run tab.
    Screenshot of the console after successfully running the Job.

    The content of the input file is listed in the console.
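
For readers who want to see what this first Job does in plain code, here is a minimal, hand-written Java sketch of the same read-and-print flow, for illustration only. The file path is the one used in this scenario; the semicolon delimiter and the field layout are assumptions, so adjust them to match your own metadata settings.

    import java.io.BufferedReader;
    import java.io.FileReader;

    public class ReadCaliforniaClients {
        public static void main(String[] args) throws Exception {
            String file = "C:/talend/Input/California_Clients.csv"; // path used in this scenario
            String delimiter = ";";                                  // assumed column delimiter
            try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
                String header = reader.readLine();                   // first row holds the column names
                System.out.println(header);
                String line;
                while ((line = reader.readLine()) != null) {
                    // Print each record, roughly what tLogRow does in the console.
                    System.out.println(String.join(" | ", line.split(delimiter)));
                }
            }
        }
    }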

Step 2: Mapping and transformations

This step transforms the data from the input file using a tMap component.

The tMap component can handle:

  • multiple inputs and outputs
  • reference lookups (simple, Cartesian product, first or last match, and so on)
  • join (inner, outer)
  • transformations
  • rejections
  • and more...

Procedure

  1. Remove the link between the two components by right-clicking the link and selecting the Delete option. Then place a tMap component, from the Processing group of the Palette, between the two components.
  2. Connect the tFileInputDelimited component to the tMap component using a Row > Main link.
  3. Right-click the tMap component, select Row > *New Output* (Main), and click the tLogRow component. Type in out1 in the dialog box to name this link.
    A message box shows up, asking about the back-propagation of schemas; ignore it by clicking No.
  4. Double-click the tMap to access its interface.
    The left part displays the schema (description) of the input file (row1). The right part displays the output schema (out1).
  5. Place the Firstname and Lastname columns to the right, onto the Name column, as shown in the screenshot below. Then place the remaining Address and City columns onto their respective lines.
    Results of the mapping of columns.
  6. Carry out the following transformations on each column (a plain Java sketch of these two expressions follows this procedure):
    • Change the Expression of the Name column to row1.Firstname + " " + row1.Lastname. This concatenates the Firstname column with the Lastname column.
    • Change the Expression of the Address column to row1.Address.toUpperCase(), which changes the address to upper case.
  7. Remove the Lastname column from the out1 table and increase the length of the remaining columns. To do so, go to the Schema Editor located at the bottom of the Map editor and proceed as follows:
    Screenshot of the Schema Editor.
    1. Select the column to be removed from the schema and click the cross icon.

    2. Select the column whose length you need to increase.

    3. Type in the intended size in the Length column. In this example, change the length of every remaining column to 40.

    Note: As the first name and the last name of a client are concatenated, it is necessary to increase the length of the name column in order to match the full name size. No transformation is made to the City column.
  8. Click OK to validate the changes and close the Map editor interface.
  9. Run your Job.
    Screenshot of the console after successfully running the Job.
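
For reference, the two tMap expressions above are ordinary Java expressions. The following standalone sketch shows the same logic, with sample values standing in for one record of the main flow (the name and address values are purely illustrative).

    public class TMapExpressions {
        public static void main(String[] args) {
            // Sample values standing in for one row1 record; purely illustrative.
            String firstname = "John";
            String lastname  = "Doe";
            String address   = "1234 main street";

            // Same logic as the two expressions defined in the tMap above.
            String name         = firstname + " " + lastname;   // row1.Firstname + " " + row1.Lastname
            String upperAddress = address.toUpperCase();         // row1.Address.toUpperCase()

            System.out.println(name + " / " + upperAddress);     // John Doe / 1234 MAIN STREET
        }
    }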

Results

The addresses are displayed in upper case and the first names and last names are gathered in the same column.

Step 3: Reference file definition, remapping, inner join mode selection

Procedure

  1. Define the metadata corresponding to the LosAngelesandOrangeCounties.txt file using the wizard, in the same way as described in step 6 of Step 1: Job creation, input definition, file reading.

    At step 1 of the wizard, name this metadata entry LA_Orange_cities.

  2. Drop the newly created metadata onto the top of the design area to automatically create a reading component pointing to this metadata.
  3. Link this component to the tMap component.
    Screenshot of the Job in the Designer.
  4. Double-click the tMap component again to open its interface. Note that the reference input table (row2), corresponding to the LA and Orange counties file, shows on the left of the window, right under the main input (row1).
  5. Define the join between the main flow and the reference flow.

    Place the City column from your first input table onto the City column of your reference table to establish a join between the two files. A violet link displays to materialize this join.

    Screenshot of the join you created in the tMap interface.
  6. Place the County column from the reference table (row2) onto the Expression field of the County column in the output table (out1), on the right.
    Screenshot of the mappings.
  7. Click the OK button to validate your changes and run the new Job.

    The following output should display on the console.

    Screenshot of the console after successfully running the Job.

    The last column (County) of some of the records is filled with Los Angeles or Orange, while it is empty for others. The reason is that, by default, the tMap component implements a left outer join mode: every record of the main flow is kept, whether or not a match is found in the reference flow. If you want to filter your data so that only the lines for which the tMap finds a match are displayed, open the tMap again, click the tMap settings button, and select Inner Join in the Join Model list of the reference table (row2). The sketch below illustrates the difference between the two join modes.
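
The following standalone Java sketch illustrates the two join modes on a tiny, made-up data set (the city and county values are illustrative only): a left outer join keeps every main record and leaves County empty when no match is found, whereas an inner join keeps only the matching records.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class JoinModes {
        public static void main(String[] args) {
            // Reference flow (row2): city -> county, as in LosAngelesandOrangeCounties.txt.
            Map<String, String> cityToCounty = new HashMap<>();
            cityToCounty.put("Long Beach", "Los Angeles");
            cityToCounty.put("Anaheim", "Orange");

            // Main flow (row1): client cities, including one with no reference entry.
            List<String> cities = Arrays.asList("Long Beach", "Anaheim", "Sacramento");

            for (String city : cities) {
                String county = cityToCounty.get(city);
                // Left outer join (default): every main record is kept, County may stay empty.
                System.out.println("left outer: " + city + " -> " + (county == null ? "" : county));
                // Inner join: only records with a matching reference entry are kept.
                if (county != null) {
                    System.out.println("inner:      " + city + " -> " + county);
                }
            }
        }
    }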

Step 4: Output to a MySQL table

This part directs the output flow to a MySQL table.

Procedure

  1. Create the metadata describing the connection to the MySQL database. Expand the Metadata > MySQL nodes in the Repository view and double-click DemoMySQL (provided that you imported the demo project) to open the metadata wizard.
  2. At step 2 of the wizard, type in the relevant connection parameters. Check the validity of the connection by clicking the Check button. Finally, validate your changes by clicking Finish.
  3. Drop this metadata onto the right part of the design workspace while holding down the Ctrl key, to automatically create a tMysqlOutput component.
  4. Remove the tLogRow component from your Job.
  5. Reconnect the out1 output flow from the tMap to the new component tMysqlOutput.
    Screenshot of the Job in the Designer.
  6. On the Basic settings tab of the tMysqlOutput component:
    1. Type in LA_Orange_Clients in the Table field to name your target table, which will be created on the fly.
    2. Select the Drop table if exists and create option in the Action on table list (the sketch after the Results below shows the equivalent SQL).
    3. Click Edit Schema and click the Reset DB type button (the DB button on the toolbar) to automatically fill out the DB type if need be.
  7. Run the Job again.

Results

The target table should be automatically created and filled with data.
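
To make the "created on the fly" behavior concrete, here is a minimal JDBC sketch of what the Drop table if exists and create action amounts to. The connection URL, credentials, and the sample row are assumptions for illustration only; the real values come from your DemoMySQL metadata and from the out1 flow.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class LoadLaOrangeClients {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection settings -- use the values from your DemoMySQL metadata.
            String url = "jdbc:mysql://localhost:3306/demo";
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement()) {
                // Equivalent of the "Drop table if exists and create" action on the target table.
                stmt.executeUpdate("DROP TABLE IF EXISTS LA_Orange_Clients");
                stmt.executeUpdate("CREATE TABLE LA_Orange_Clients ("
                        + "Name VARCHAR(40), Address VARCHAR(40), City VARCHAR(40), County VARCHAR(40))");
                // Each record of the out1 flow then becomes one INSERT; this row is illustrative.
                stmt.executeUpdate("INSERT INTO LA_Orange_Clients VALUES "
                        + "('John Doe', '1234 MAIN STREET', 'Long Beach', 'Los Angeles')");
            }
        }
    }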
