Creating a synonym index for people names using tMap
This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real-Time Big Data Platform, Talend Data Services Platform, and Talend Data Fabric.
In this scenario, a four-component Job creates an index storing people names and their relative nicknames.
The source data to be used in this scenario is stored in a .csv file, an extract of which is shown below:
Country;FirstName;Nickname1;Nickname2;Nickname3;Nickname4
France;Anne;Ninon;Annie;Ninette;Ann
France;Bernadette;Nad;Netty;Dadette
France;Albert;Al
France;Alexandre;Alex
France;Alfred-Hubert;Alu
France;Andrew;Andy
France;Anthony;Anton;Tony;Tonio
France;Artus;Artie
France;Benoit;Ben
France;Catherine;Cate;Katherine;Kathryn
France;Charles;Charlie;Charlot;Chuck
France;Christophe;Christian;Chris;Kris;Kristof
France;Christian;ChrisThis data describes people's home country (not to be inserted into the index), first names (reference entries) and frequently used nicknames (synonyms).
The four components used in this Job are:
-
tFileInputDelimited: this component reads the source data and inputs them to tSynonymOutput.
-
tMap: this component is used to transform the source data into two separated columns representing the first names and the nicknames, in the meantime, ignoring the people's home country information.
-
tSynonymOutput: this component creates the index of interest in this scenario and feeds it with the synonyms given in the source file.
-
tLogRow: this component lists the data that have been inserted into the newly created index.