Creating a Job script to filter data records
The Job will contain the following components:
-
a tFileInputDelimited component, to read the source CSV file that contains information about people. The source file contains five semicolon-separated columns, as shown below:
name;gender;age;city;marriageStatus
Van Buren;M;73;Chicago;married
Adams;M;40;Albany;single
Jefferson;F;66;New York;married
Adams;M;9;Albany;-
Jefferson;M;30;Chicago;single
Carter;F;26;Chicago;married
Harrison;M;40;New York;married
Roosevelt;F;15;Chicago;
Monroe;M;8;Boston;-
Arthur;M;20;Albany;married
Pierce;M;18;New York;-
Quincy;F;83;Albany;married
McKinley;M;70;Boston;married
Coolidge;M;4;Chicago;-
Monroe;M;60;Chicago;single
----- end of file --------
-
a tReplicate component, to duplicate the input data into two output flows: one is displayed on the console as unprocessed data, and the other is sent to a column filter for processing.
-
a tFilterColumns component, to remove an unwanted column, marriageStatus.
-
a tFilterRow component, to filter the data and output two tables:
-
one lists all male persons whose last name is shorter than nine characters and who are aged between 10 and 80 years.
-
the other lists all rejected records, each with an error message explaining why the record was rejected.
-
three tLogRow components: the first one to display the unprocessed data, the second one to display the accepted records, and the third one to display the rejected records and the corresponding error messages.
-
a tJava component, to display the summary information.
The procedures below demonstrate how to write this Job script in the Job script editor, starting with adding the required components. For how to create an empty Job script, see How to create a Job script.