Merging two datasets in HDFS (deprecated)
This scenario applies only to Talend products with Big Data.
This scenario illustrates how to use tSqoopMerge to merge two datasets that were imported to HDFS one after the other from the same MySQL table, with one record modified between the two imports.
The first dataset, imported before the modification, reads as follows:
id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,3000,2010-05-02 15:34:05
The path to it in HDFS is /user/ychen/target_old.
The second dataset, imported after the modification, reads as follows:
id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,4000,2013-10-14 18:00:00
The path to it in HDFS is /user/ychen/target_new.
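To see exactly which record changed between the two imports, the datasets can be compared row by row on their id column. The following Python sketch is only an illustration: it assumes the data is available locally as CSV text (OLD_DATA, NEW_DATA, load_by_id, and changed_ids are hypothetical names introduced here) rather than read from HDFS.

```python
import csv
import io

# Hypothetical local copies of the two datasets shown above
# (not read from HDFS; illustration only).
OLD_DATA = """id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,3000,2010-05-02 15:34:05
"""

NEW_DATA = """id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,4000,2013-10-14 18:00:00
"""


def load_by_id(text):
    """Parse CSV text into a dict mapping each id value to its row."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["id"]: row for row in reader}


def changed_ids(old_text, new_text):
    """Return the sorted ids present in both datasets whose rows differ."""
    old_rows = load_by_id(old_text)
    new_rows = load_by_id(new_text)
    return sorted(
        key for key in old_rows.keys() & new_rows.keys()
        if old_rows[key] != new_rows[key]
    )


print(changed_ids(OLD_DATA, NEW_DATA))  # prints ['3']
```

Only the record with id 3 differs: its wage changed from 3000 to 4000 and its mod_date was updated.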
Both datasets were imported with tSqoopImport. For a scenario showing how to use tSqoopImport, see Importing a MySQL table to HDFS.
The Job in this scenario merges these two datasets so that the newer version of the modified record overwrites the older one.
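The overwrite behavior can be approximated with a small in-memory Python sketch of a key-based merge: rows are matched on a merge key (assumed here to be id), and a row from the newer dataset replaces the row with the same key from the older one. This is a hypothetical local illustration (merge_datasets and the data strings are names introduced here), not how tSqoopMerge works internally; the component runs the merge on the data in HDFS.

```python
import csv
import io

# Hypothetical local copies of /user/ychen/target_old and /user/ychen/target_new.
OLD_DATA = """id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,3000,2010-05-02 15:34:05
"""

NEW_DATA = """id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,4000,2013-10-14 18:00:00
"""


def merge_datasets(old_text, new_text, merge_key="id"):
    """Merge two CSV datasets; a row in new_text overwrites the row in
    old_text that has the same merge_key value."""
    merged = {}
    fieldnames = None
    # Read the older dataset first so rows from the newer one win.
    for text in (old_text, new_text):
        reader = csv.DictReader(io.StringIO(text))
        fieldnames = reader.fieldnames
        for row in reader:
            merged[row[merge_key]] = row

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Sorting numerically assumes integer keys; it is only for readability.
    for key in sorted(merged, key=int):
        writer.writerow(merged[key])
    return out.getvalue()


print(merge_datasets(OLD_DATA, NEW_DATA))
# Expected output:
# id,wage,mod_date
# 0,2000,2008-06-26 04:25:59
# 1,2300,2011-06-12 05:29:45
# 2,2500,2007-01-15 11:59:13
# 3,4000,2013-10-14 18:00:00
```

Because the newer dataset is applied last, the row 3,3000,2010-05-02 15:34:05 is replaced by 3,4000,2013-10-14 18:00:00, and the unchanged rows are kept once each.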
Before replicating this scenario, ensure that you have the appropriate rights and permissions to access the Hadoop distribution to be used. Then proceed as follows: