
Step 2: Loading changes from the source database table into the Hive external table

This step reads only the changes from the source database table and loads them into the Hive external table employee_extnl.

Procedure

  1. The Big Data Batch Job is as follows:
    • The source table is filtered by the last updated timestamp, which is maintained in the cdc_control table. This is done by using the following SQL in the Where condition of the tMysqlInput component (see the first sketch after this procedure):

      where cdc.Table_Name='employee_table' and emp.`Record_DateTime` > cdc.Last_executed

    • The tAggregateRow loads one row per run into the cdc_control table. It performs an update-else-insert operation on the table: if a record for the table already exists, it is updated with the run time of the Job (see the second sketch after this procedure).

      The run time can be set by using the TalendDate.getCurrentDate() function.

    The following shows the data in the source employee_table table after new records are added:
  2. Run the Job.
    The following shows the data in the employee_extnl external Hive table after the Job is run:
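
For reference, the complete query that the tMysqlInput component runs could look like the following. This is only a minimal sketch: the cross join between employee_table and cdc_control, and the exact column list, are assumptions inferred from the Where condition shown in step 1; adjust them to match your own schema.

      -- Sketch of the complete change-capture query (assumed schema)
      SELECT emp.*                                     -- all columns of the changed rows
      FROM employee_table emp
      CROSS JOIN cdc_control cdc                       -- cdc_control holds one row per tracked table
      WHERE cdc.Table_Name = 'employee_table'          -- select the control row for this table
        AND emp.`Record_DateTime` > cdc.Last_executed  -- only rows changed since the last run

Only rows whose Record_DateTime is newer than the timestamp stored in cdc_control are returned, so each run picks up just the changes made since the previous run.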
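The update-else-insert on cdc_control can be pictured as the following plain MySQL statement. This too is an illustrative sketch, not what the Job executes: the Job performs the operation through its components rather than hand-written SQL, the sketch assumes Table_Name is a primary or unique key on cdc_control, and NOW() stands in for the value produced by TalendDate.getCurrentDate().

      -- Sketch of the update-else-insert on the control table (assumed key: Table_Name)
      INSERT INTO cdc_control (Table_Name, Last_executed)
      VALUES ('employee_table', NOW())  -- NOW() plays the role of TalendDate.getCurrentDate()
      ON DUPLICATE KEY UPDATE
        Last_executed = NOW();          -- refresh the timestamp if the row already exists

Because Last_executed is refreshed on every run, the next run's Where condition filters against it automatically, closing the change-data-capture loop.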
