
Updating and reading data from an Iceberg branch

The second step of this scenario is to update the data in the newly created Iceberg branch. You can then see the effect of the update by comparing the results of the same query on both the main Iceberg table and the Iceberg branch.

About this task

For this task, the Update branch, Read from main branch, and Read from cleaned branch subJobs are used.

Procedure

  1. From the Basic settings view of tIcebergRow in Update branch, configure the parameters as follows:
    tIcebergRow Basic settings view.
    1. From the Connection drop-down list, select the connection component to be used. In this example it is tIcebergConnection_1.
    2. In the Sql query field, enter the SQL query to perform. In this example, it is "UPDATE default.marketing_toclean.branch_cleaned_data SET membership_type=null where (last_purchase_date > 40 or total_spend < 450) and membership_type='Bronze'", which removes the Bronze membership from customers who spent less than $450 or who have not made a purchase for more than 40 days.
    Tip: To query an Iceberg branch, use the syntax default.nameoftheoriginaltable.branch_branchname.
  2. Execute the Update branch subJob by clicking the Run button from the Run tab.
    The data is updated in the cleaned_data branch of the marketing_toclean table.
  3. From the Basic settings view of tIcebergInput in Read from main branch subJob, configure the parameters as follows:
    tIcebergInput Basic settings view.
    1. From the Property Type drop-down list, select where you store the data. In this example, it is Built-In.
    2. In the Sql query field, enter the SQL query to perform. In this example it is "SELECT * FROM marketing_toclean WHERE membership_type='Bronze'" which enables you to select only the customers that have the Bronze membership in the marketing_toclean Iceberg table.
    3. From the Connection drop-down list, select the connection component to be used. In this example it is tIcebergConnection_1.
    4. Leave the other parameters as is.
  4. From the Basic settings view of tLogRow in Read from main branch subJob, configure the parameters as follows:
    tLogRow Basic settings view.
    1. Select the Basic option from the Mode section.
    2. In the Field Separator field, enter the separator which will delimit data on the Log display. In this example, it is "|".
    3. Leave the other parameters as is.
  5. Repeat steps 3 and 4 for the Read from cleaned branch subJob to read data from the cleaned_data Iceberg branch. You only need to replace the SQL query in tIcebergInput with the following: "SELECT * FROM default.marketing_toclean.branch_cleaned_data WHERE membership_type='Bronze'".
  6. Execute the Read from main branch and Read from cleaned branch subJobs by clicking the Run button from the Run tab.
    The results appear in the Execution console of your Job.
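
The branch naming syntax used in the queries above can be sketched in a few lines of Python. This is only an illustration of how the identifier is assembled from the tip's pattern (default.nameoftheoriginaltable.branch_branchname); the helper names `branch_identifier` and `bronze_query` are hypothetical and are not part of any Talend or Iceberg API.

```python
def branch_identifier(namespace: str, table: str, branch: str) -> str:
    """Assemble the identifier of an Iceberg branch: namespace.table.branch_<branch>."""
    # Hypothetical helper: it only mirrors the pattern given in the tip.
    return f"{namespace}.{table}.branch_{branch}"


def bronze_query(source: str) -> str:
    """Build the read query of this scenario for a table or a branch identifier."""
    return f"SELECT * FROM {source} WHERE membership_type='Bronze'"


# Query for the main table, as entered in the Read from main branch subJob.
main_query = bronze_query("marketing_toclean")

# Query for the cleaned_data branch, as entered in the Read from cleaned branch subJob.
branch_query = bronze_query(
    branch_identifier("default", "marketing_toclean", "cleaned_data")
)

print(main_query)
print(branch_query)
```

The two printed strings match the queries entered in the tIcebergInput components, which makes it easy to see that only the source identifier differs between the two reads.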

Results

You can now compare the result for the marketing_toclean Iceberg table, which is 116, with the result for the cleaned_data Iceberg branch, which is 66. The difference shows that the data was properly updated in the cleaned_data branch.
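
To see why the two reads return different results, the effect of the UPDATE can be reproduced on a small scale. The sketch below does not use Iceberg: it assumes two plain SQLite tables (`main_table` and `cleaned_branch`, both hypothetical names) standing in for the main table and its branch, and a few invented sample rows. The `customer_id` column is also an assumption; the other column names come from the queries in this scenario.

```python
import sqlite3

# Invented sample rows: (customer_id, membership_type, last_purchase_date, total_spend).
# last_purchase_date is treated as a number of days, as in the scenario's UPDATE query.
ROWS = [
    (1, "Bronze", 10, 900.0),  # recent purchase, high spend: stays Bronze
    (2, "Bronze", 55, 900.0),  # more than 40 days without a purchase: cleared
    (3, "Bronze", 5, 120.0),   # spent less than 450: cleared
    (4, "Gold", 90, 50.0),     # not Bronze: never touched by the UPDATE
]

conn = sqlite3.connect(":memory:")

# Two identical tables stand in for the main table and its branch.
for name in ("main_table", "cleaned_branch"):
    conn.execute(
        f"CREATE TABLE {name} (customer_id INTEGER, membership_type TEXT, "
        "last_purchase_date INTEGER, total_spend REAL)"
    )
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?)", ROWS)

# Same predicate as the scenario's UPDATE, applied to the stand-in branch only.
conn.execute(
    "UPDATE cleaned_branch SET membership_type = NULL "
    "WHERE (last_purchase_date > 40 OR total_spend < 450) "
    "AND membership_type = 'Bronze'"
)


def bronze_count(table: str) -> int:
    """Count the rows that still have the Bronze membership in the given table."""
    query = f"SELECT COUNT(*) FROM {table} WHERE membership_type = 'Bronze'"
    return conn.execute(query).fetchone()[0]


main_count = bronze_count("main_table")
branch_count = bronze_count("cleaned_branch")
print(main_count, branch_count)
```

With these sample rows, the stand-in main table keeps all its Bronze customers while the stand-in branch keeps fewer, which is the same pattern as the 116 versus 66 results in the scenario.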
