Tracking data changes using Slowly Changing Dimensions (type 0 through type 3)
This scenario describes a Job that stores and manages both the current and historical employee data in a MySQL table using SCD (Slowly Changing Dimensions).
For more technologies supported by Talend, see Talend components.
The input data contains various employee details including name, age, role, and salary, and another id column is added to help ensuring the unicity of the input data.
At first, the following employee data is inserted to a new MySQL table using SCD:
id;name;age;role;salary
1;Mark Smith;30;tester;11000.00
2;Thomas Johnson;32;developer;12000.00
3;Teddy Brown;33;tester;13000.00
Then the table is updated using SCD with the following renewed employee data.
id;name;age;role;salary
1;Mark Smith;31;tester;11000.00
2;Thomas Johnson;32;developer;12000.00
3;Teddy Brown;33;writer;13500.00
You can see the age of Mark Smith is updated from 30 to 31, the role of Teddy Brown is changed from tester to writer, and his salary is raised from 13000.00 to 13500.00. In this scenario,
-
we don't want to track the data changes for the name field, so we will perform Type 0 SCD on it,
-
we want the new age data to overwrite the existing data, so we will perform Type 1 SCD on it,
-
we want to retain the full history of the role data, and always create a new record with the changed data and close the previous record, so we will perform Type 2 SCD on it,
-
we want to keep the current and previous dimension values for the salary field, so we will perform Type 3 SCD on it.
For more information about SCD types, see SCD management methodology.