Setting a data model in the Merging campaign
- Cloud API Services Platform
- Cloud Big Data
- Cloud Big Data Platform
- Cloud Data Fabric
- Cloud Data Integration
- Cloud Data Management Platform
- Cloud Pipeline Designer Standard Edition
- Data Fabric
- Qlik Talend Cloud Enterprise Edition
Data models define the structure of the data to be managed. They are used for the syntactic and semantic validation of data.
You can define access permissions per role for each attribute listed in a data model.
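This page does not describe how validation works internally. As a rough illustration only, the following Python sketch treats syntactic validation as a type check and semantic validation as a set of constraint predicates per attribute. `Attribute`, `validate`, `customer_model` and the sample constraints are hypothetical names, not part of any Talend API.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical, simplified data model: each attribute has an expected type
# (syntactic check) and optional constraint predicates (semantic checks).
@dataclass
class Attribute:
    name: str
    type_: type
    constraints: List[Callable[[Any], bool]] = field(default_factory=list)
    required: bool = False


def validate(record: Dict[str, Any], model: List[Attribute]) -> Dict[str, List[str]]:
    """Return attribute name -> error messages; an empty dict means the record is valid."""
    errors: Dict[str, List[str]] = {}
    for attr in model:
        value = record.get(attr.name)
        problems: List[str] = []
        if value is None or value == "":
            if attr.required:
                problems.append("missing required value")
        elif not isinstance(value, attr.type_):
            problems.append(f"expected {attr.type_.__name__}")
        else:
            # Semantic checks only run once the value has the expected type.
            for check in attr.constraints:
                if not check(value):
                    problems.append(f"constraint {check.__name__} failed")
        if problems:
            errors[attr.name] = problems
    return errors


EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def email_format(value: str) -> bool:
    return EMAIL_RE.fullmatch(value) is not None


def age_in_range(value: int) -> bool:
    return 0 <= value <= 150


customer_model = [
    Attribute("id", str, required=True),
    Attribute("email", str, [email_format]),
    Attribute("age", int, [age_in_range]),
]

print(validate({"id": "C1", "email": "a@b.com", "age": 30}, customer_model))
# {}
print(validate({"email": "not-an-email", "age": "30"}, customer_model))
# {'id': ['missing required value'], 'email': ['constraint email_format failed'], 'age': ['expected int']}
```

A record that passes every check returns an empty map. A per-value result of this kind is what a rule such as First valid needs in order to decide whether a value is valid.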
Procedure
- On the Add campaign page, click Data model and select, from the model list, the data structure you want to use in the CRM data deduplication campaign.
The Data Model list gives access to all the data models that have been defined.
- Select the buttons next to each attribute in the data structure to set permissions per attribute and per data steward, and to define who can view or edit which attributes.
- Read/write: Provides read/write access to the attribute in the data model.
- Read-only: Provides read-only access to the attribute in the data model. This type of access is useful if the data steward needs the information to make a relevant decision but must not change the value, for instance unique identifiers of other elements linked to the entity the steward is viewing, or data that you know is reliable and must not be changed.
- Hidden: Provides no access to the attribute. Hiding an attribute is useful if the information is sensitive and should not be visible to the data steward, for instance financial information. Hiding is also useful if the information is just noise for the steward, for instance a technical identifier, but still needs to be propagated as part of the task.
Example
In the CRM Data Deduplication campaign, you grant read-only access to the identifier attribute for the data stewards who are assigned the Account analyst role.
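To make the three access levels concrete, here is a minimal Python sketch of a role-to-attribute permission matrix matching the Account analyst example. It is not the product's API: `Access`, `PERMISSIONS`, `access_for`, `view_for` and `update` are hypothetical, the `name` and `revenue` attributes are invented for illustration, and treating attributes missing from a role's entry as hidden is an assumption.

```python
from enum import Enum
from typing import Any, Dict


class Access(Enum):
    READ_WRITE = "read/write"
    READ_ONLY = "read-only"
    HIDDEN = "hidden"


# Hypothetical permission matrix: role -> attribute -> access level.
# Attributes missing from a role's entry are treated as hidden (an assumption).
PERMISSIONS: Dict[str, Dict[str, Access]] = {
    "Account analyst": {
        "identifier": Access.READ_ONLY,
        "name": Access.READ_WRITE,
        "revenue": Access.HIDDEN,
    },
}


def access_for(role: str, attribute: str) -> Access:
    return PERMISSIONS.get(role, {}).get(attribute, Access.HIDDEN)


def view_for(role: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """What the steward sees: hidden attributes are dropped from the view only;
    the underlying record keeps them so they are still propagated with the task."""
    return {k: v for k, v in record.items() if access_for(role, k) is not Access.HIDDEN}


def update(role: str, record: Dict[str, Any], attribute: str, value: Any) -> Dict[str, Any]:
    """Return an updated copy of the record, refusing edits to non-read/write attributes."""
    if access_for(role, attribute) is not Access.READ_WRITE:
        raise PermissionError(f"{role!r} cannot edit {attribute!r}")
    return {**record, attribute: value}


record = {"identifier": "ACC-001", "name": "Acme", "revenue": 1_000_000}
print(view_for("Account analyst", record))  # {'identifier': 'ACC-001', 'name': 'Acme'}
```

`update` returns a copy rather than mutating the record, so the original task data (including hidden attributes) is left intact for propagation.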
- Select a rule from the Survivorship Rule list next to each attribute.
These rules decide which attribute values make up the master records when data is loaded into the campaign. Data stewards can then modify these choices manually.
- First valid: Selects the first source that contains a valid value with regard to the constraints defined in the associated data model. A value is valid if it complies with all the defined constraints and rules. "First" is defined by the order of the records when the task is created.
- First not null: Selects the first source which contains a non-empty value, where "first" is defined by the order of the records when the task is created.
- Most common: Selects the most common attribute value of the duplicates coming from one or more data sources.
- Most recent: Selects the most recent attribute value of the duplicates coming from one or more data sources, based on the last update date stored in the metadata.
- Most trusted: Selects the most trusted attribute value of the duplicates, based on the trust score you set when creating the campaign or when loading the tasks into the campaign. If no trust score is defined, this rule cannot be applied.
You can apply one rule to all attributes by selecting it from the list in the top-right corner of the form. If a given rule cannot be applied, it falls back to First not null. For example, if you do not set a trust score and you select Most trusted during the campaign definition, First not null is used instead. Similarly, First not null is used if you select Most common or First valid and there are no common or no valid values among the duplicates.
Example
The following examples show how survivorship rules determine which value is chosen to build the master record.
First valid (email address):
- If the first value is not valid while the second is, the second email address wins.
- If all email addresses are invalid, the first non-empty value wins.
First not null (first name):
- If the first value is empty while the second is not, the second first name wins.
- If all first names are empty, the first name is empty in the master record.
Most common (last name):
- If two source records have identical last names, this value wins.
- If the last names are different in all source records, the first non-empty value wins.
Most recent (phone number and timestamp):
- If one phone number has the most recent timestamp, this value wins.
- If all phone numbers have the same timestamp, the first non-empty value wins.
Most trusted (address):
- If all addresses in the source records have trust scores, the value with the highest score wins.
- If all addresses in the source records have trust scores and two are identical, the first identical address wins.
- If none of the addresses in the source records has a trust score, the first non-empty value wins.
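The rules, their fallback and the examples above can be pictured with the following Python sketch. It is not how Talend implements survivorship: the `Candidate` type, the rule functions, `survive` and `build_master` are hypothetical. It also encodes assumptions the page leaves open: validity is precomputed against the data model, a most recent timestamp shared by several values falls back to First not null, and "two are identical" for Most trusted is read as a tie on the highest trust score, in which case the first record in task order wins.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Candidate:
    """One source record's value for a single attribute, listed in task order."""
    value: Any
    valid: bool = True                     # assumed precomputed against the data model
    updated_at: Optional[datetime] = None  # last update date from the metadata
    trust: Optional[float] = None          # trust score of the source, if any


def _non_empty(c: Candidate) -> bool:
    return c.value not in (None, "")


# Each rule returns None when it cannot be applied, which triggers the fallback.
def first_not_null(cands: List[Candidate]) -> Any:
    return next((c.value for c in cands if _non_empty(c)), None)


def first_valid(cands: List[Candidate]) -> Any:
    return next((c.value for c in cands if _non_empty(c) and c.valid), None)


def most_common(cands: List[Candidate]) -> Any:
    values = [c.value for c in cands if _non_empty(c)]
    if not values:
        return None
    # Equal counts keep first-encountered order (documented Counter.most_common behavior).
    value, count = Counter(values).most_common(1)[0]
    return value if count > 1 else None  # no repeated value -> not applicable


def most_recent(cands: List[Candidate]) -> Any:
    dated = [c for c in cands if _non_empty(c) and c.updated_at is not None]
    if not dated:
        return None
    latest = max(c.updated_at for c in dated)
    winners = [c for c in dated if c.updated_at == latest]
    return winners[0].value if len(winners) == 1 else None  # shared timestamp -> fallback


def most_trusted(cands: List[Candidate]) -> Any:
    scored = [c for c in cands if _non_empty(c) and c.trust is not None]
    if not scored:
        return None  # no trust score defined -> rule cannot be applied
    best = max(c.trust for c in scored)
    return next(c.value for c in scored if c.trust == best)  # first of any tie wins


RULES: Dict[str, Callable[[List[Candidate]], Any]] = {
    "First valid": first_valid,
    "First not null": first_not_null,
    "Most common": most_common,
    "Most recent": most_recent,
    "Most trusted": most_trusted,
}


def survive(rule: str, cands: List[Candidate]) -> Any:
    """Apply a survivorship rule, falling back to First not null when it is not applicable."""
    value = RULES[rule](cands)
    return value if value is not None else first_not_null(cands)


def build_master(rules: Dict[str, str], sources: Dict[str, List[Candidate]]) -> Dict[str, Any]:
    """Build a master record from the rule chosen for each attribute."""
    return {attr: survive(rules[attr], cands) for attr, cands in sources.items()}


master = build_master(
    {"email": "First valid", "last_name": "Most common"},
    {
        "email": [Candidate("bad-email", valid=False), Candidate("ana@example.com")],
        "last_name": [Candidate("Smith"), Candidate("Jones"), Candidate("Smith")],
    },
)
print(master)  # {'email': 'ana@example.com', 'last_name': 'Smith'}
```

Using `None` to mean "rule not applicable" keeps the fallback in a single place (`survive`). This is safe here because an empty value never wins under any rule; it can only end up in the master record when First not null itself finds nothing.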