Standardizing data
Standardizing data before you perform matching tasks is an essential step toward
improving matching accuracy.
Talend provides different ways to standardize
data:
- You can standardize data against indices. Synonyms are replaced with their
"master" words.
For more information on the available data synonym dictionaries, see Data synonym dictionaries.
- You can use address validation components to standardize address data against
Experian QAS, Loqate and MelissaData validation tools. The addresses returned by
these tools are consistent and variations in address representations are
eliminated. As addresses are standardized, matching gets easier.
For more information on the tQASBatchAddressRow, tLoqateAddressRow and tMelissaDataAddress components, see Address standardization.
- You can use the tStandardizePhoneNumber component to
standardize a phone number, based on the formatting convention of the country of
origin.
For more information on phone number standardization, see Phone number standardization.
- You can use other, more generic components to transform your data and obtain more standardized records, such as tReplace, tReplaceList, tVerifyEmail, tExtractRegexFields or tMap.
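
To illustrate why standardization helps matching, here is a minimal Python sketch of the synonym-to-"master"-word idea described above. It is not Talend code and does not use any Talend API: the SYNONYMS dictionary and the standardize function are hypothetical names introduced only for this example, and the dictionary content is made up.

```python
import re

# Hypothetical synonym dictionary: each variant maps to its "master" word.
# A real synonym index would be far larger and domain-specific.
SYNONYMS = {
    "st": "street",
    "st.": "street",
    "str": "street",
    "ave": "avenue",
    "ave.": "avenue",
    "rd": "road",
    "rd.": "road",
}


def standardize(value: str) -> str:
    """Lowercase, collapse whitespace, and replace synonyms with master words."""
    # Normalize case and whitespace first so that tokens compare reliably.
    cleaned = re.sub(r"\s+", " ", value.strip().lower())
    tokens = cleaned.split(" ")
    # Replace each token by its master word when the dictionary has one.
    return " ".join(SYNONYMS.get(token, token) for token in tokens)


# Two representations of the same address become identical once standardized,
# so even an exact comparison is enough to match them.
first = standardize("12  Main St.")
second = standardize("12 main Street")
print(first == second)
```

Without the standardization step, the two input strings differ in case, spacing and abbreviation, and an exact comparison would treat them as different records.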