Identifying anomalies in data

This use case explains how to use the Profiling perspective of Talend Studio to analyze customer email addresses and phone numbers. It applies out-of-box indicators and patterns to the columns and shows which values match the patterns and which do not.

Jobs are then generated from the analysis results to clean the customer data and monitor its evolution.

You can then use the Data Explorer perspective to browse the non-matching data.

The sequence of profiling and cleansing customer data involves the following steps:

Procedure

  1. Create a column analysis on customer email addresses and phone numbers.
  2. From the analysis editor, connect to the database that holds the customer data.
  3. Add indicators that provide simple statistics on the data, such as row, blank and duplicate counts.
  4. Add standard patterns against which to match email addresses and phone numbers.
  5. Execute the analysis to show results in tables and charts.
  6. Access a view of the analyzed data to see invalid records.
  7. Generate out-of-box Jobs from analysis results to remove duplicate values from the Email and Phone columns.
  8. Generate out-of-box Jobs from analysis results to remove from the Email and Phone columns the values that do not match the standard email format or phone number format, respectively.
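
To make the steps above more concrete, the following is a minimal Python sketch of the kind of checks involved: simple statistics on a column, pattern matching, and removal of duplicate and non-matching values. It is only an illustration and is not how Talend Studio implements its analyses or generated Jobs. The regular expressions, the function names `profile` and `cleanse`, and the sample data are all hypothetical, and the duplicate count is assumed here to mean the number of distinct values that occur more than once.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; these are not the
# pattern definitions shipped with Talend Studio.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?[0-9][0-9 .()-]{6,18}[0-9]")


def is_blank(value):
    """Treat None, empty and whitespace-only values as blank."""
    return value is None or not value.strip()


def profile(values, pattern):
    """Return simple statistics and pattern-match counts for one column."""
    non_blank = [v for v in values if not is_blank(v)]
    counts = Counter(non_blank)
    matching = [v for v in non_blank if pattern.fullmatch(v)]
    return {
        "row_count": len(values),
        "blank_count": len(values) - len(non_blank),
        # Assumption: distinct values that appear more than once.
        "duplicate_count": sum(1 for c in counts.values() if c > 1),
        "matching": len(matching),
        "non_matching": len(non_blank) - len(matching),
    }


def cleanse(values, pattern):
    """Drop blank, non-matching and repeated values, preserving order."""
    seen = set()
    kept = []
    for v in values:
        if is_blank(v) or not pattern.fullmatch(v) or v in seen:
            continue
        seen.add(v)
        kept.append(v)
    return kept


# Made-up sample columns.
emails = ["ann@example.com", "bob@example", "ann@example.com", "", "eve@example.org"]
phones = ["+1 555 010 9999", "abc", "+1 555 010 9999", " ", "5550101234"]

email_stats = profile(emails, EMAIL_RE)
phone_stats = profile(phones, PHONE_RE)
clean_emails = cleanse(emails, EMAIL_RE)
clean_phones = cleanse(phones, PHONE_RE)

print(email_stats)
print(clean_emails)
print(phone_stats)
print(clean_phones)
```

In this sketch, `profile` plays the role of the indicators and patterns added in steps 3 and 4, and `cleanse` combines the effect of the two cleansing Jobs in steps 7 and 8 in a single pass.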
