Skip to main content Skip to complementary content

Preparing client data to upload it to a marketing solution

The Customer Contact Data dataset represents a file containing a list of clients with different information such as their name, their company or their country.

You will prepare and clean this data in order to be able to upload it to a marketing solution.

Download the file: customer_contact_data.csv.

Adding a preparation for the Customer Contact Data dataset

Add a preparation to start preparing and cleansing your data.

You can create a preparation from a dataset already available in Talend Data Preparation or one of your local files. When you add a preparation with the corresponding button, it will be created in the folder in which you are currently working. Furthermore, your preparation will automatically be saved in the preparations list, and all the changes you make are also saved automatically.

Before you begin

You have created the customer_contact_data dataset in Talend Cloud Data Preparation using the previously downloaded file.

Procedure

  1. From the homepage, click Preparations to open the list of preparations.
  2. Click the Add Preparation button.
  3. From the list of Datasets, select customer_contact_data to use it as source material.
    Add a preparation window with a list of datasets available to create a preparation, including customer_contact_data.
    The Preparation name field is automatically filled in but you can change the preparation name.
  4. Click Submit.

Results

Your dataset opens with an empty recipe, and you can start adding preparation steps. All your changes are automatically saved.

Removing empty and invalid rows from the customer data

You can remove all the empty and invalid entries from a dataset in one go.

As you can see in the quality bar under each column, the customer_contact_data.csv contains several rows with either empty or invalid cells. You are going to delete all these rows. Using the quality bar is a quick way of removing empty and invalid records for a given column, but you want to perform this on the whole dataset.

Some columns in the grid with their quality bar showing invalid or empty cells.

Procedure

  1. Click the menu icon on the top left of the grid.
  2. Select Display rows with invalid or empty values.
    Filtering options for the empty and invalid rows in the whole dataset.

    You have applied a filter on your data, and only the empty and invalid values present in the dataset are displayed.

  3. In the functions panel, type Delete these filtered rows and click the result to apply the associated function.

    Make sure that the Filtered rows radio button is selected in front of the Apply changes to field.

    The rows containing empty or invalid entries are removed from the dataset.

  4. Click the bin icon in the filter bar to clear the filter and display the whole dataset again.

Results

All the rows containing empty records are removed from the dataset and the quality bar under each column is now fully green.

Extracting the local and domain parts from email addresses

An email address, such as user@talend.com, is made up of two parts separated by the @ symbol: the local part (user in this example) and the domain part (talend.com in this example).

The two parts of an email address can be extracted and copied to two new columns.

You will extract the two parts of the email addresses to make it easier to upload to the marketing solution.

Procedure

  1. Select the email column.
  2. In the functions panel, type Extract email parts.
    Preview of the Extract email parts function where the email column is split into two parts.
  3. Click the Submit button to execute the Extract email parts function.

Results

The local part and the domain part are extracted from the email addresses. The extracted data is put into two new columns.

Removing blank spaces in customers names

Blank spaces can be present before and after the content from each cell.

They are more likely to be present in columns containing data manually entered by someone, such as a name or a phone number. These spaces are shown as grey squares.

You can see that the first_name column contains some entries with blank spaces.

Data containing blank spaces.

Procedure

  1. Select the first_name column.
  2. In the functions panel, type Remove trailing and leading characters and click the result to open the options for the associated function.
  3. In the Padding character drop-down list, select Whitespace and click Submit.

    You have the possibility to output the result of this function in a new column by selecting the Create new column check box.

  4. Repeat this action for every column containing blank spaces.

Results

Blank spaces are removed from the selected column.

Exporting the cleaned data

Once your preparation is complete, you may want to export the data you have cleansed.

You have cleaned the empty and invalid rows from your dataset, removed the unnecessary blank spaces, and extracted the information about the customers email addresses. The prepared dataset is now compatible with your marketing solution and you can export it.

Procedure

  1. Click the Export button.
  2. Choose the file format you want to use when exporting your data:
    • If you choose Local CSV file, select which field delimiter, text enclosure and escape characters to use and enter a name for the file to export.
    • If you choose Local XLSX file, choose a name for the file to export.
    • If you choose Amazon S3, enter your credentials and other information to store your file on Amazon S3.

Results

The data you cleansed using your preparation is exported to a local file.

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – please let us know!