Preparing client data to upload it to a marketing solution
The Customer Contact Data dataset represents a file containing a list of clients with different information such as their name, their company or their country.
You will prepare and clean this data in order to be able to upload it to a marketing solution.
Download the file: customer_contact_data.csv.
Adding a preparation for the Customer Contact Data dataset
Add a preparation to start preparing and cleansing your data.
You can create a preparation from a dataset already available in Talend Data Preparation or one of your local files. When you add a preparation with the corresponding button, it will be created in the folder in which you are currently working. Furthermore, your preparation will automatically be saved in the preparations list, and all the changes you make are also saved automatically.
Before you begin
Procedure
Results
Your dataset opens with an empty recipe, and you can start adding preparation steps. All your changes are automatically saved.
Removing empty and invalid rows from the customer data
You can remove all the empty and invalid entries from a dataset in one go.
As you can see in the quality bar under each column, the customer_contact_data.csv contains several rows with either empty or invalid cells. You are going to delete all these rows. Using the quality bar is a quick way of removing empty and invalid records for a given column, but you want to perform this on the whole dataset.
Procedure
Results
All the rows containing empty records are removed from the dataset and the quality bar under each column is now fully green.
Extracting the local and domain parts from email addresses
An email address, such as user@talend.com, is made up of two parts separated by the @ symbol: the local part (user in this example) and the domain part (talend.com in this example).
The two parts of an email address can be extracted and copied to two new columns.
You will extract the two parts of the email addresses to make it easier to upload to the marketing solution.
Procedure
Results
The local part and the domain part are extracted from the email addresses. The extracted data is put into two new columns.
Removing blank spaces in customers names
Blank spaces can be present before and after the content from each cell.
They are more likely to be present in columns containing data manually entered by someone, such as a name or a phone number. These spaces are shown as grey squares.
You can see that the first_name column contains some entries with blank spaces.
Procedure
Results
Blank spaces are removed from the selected column.
Exporting the cleaned data
Once your preparation is complete, you may want to export the data you have cleansed.
You have cleaned the empty and invalid rows from your dataset, removed the unnecessary blank spaces, and extracted the information about the customers email addresses. The prepared dataset is now compatible with your marketing solution and you can export it.
Procedure
- Click the Export button.
-
Choose the file format you want to use when exporting your data:
- If you choose Local CSV file, select which field delimiter, text enclosure and escape characters to use and enter a name for the file to export.
- If you choose Local XLSX file, choose a name for the file to export.
- If you choose Amazon S3, enter your credentials and other information to store your file on Amazon S3.
Results
The data you cleansed using your preparation is exported to a local file.