Using filters to create "if" conditions on customer data
The video_customers.xlsx dataset contains a sample with several information about the US customers of a video streaming company. Names, states, ages and subscription dates, among others, are listed in this file.
After combining different filters that work like "if" conditions, you will be able to perform cleansing operation on the customers category that you want to target specifically. In this example you want to output a file listing the customers in the 30-55 age range and from the five US states with the most registered customers, while getting rid of unusable data.
Download the file: video_customers.xlsx.
Adding a preparation for the video customers dataset
Add a preparation to start preparing and cleansing your data.
You can create a preparation from a dataset already available in Talend Data Preparation or one of your local files. When you add a preparation with the corresponding button, it will be created in the folder in which you are currently working. Furthermore, your preparation will be automatically saved in the preparations list, and all the changes you make are also saved automatically.
Before you begin
Procedure
Results
Your dataset opens with an empty recipe, and you can start adding preparation steps. All your changes are automatically saved.
Removing empty and invalid records
The quality bar indicates that a column contains empty or invalid records.
In the quality bar, data that matches the column type is shown in green, while red shows invalid data that does not match the column type. Empty records are shown in grey.
Because you want to focus on customers from a specific age range and specific states, empty data in the corresponding columns would be useless for you. You are simply going to remove the rows with empty data in the age and state columns, as well as removing invalid values from the dataset.
Procedure
Results
The rows with empty or invalid values for the age, state and phone numbers columns have been removed. Now that the dataset is a bit cleaner, you can start focusing on the data you want to put in light.
Creating filters on the ages and states
Creating a filter is a quick way to identify or isolate data.
You will once again use filters to isolate the data that is of the most interest to you in this example, namely the age and location of your customers. The data profiling area on the bottom right of the interface allows you to interact with the charts illustrating the data of the age and state columns, and select a specific range of data.
Procedure
Results
Your sample now only displays a restricted list of customers, that match the conditions you had fixed.
Exporting the data for a specific customer range
Now that the data that you wanted to focus on has been isolated and cleansed, you can export it as a local file.
Through the use of filters, you have performed conditioned actions on your data. Only the data applying to customers ranging from 35 to 50 in certain states have been kept. The prepared dataset is now ready to be exported.
Procedure
- Click the Export button.
-
Choose the file format you want to use when exporting your data:
- If you choose Local CSV file, select which field delimiter, text enclosure and escape characters to use and enter a name for the file to export.
- If you choose Local XLSX file, choose a name for the file to export.
- If you choose Amazon S3, enter your credentials and other information to store your file on Amazon S3.
Results
The data you cleansed using your preparation is exported to a local file.