Recreating email addresses before uploading them to a marketing solution
The Marketing leads dataset represents a file you received from a marketing campaign but the email addresses are missing.
In order for this file to be uploaded into your marketing solution, you have to create those email addresses. You will guess them from the name and the company of the prospects, and from the email format usually used by those companies.
Download the files: marketing_leads.zip and emails_reference.zip.
Adding a preparation for the Marketing leads dataset
Add a preparation to start preparing and cleansing your data.
You can create a preparation from a dataset already available in Talend Data Preparation or one of your local files. When you add a preparation with the corresponding button, it will be created in the folder in which you are currently working. Furthermore, your preparation will be automatically saved in the preparations list, and all the changes you make are also saved automatically.
Before you begin
Procedure
Results
Your dataset opens with an empty recipe, and you can start adding preparation steps. All your changes are automatically saved.
Removing empty records from the dataset
The grey part of the quality bar indicates that a column contains empty records. You may want to remove the rows containing these empty records.
In the quality bar, data that matches the column type is shown in green, while red shows invalid data that does not match the column type.
Procedure
Results
All the rows containing empty records are removed from the dataset and the quality bar under each column is now fully green.
Removing blank spaces from prospects names
Blank spaces can be present before and after the content from each cell.
They are more likely to be present in columns containing data manually entered by someone, such as a name or a phone number. These spaces are shown as grey squares.
You can see that the first_name and last_namecolumns contain some entries with blank spaces.
Procedure
Results
Blank spaces are removed from the selected columns.
Duplicating columns
If you want to have a copy of a given column, you may want to duplicate that column.
You will duplicate two columns and use them as a base for recreating the email addresses.
Procedure
Results
The two columns are duplicated. The information contained in those two columns will be used to create the local part of the email addresses you want to create.
Renaming columns
In order to better identify a column, you can rename it.
You will rename the two columns previously created and give them a meaningful name. Those new columns will be used to create the local part of the email addresses.
Procedure
Results
The selected columns are renamed.
Putting the names in lower case
Sometimes, you may need to change the case of some text to lower case. This can be useful if you want to append this text to some other text.
The data contained in the two renamed columns starts with an upper case. You will put the all the text in lower case, in order to merge the two columns.
Procedure
Results
Using the email domains from another dataset
The lookup feature matches data from the current dataset with its counterpart in a reference dataset.
On the one hand, you have the marketing_leads dataset, that you are currently working on, that contains information about the company where the listed customers are working. On the other hand, the emails_reference contains a list of companies, and the email domain that they are using.
You are going to do a lookup on the emails_reference dataset, and extract the information about email domains to match them with the companies from the marketing_leads dataset.
Before you begin
To perform the lookup on the emails_reference, you need to import it by using the Add dataset button in the Datasets view of the homepage.
Procedure
Results
This information about email domains will be added to the first names and last names from the duplicated column to create the complete email addresses.
Merging the content of several columns
In some cases, the data you want to use is split in several columns. You can group these columns using a concatenation.
All the information you need to create the email addresses is now ready, and you only need to assemble it. You will merge the three columns that you have created since the beginning of this scenario.
Procedure
Results
The content of the three columns has been merged. You have created a column containing valid email addresses, based on first names, last names and a web domain for each company.
Deleting the duplicated column
If you want to remove a column you do not need, you can delete a column.
Now that you have used the columns for the concatenation, you can delete them and only keep the result.
Procedure
Results
The selected columns are deleted.
Exporting the prepared email addresses
Once your preparation is complete, you may want to export the data you have cleansed.
You have managed to prepare your dataset to recreate the information that was originally missing, the customers email addresses in this case.
Procedure
- Click the Export button.
-
Choose the file format you want to use when exporting your data:
- If you choose Local CSV file, select which field delimiter, text enclosure and escape characters to use and enter a name for the file to export.
- If you choose Local XLSX file, choose a name for the file to export.
- If you choose Amazon S3, enter your credentials and other information to store your file on Amazon S3.
Results
The data you cleansed using your preparation is exported to a local file.