Cleansing data coming from a human resource management system
The HRMS Export dataset is an Excel file exported from an American human resource management system (HRMS).
It contains the full list of employees since the company was created, with their names, job titles, hiring dates, departure dates if any, and the bank information used to pay their salaries. In this dataset, the dates are in the American format, and you want to convert them to the French format so that they can be used with French software solutions. You also want to extract the account number from the IBAN of each French account.
Download the file: HRMS_export.xlsx.
Adding a preparation for the HRMS export dataset
Add a preparation to start preparing and cleansing your data.
You can create a preparation from a dataset already available in Talend Cloud Data Preparation or one of your local files. When you add a preparation with the corresponding button, it will be created in the folder in which you are currently working. Furthermore, your preparation will be automatically saved in the preparations list, and all the changes you make are also saved automatically.
Before you begin
Procedure
Results
Your dataset opens with an empty recipe, and you can start adding preparation steps. All your changes are automatically saved.
Converting dates to the French format
As the date formats used across the world are not the same, you may need to change the format used in a column containing dates.
You will change the date format used in this dataset from the American format to the French format.
Procedure
Results
The date format is changed in the selected column.
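For readers who want to reproduce this conversion outside the application, the idea can be sketched in Python. This is only an illustration, not how Talend Cloud Data Preparation performs the step. It assumes the American dates are stored as MM/DD/YYYY text and the French target is DD/MM/YYYY; the function name `us_to_french_date` is hypothetical.

```python
from datetime import datetime

# Assumed layouts: the source does not state the exact patterns.
US_FORMAT = "%m/%d/%Y"      # American layout, for example 03/25/2019
FRENCH_FORMAT = "%d/%m/%Y"  # French layout, for example 25/03/2019


def us_to_french_date(value: str) -> str:
    """Convert an MM/DD/YYYY string to DD/MM/YYYY.

    Blank cells (for example, an employee with no departure date)
    are returned as an empty string. Values that do not match the
    American layout raise ValueError instead of being guessed.
    """
    text = value.strip()
    if not text:
        return text
    return datetime.strptime(text, US_FORMAT).strftime(FRENCH_FORMAT)


print(us_to_french_date("03/25/2019"))
```

Raising an error on values that do not parse is a deliberate choice here: a date such as 03/04/2019 is valid in both layouts, so silently accepting mixed input would hide mistakes.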
Extracting the bank account number
If you want to take part of the text contained in a cell and reuse it elsewhere, you can extract part of the text.
The HRMS Export preparation contains French International Bank Account Numbers (IBAN). A French IBAN is a 33-character code, including spaces. It is made of a two-letter country code, two check digits, a five-digit bank identifier, a five-digit branch identifier, an eleven-digit account number, and two final check digits.
You will extract the account number part of those IBANs into a new column.
Before you begin
It is recommended to remove unnecessary blank spaces from the text records and to make sure the cells have the same length before proceeding.
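As a rough illustration of this check, the following Python sketch trims stray spaces and counts the cell lengths that remain. It is not part of the product; the helper names `normalize_cell` and `length_counts` are hypothetical, and the IBAN shown is a sample value whose check digits are not verified here.

```python
from collections import Counter


def normalize_cell(value: str) -> str:
    """Trim the cell and collapse each run of whitespace to one space."""
    return " ".join(value.split())


def length_counts(cells):
    """Count how many non-empty cells have each length."""
    return Counter(len(cell) for cell in cells if cell)


raw_cells = [
    "FR76 3000 6000 0112 3456 7890 189",
    "  FR76 3000  6000 0112 3456 7890 189 ",  # stray spaces
]
clean_cells = [normalize_cell(cell) for cell in raw_cells]
print(length_counts(clean_cells))
```

If the counter reports more than one length after cleaning, some cells probably do not follow the expected layout and are worth inspecting before extracting by position.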
Procedure
Results
The text corresponding to the selection you made is extracted to a new column, which you can rename if you want.
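The extraction can also be sketched in Python, using the French IBAN layout described above (country code, check digits, bank identifier, branch identifier, account number, final check digits). This is a sketch under assumptions, not the application's own logic: it removes the spaces before slicing, whereas a position-based extraction in the application works on the text as displayed. The function name is hypothetical, and the sample IBAN is for illustration only; its check digits are not validated.

```python
def extract_account_number(iban: str) -> str:
    """Return the 11-character account number of a French IBAN.

    Layout without spaces (27 characters):
      0-1   country code      (FR)
      2-3   check digits
      4-8   bank identifier
      9-13  branch identifier
      14-24 account number
      25-26 final check digits
    """
    compact = "".join(iban.split()).upper()
    if len(compact) != 27 or not compact.startswith("FR"):
        raise ValueError(f"not a French IBAN layout: {iban!r}")
    return compact[14:25]


print(extract_account_number("FR76 3000 6000 0112 3456 7890 189"))
```

Rejecting values that are not 27 characters long once spaces are removed mirrors the recommendation to make sure all cells have the same length before extracting by position.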
Exporting the prepared HRMS data
Once your preparation is complete, you may want to export the data you have cleansed.
The preparation on the HRMS_export.xlsx dataset, which changed the date format and extracted the account numbers from the IBANs, is now complete and you can export it.
Procedure
- Click the Export button.
- Choose the file format you want to use when exporting your data:
  - If you choose Local CSV file, select which field delimiter, text enclosure and escape characters to use and enter a name for the file to export.
  - If you choose Local XLSX file, choose a name for the file to export.
  - If you choose Amazon S3, enter your credentials and other information to store your file on Amazon S3.
Results
The data you cleansed using your preparation is exported to the destination you selected: a local file or Amazon S3.
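To make the three CSV options concrete (field delimiter, text enclosure, and escape character), here is a small Python sketch that uses the standard `csv` module. It only illustrates what those settings control and does not reflect the application's export code. The column names and the sample row are hypothetical.

```python
import csv
import io


def rows_to_csv(header, rows, delimiter=";", enclosure='"', escape="\\"):
    """Serialize rows to CSV text with the chosen delimiter,
    text enclosure, and escape character."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quotechar=enclosure,   # text enclosure
        escapechar=escape,     # passed through to csv.writer
        quoting=csv.QUOTE_MINIMAL,  # enclose only fields that need it
        lineterminator="\n",
    )
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical columns and values, for illustration only.
text = rows_to_csv(
    ["name", "hiring_date"],
    [["Dupont; Marie", "25/03/2019"]],
)
print(text)
```

In this sketch, the first field of the sample row contains the delimiter, so it is wrapped in the text enclosure, while the date is written as is. A semicolon delimiter is used as the default because it is common in French CSV files, but that is an assumption, not a setting taken from the dataset.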