Automatically formatting data based on examples
The Magic Fill function offers a convenient solution to format data types that do not have a dedicated function or to easily perform a succession of transformations with the same function.
Via a machine learning algorithm, this function allows you to define a pattern, and automatically apply a transformation on a whole column, based on a few examples that you define beforehand.
At the moment, the Magic Fill function supports the following transformation types:
- substring
- addition of constants (numbers, letters, special characters)
- case sensitivity
- semantic transformation for countries, US postal codes and states, emails, URLs and dates
For the function to work, you need to enter at least two examples of the transformation you want to apply. You can then add up to three other examples. The more examples you input, the more accurately the pattern will be identified by the function.
If the transformation program generated by the function doesn't apply to some of the data from the source column, it will remain unchanged in the target column.
Data types such as dates or phone numbers both have dedicated function that can be used to easily change their format. However full names, social security numbers or state codes, for example, do not. The following scenarios will illustrate how to use the Magic Fill function to format your data in those cases.
Formatting names with Magic Fill
You can use the magic fill function to automatically format names, based on a pattern defined by examples.
Let's take the example of a dataset with a column containing the full names or your customers.
You would like to format those names, and only keep the first letter of the first name, followed by a dot, and then the last name in upper case. For example, George Abitbol would become G. ABITBOL. The easiest way for you to accomplish that would be to use the Magic Fill function to set some examples of how you would like the transformation to work, and apply it to the rest of the column.
Procedure
Results
Formatting social security numbers using Magic Fill
You can use the Magic Fill function to automatically format social security numbers, based on a pattern defined by examples.
In this example, you are working on a dataset containing customer information such as names, email addresses, but most importantly, social security numbers. The problem is that those social security numbers have been entered in the wrong format, without the hyphens. For example, a number such as 334-20-6259, has been entered as 334206259 in this dataset. Because the format is wrong, most of the data is interpreted as Bank Routing Transit Numbers by the application, and considered invalid on top of that.
To fix this mistake, you are going to use the Magic Fill function and apply the correct social security number format on your data, following a pattern based on a few examples you will input.
Procedure
Results
Leveraging semantic awareness to transform state codes into states using Magic Fill
Based on the example that you will input, the Magic Fill function also has the ability to recognize and format data that corresponds to certain semantic types: countries, emails, URLs, dates, as well as US postal codes and states.
Let's take the example of a dataset containing some customer information, like their names, email addresses, or the state they live in. This scenario will illustrate how you can use the Magic Fill function to convert the two-letter state codes to their full state name equivalent, to have them in a format that is easier to read.
Procedure
Results
Leveraging semantic awareness to convert dates to day of the week using Magic Fill
Based on the example that you will input, the Magic Fill function also has the ability to recognize and format data that corresponds to certain semantic types: countries, emails, URLs, US postal codes and states, as well as dates.
Let's take the example of a small retail company with several stores across the city. The following dataset shows how the total number of sales is recorded with the corresponding date everyday. The format used to enter dates is dd/MM/yyyy, but you would like to also display the corresponding day of the week, in order to make some statistical analysis and try to determine which days are the most profitable. This scenario will show of you can use the Magic Fill function to extract this information.
Procedure
Results
Converting organization names to their abbreviated forms using Magic Fill
The Magic Fill function can be used to transform names, units or expressions into their abbreviated forms.
In this example, the dataset to improve contains data on people working for well-known national or international organizations. However, these organizations full names are often long and less well-known as the corresponding acronym. So in order to make the dataset easier to read, you will use the Magic Fill function to convert the full names into their acronyms.