Skip to main content Skip to complementary content

Automatically formatting data based on examples

The Magic Fill function offers a convenient solution to format data types that do not have a dedicated function or to easily perform a succession of transformations with the same function.

Information noteNote: This function is not compatible with Spark Jobs, and HDFS or S3 exports.

Via a machine learning algorithm, this function allows you to define a pattern, and automatically apply a transformation on a whole column, based on a few examples that you define beforehand.

At the moment, the Magic Fill function supports the following transformation types:

  • substring
  • addition of constants (numbers, letters, special characters)
  • case sensitivity
  • semantic transformation for countries, US postal codes and states, emails, URLs and dates

For the function to work, you need to enter at least two examples of the transformation you want to apply. You can then add up to three other examples. The more examples you input, the more accurately the pattern will be identified by the function.

If the transformation program generated by the function doesn't apply to some of the data from the source column, it will remain unchanged in the target column.

Data types such as dates or phone numbers both have dedicated function that can be used to easily change their format. However full names, social security numbers or state codes, for example, do not. The following scenarios will illustrate how to use the Magic Fill function to format your data in those cases.

Formatting names with Magic Fill

You can use the magic fill function to automatically format names, based on a pattern defined by examples.

Let's take the example of a dataset with a column containing the full names or your customers.

Dataset containing full names of customers.

You would like to format those names, and only keep the first letter of the first name, followed by a dot, and then the last name in upper case. For example, George Abitbol would become G. ABITBOL. The easiest way for you to accomplish that would be to use the Magic Fill function to set some examples of how you would like the transformation to work, and apply it to the rest of the column.

Procedure

  1. Click the header of the fullname column in order to select its content.
  2. In the functions panel, type Magic fill and click the result to display the options of the associated function.
  3. Clear the Create new column check box.
    This way, the values will be fixed directly in the existing column.
  4. In the Input 1 field, enter one of the value from the fullname column that you would like to transform, Dimitri Tudor for example.
  5. In the Output 1 field, enter the same value, but with the correct format this time: D. TUDOR.
    For the function to work, you need to enter at least two complete examples of the transformation you want to apply. You can then add up to three other examples. Examples can either be taken from your dataset, or made up. The more examples you input, the more accurately the pattern will be identified by the function.
  6. Enter more before and after examples, in the remaining fields.
    • Mina Luze as Input 2 and M. LUZE as Output 2
    • Henry Bank as Input 3 and H. BANK as Output 3
    • Ben Schneider as Input 4 and B. SCHNEIDER as Output 4
    • Jonathan Oliver as Input 5 and J. OLIVER as Output 5
  7. Click Submit.

Results

From the few examples set at the beginning, the function has been able to understand the pattern, and automatically create the corresponding transformation. The names in your dataset have now been replaced with their equivalent in the expected format.
Dataset containing full names of customers with transformation.

Formatting social security numbers using Magic Fill

You can use the Magic Fill function to automatically format social security numbers, based on a pattern defined by examples.

In this example, you are working on a dataset containing customer information such as names, email addresses, but most importantly, social security numbers. The problem is that those social security numbers have been entered in the wrong format, without the hyphens. For example, a number such as 334-20-6259, has been entered as 334206259 in this dataset. Because the format is wrong, most of the data is interpreted as Bank Routing Transit Numbers by the application, and considered invalid on top of that.

Dataset containing customer information.

To fix this mistake, you are going to use the Magic Fill function and apply the correct social security number format on your data, following a pattern based on a few examples you will input.

Procedure

  1. Click the header of the ssn column to select its content.
  2. In the functions panel, type Magic fill and click the result to display the options of the associated function.
  3. In the Input 1 field, enter one of the value from the ssn column that you would like to transform, 472989590 for example.
  4. In the Output 1 field, enter the same value, but with the correct format this time: 472-98-9590.
    For the function to work, you need to enter at least two complete examples of the transformation you want to apply. You can then add up to three other examples. Examples can either be taken from your dataset, or made up. The more examples you input, the more accurately the pattern will be identified by the function.
  5. Enter more before and after examples in the remaining fields:
    • 428990989 as Input 2 and 428-99-0989 as Output 2
    • 281066499 as Input 3 and 281-06-6499 as Output 3
    • 244356471 as Input 4 and 244-35-6471 as Output 4
    • 233262165 as Input 5 and 233-26-2165 as Output 5
  6. Click Submit.

Results

A new column is created, where the transformation defined by your examples, has been applied to the rest of the numbers. The social security numbers are now in the correct format and are automatically detected as such, as shown by the semantic type in the header of this new column. You can now delete the other column with the invalid data.
Dataset containing customer information with the new column.

Leveraging semantic awareness to transform state codes into states using Magic Fill

Based on the example that you will input, the Magic Fill function also has the ability to recognize and format data that corresponds to certain semantic types: countries, emails, URLs, dates, as well as US postal codes and states.

Let's take the example of a dataset containing some customer information, like their names, email addresses, or the state they live in. This scenario will illustrate how you can use the Magic Fill function to convert the two-letter state codes to their full state name equivalent, to have them in a format that is easier to read.

Dataset containing customer information.

Procedure

  1. Click the header of the state column in order to select its content.
  2. In the functions panel, type Magic fill and click the result to display the options of the associated function.
  3. In the Input 1 field, enter one of the values from the state column that you would like to transform, NY for example.
  4. In the Output 1 field, enter the full name of the corresponding state: New York.
    For the function to work, you need to enter at least two complete examples of the transformation you want to apply. You can then add up to three other examples. Examples can be taken from your dataset, or you can use other valid US state code values from outside your dataset. The more examples you input, the more accurately the transformation will be identified by the function.
  5. Enter more before and after examples in the remaining fields:
    • AZ as Input 2 and Arizona as Output 2
    • TX as Input 3 and Texas as Output 3
    • IN as Input 4 and Indiana as Output 4
    • MI as Input 5 and Michigan as Output 5

    Based on these examples, the function will understand that the all the inputs you entered are two-letter state codes, and the outputs are full state names, and will apply the corresponding transformation to the rest of the column.

  6. Click Submit.

Results

A new column is created, where the formatting operation defined by your examples has been applied to the rest of the state codes. You can see from the column header that the new values all correspond to the US State semantic type, ensuring that they are in the good format. You can now delete the state column.
Dataset containing customer information with a new column.

Leveraging semantic awareness to convert dates to day of the week using Magic Fill

Based on the example that you will input, the Magic Fill function also has the ability to recognize and format data that corresponds to certain semantic types: countries, emails, URLs, US postal codes and states, as well as dates.

Let's take the example of a small retail company with several stores across the city. The following dataset shows how the total number of sales is recorded with the corresponding date everyday. The format used to enter dates is dd/MM/yyyy, but you would like to also display the corresponding day of the week, in order to make some statistical analysis and try to determine which days are the most profitable. This scenario will show of you can use the Magic Fill function to extract this information.

Dataset containing number of sales recorded.

Procedure

  1. Click the header of the date column in order to select its content.
  2. In the functions panel, type Magic fill and click the result to display the options of the associated function.
  3. In the Input 1 field, enter one of the values from the date column that you would like to transform, 17/01/2018 for example.
  4. In the Output 1 field, enter the day of the week that you know is the corresponding one: Wednesday.
    For the function to work, you need to enter at least two complete examples of the transformation you want to apply. You can then add up to three other examples. Examples can be taken from your dataset, or you can use other valid date/day couples. The more examples you input, the more accurately the transformation will be identified by the function.
  5. Enter more before and after examples in the remaining fields:
    • 02/11/2018 as Input 2 and Friday as Output 2
    • 06/12/2018 as Input 3 and Thursday as Output 3
    • 22/09/2019 as Input 4 and Sunday as Output 4
    • 13/07/2019 as Input 5 and Saturday as Output 5

    Based on these examples, the function will understand that the all the inputs you entered are dd/MM/yyyy dates, and the outputs are the matching days of the week, and will apply the corresponding transformation to the rest of the column.

  6. Click Submit.

Results

A new column is created, displaying the day of the week for each date of your initial column. You can already see that most of the sales seem to be generated on Wednesdays and Saturdays, which is an information that was not available before. From there, you can easily filter on any of the weekdays, or try some aggregation via the charts panel to get an average of the sales for each weekday.
Dataset containing number of sales recorded with a new column.

Converting organization names to their abbreviated forms using Magic Fill

The Magic Fill function can be used to transform names, units or expressions into their abbreviated forms.

In this example, the dataset to improve contains data on people working for well-known national or international organizations. However, these organizations full names are often long and less well-known as the corresponding acronym. So in order to make the dataset easier to read, you will use the Magic Fill function to convert the full names into their acronyms.
Dataset containing data on people working for well-known national or international organizations.

Procedure

  1. Click the header of the organization column in order to select it.
  2. In the functions panel, type Magic fill and click the result to display the options of the associated function.
  3. In the Input 1 field, enter one of the values from the organization column that you would like to transform, World Wildlife Fund for example.
  4. In the Output 1 field, enter the corresponding acronym: WWF.
    For the function to work, you need to enter at least two complete examples of the transformation you want to apply. You can then add up to three other examples. Examples can either be taken from your dataset, or made up. The more examples you input, the more accurately the pattern will be identified by the function.
  5. Enter more before and after examples in the remaining fields:
    • Federal Bureau of Investigation as Input 2 and FBI as Output 2
    • International Court of Justice as Input 3 and ICJ as Output 3
    • World Trade Organization as Input 4 and WTO as Output 4
    • European Union as Input 5 and EU as Output 5

    Based on these examples, the function will understand that it only has to keep the first upper case letter of each word to transform full names into the corresponding acronym.

  6. Click Submit.

Results

A new column is created, where the transformation defined by your examples has been applied to the rest of the organizations names. You can know recognize more easily which organizations appear in the dataset.
Dataset containing data on people working for well-known national or international organizations with their acronyms.

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – please let us know!