Using regular expressions to match content

Regular expressions can be used to search for a specific pattern among your data and isolate values that you are interested in.

This scenario takes the example of someone working on a dataset that lists information about books, including their ISBNs. Using Talend Data Preparation, you can check whether the ISBNs follow the expected pattern. With the Match pattern function, you can compare your data with an expression of your choice.

Procedure

  1. Click the ISBN column to select its content.
    ISBN column illustrated.
  2. In the functions list, find and select Match pattern....

    A menu opens where you can enter the pattern for your search.

  3. In the Pattern field, select other from the drop-down list.
  4. Click the button on the left side of the Manual pattern field and select Regex from the list.
    Regex option selected from the Manual pattern field.
  5. In the Manual pattern field, type ^[ISBN]{4}[ ]{0,1}[0-9]{1}[-]{1}[0-9]{3}[-]{1}[0-9]{5}[-]{1}[0-9]{0,1}$.

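    This regular expression describes the ISBN format that you want to identify in your dataset: a four-letter prefix, an optional space, then hyphen-separated groups of one, three, and five digits, followed by a hyphen and an optional final digit.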

  6. Click Submit.

    A new column, ISBN_matching, is created. Values that match the pattern defined by the regular expression are listed as true; values that do not match are listed as false.

    ISBN and ISBN_matching columns illustrated.
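
To see which values the expression from step 5 accepts before applying it to a dataset, it can help to try it outside the tool. The following is a minimal sketch using Python's re module, which is an assumption made for illustration: Talend Data Preparation may use a different regular expression engine, and the sample values and the stricter variant are hypothetical, not taken from the original dataset or procedure.

```python
import re

# Pattern copied verbatim from step 5 of the procedure.
ISBN_PATTERN = re.compile(
    r"^[ISBN]{4}[ ]{0,1}[0-9]{1}[-]{1}[0-9]{3}[-]{1}[0-9]{5}[-]{1}[0-9]{0,1}$"
)

# Hypothetical stricter variant (not part of the original procedure):
# it requires the literal text "ISBN" instead of any four characters
# drawn from the set I, S, B, N.
STRICT_ISBN_PATTERN = re.compile(
    r"^ISBN[ ]?[0-9]-[0-9]{3}-[0-9]{5}-[0-9]?$"
)


def matches_isbn_pattern(value: str, pattern: re.Pattern = ISBN_PATTERN) -> bool:
    """Return True when the whole value matches the pattern.

    This mimics the true/false values of the ISBN_matching column.
    """
    return pattern.fullmatch(value) is not None


# Hypothetical sample values, chosen to exercise each part of the pattern.
samples = [
    "ISBN 0-123-45678-9",  # prefix, space, and all digit groups present
    "ISBN0-123-45678-9",   # the space after the prefix is optional
    "ISBN 0-123-45678-",   # the final digit is optional
    "ISBN 0-12-345678-9",  # digit groups have the wrong lengths
    "0-123-45678-9",       # prefix missing
    "NBSI 0-123-45678-9",  # accepted by [ISBN]{4}, rejected by the strict variant
]

for value in samples:
    original = matches_isbn_pattern(value)
    strict = matches_isbn_pattern(value, STRICT_ISBN_PATTERN)
    print(f"{value!r}: original={original}, strict={strict}")
```

Note that [ISBN]{4} is a character class repeated four times, so the expression from step 5 also accepts prefixes such as NBSI. If only the literal prefix ISBN should match, a variant along the lines of the stricter pattern above may be more appropriate.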

Results

After using a regular expression to search for a specific pattern, you can now easily identify and isolate the values that match your search.
