Skip to main content Skip to complementary content

Masking Medicare beneficiary identifiers

Using the tPatternMasking component, you can replace personally identifiable information, such as Medicare Beneficiary Identifiers (MBI), with realistic values in a consistent manner.

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real-Time Big Data Platform, Talend MDM Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

A MBI uniquely identifies a beneficiary of the US federal health insurance program. This identifier consists of 11 characters, excluding dashes, and uses the following pattern:
  • A digit in the 1 to 9 range.
  • A letter in the A to Z range (minus S, L, O, I, B, Z).
  • A digit or a letter in the A to Z range (minus S, L, O, I, B, Z).
  • A digit in the 0 to 9 range.
  • A letter in the A to Z range (minus S, L, O, I, B, Z).
  • A digit or a letter in the A to Z range (minus S, L, O, I, B, Z).
  • A digit in the 0 to 9 range.
  • A letter in the A to Z range (minus S, L, O, I, B, Z).
  • A letter in the A to Z range (minus S, L, O, I, B, Z).
  • A digit in the 0 to 9 range.
  • A digit in the 0 to 9 range.

For example, 1EG4-TE5-MK73 is a valid MBI.

This scenario describes a Job which uses the following components:
  • The tFixedFlowInput component generates MBIs.
  • The tPatternMasking component replaces the original MBIs with random digits or letters from a set of named values, or a random digit from a specified range.

  • The tLogRow component outputs the substitute dataset.
    A Job using the tFixedFlowInput, tPatternMasking, and tLogRow components.

Setting up the Job

Procedure

  1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tPatternMasking and tLogRow.
  2. Connect the three components together using Row > Main links.

Configuring the input component

Procedure

  1. Double-click tFixedFlowInput to open its Basic settings view in the Component tab.
    Configuration of the tFixedFlowInput component.
  2. Click the [...] button next to Edit schema and use the [+] button in the dialog box to add a column of String type.
    Schema of the tFixedFlowInput component.
  3. Click OK in the dialog box and accept to propagate the changes when prompted.
  4. In the Number of rows field, enter 1.
  5. In the Mode area, select the Use Inline Content(delimited file) option.
  6. In the Content table, enter the data you want to mask.

Configuring the masking operations

The alpha_values.zip file contains the allowed alphabetic values: all letters in the A to Z range (minus S, L, O, I, B, Z). The alphanum_values.zip file contains the allowed alphanumeric values: the values from alpha_values.zip and digits.

Before you begin

Procedure

  1. Double-click tPatternMasking to display its Basic settings view in the Component tab.
    Configuration of the tPatternMasking component.
  2. If required, click Sync columns to retrieve the schema defined in the input component.
  3. Click the Edit schema button to open the schema dialog box.

    tPatternMasking adds a read-only column to the output schema.

    Examples of input and output schemas.

    The ORIGINAL_MARK column labels output records:

    • Original records are labeled with the true label.
    • Substitute records are labeled with the false label.
  4. In the Modifications table, click the [+] button to add ten rows for configuring the data masking operations.
    The first nine rows define the masking operation for each of the first nine characters in the input values. The last row define the masking operation for the last two characters in the input values.
    The dash is used as a separator in the input values. You do not need to configure masking operations for separators because the masked output has the same separators as the input values.
  5. Configure the masking operations for the first, fourth and seventh characters that appear in the input:
    1. Click the Column to mask field of the first row and select the column that contains the data to be masked.
      In this example, select MBI.
    2. From the Field type field, select Interval as the field type the data belongs to and enter the range of authorized numeric values in the Values field.
      In this example, the purpose is to mask the first character with a digit in the 1 to 9 range ("1,9"). The fourth and seventh characters will be masked with a digit in the 0 to 9 range ("0,9").
    3. Apply the same configuration to the fourth and seventh rows of the Modifications table.
  6. Configure the masking operations for the second, fifth, eighth and ninth characters that appear in the input:
    1. Click the Column to mask field of the second row and select the column that contains the data to be masked.
    2. From the Field type field, select Enumeration from file.
    3. Click the Values field and press Ctrl + Space to select the variable for the file that contains the authorized values.
      In this example, select the variable for the file that contains the authorized alphabetic values.
    4. Apply the same configuration to the fifth, eighth and ninth row of the Modifications table.
  7. Configure the masking operations for the third and sixth characters that appear in the input:
    1. Click the Column to mask field of the third row and select the column that contains the data to be masked.
    2. From the Field type field, select Enumeration from file.
    3. Click the Values field and press Ctrl + Space to select the variable for the file that contains the authorized values.
      In this example, select the variable for the file that contains the authorized alphanumeric values.
    4. Apply the same configuration to the sixth row of the Modifications table.
  8. Configure the masking operations for the last two characters that appear in the input:
    1. Click the Column to mask field of the last row and select the column that contains the data to be masked.
    2. From the Field type field, select Interval as the field type the data belongs to and enter "0,99" for the range of authorized numeric values in the Values field.
      In this example, the purpose is to mask the characters with two digits in the 0 to 9 range.
      To mask each of the two characters separately, you can add a row to the Modifications table, define two masking operations and enter "0,9" for the range of authorized numeric values.

Configuring the output component and executing the Job

Procedure

  1. Double-click the tLogRow component to display the Basic settings view and define the component properties.
    Configuration of the tLogRow component.
  2. Click Sync columns button to retrieve the schema defined in the preceding component.
  3. Select Table in the Mode area.
  4. Save your Job and press F6 to execute it.

Results

Results from the Console.

The tPatternMasking component alters the values from the input data and outputs original and substitute records.

The input data has been altered but the output data looks real and consistent. The substitute data is still usable for non-production purposes.

The first record is labeled as true because 1EG4-T-E5-MK73 is an original record. The second record is labeled as false because 7H24-E-K4-HY91 is a substitute record:
  • The first character is replaced with a digit from the 1 to 9 range, as defined in the tPatternMasking properties.
  • The second, fifth, eighth, and ninth characters are replaced with a letter from the list of authorized values defined in the enumeration file.
  • The third and sixth characters are replaced with one of the authorized alphanumeric values defined in the enumeration file.
  • The fourth and seventh characters are replaced with a digit from the 0 to 9 range, as defined in the tPatternMasking properties.
  • The last two characters are replaced with a number from the 0 to 99 range, as defined in the tPatternMasking properties.
  • The input uses dashes as separators and they remain unchanged in the output.

The tPatternMasking component outputs null for 0EF6-T-F4-AC44 because this value is invalid: the first character, 0, is out of the specified range ("1,9").

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – please let us know!