
Altering data values to restrict the use of actual sensitive data

With the tDataMasking component, you can replace sensitive information such as credit card or social security numbers with realistic values, allowing production data to be safely used for purposes such as testing and training.

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real-Time Big Data Platform, Talend MDM Platform, Talend Data Services Platform, and Talend Data Fabric.

This scenario describes a Job which uses:
  • The tFixedFlowInput component to generate personal data including credit card numbers.
  • The tDataMasking component to hide specific original data with random characters or figures.
  • The tFileOutputExcel component to output the substitute dataset.
A Job using the tFixedFlowInput, tDataMasking, and tFileOutputExcel components.

Setting up the Job

Procedure

  1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tDataMasking and tFileOutputExcel.
  2. Connect the three components together using the Main links.

Configuring the input component

Procedure

  1. Double-click tFixedFlowInput to open its Basic settings view in the Component tab.
    Configuration of the tFixedFlowInput component.
  2. Create the schema through the Edit Schema button.
    Schema of the tFixedFlowInput component.
    In the open dialog box, click the [+] button and add the columns that will hold the initial input data.
  3. Click OK.
  4. In the Number of rows field, enter 1.
  5. In the Mode area, select the Use Inline Content option.
  6. In the Content table, enter the customer data you want to replace with realistic values, for example:
    0|4244487462024688|Nowmer|Sheri|A.|2433 Bailey Road|Tlaxiaco|Oaxaca|15057|Mexico|271-555-9715|SheriNowmer@@Tlaxiaco.org
    1|3458687462024688||Sheri|A.|2433 Bailey Road|Tlaxiaco|Oaxaca|15057|Mexico|271-555-9715|SheriNowmer@Tlaxiaco.org.org
    2|4639587470586299|Whelply|Derrick|I.|2219 Dewing Avenue|Sooke|BC|17172|Canada|211-555-7669|DerrickWhelply@Sooke.org
    3|2541387475757600|Derry|Jeanne||7640 First Ave.|Issaquah|WA|73980|USA|656-555-2272|JeanneDerry@Issaquah.org
    4|7845987500482201|Spence|Michael|J.|337 Tosca Way|Burnaby|BC|74674|Canada|929-555-7279|MichaelSpence@Burnaby.org
    5|1547887514054179|Gutierrez|Maya||8668 Via Neruda|Novato|CA|57355|$$#|387-555-7172|MayaGutierrez@Novato.org
    6|5469887517782449|Damstra|Robert|F.|1619 Stillman Court|Lynnwood|WA|90792|$$#|922-555-5465|RobertDamstra@Lynnwood.org
    7|54896387521172800|Kanagaki|Rebecca||2860 D Mt. Hood Circle|||13343|Mexico|515-555-6247|RebeccaKanagaki@Tlaxiaco.org
    8|47859687539744377||Kim|H.|6064 Brodia Court|San Andres|DF|12942|Mexico|411-555-6825|Kim@Brunner@San Andresorg
    9|35698487544797658||Brenda|C.|7560 Trees Drive||BC|$$|Canada|815-555-3975|BrendaBlumberg@Richmond.org
    10|36521487568712234|Stanz|Darren|M.|1019 Kenwal Rd.|$$#|OR|82017|USA|847-555-5443|DarrenStanz@Lake Oswego.org
    ...
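
To make the layout of the inline content concrete, the following Python sketch splits one of the sample rows into named fields. It assumes that | is the field separator, as the sample rows suggest. The column names are hypothetical and chosen only for illustration; the real names are the ones you defined in the schema.

```python
# Hypothetical column names; use the ones defined in your own schema.
COLUMNS = [
    "id", "credit_card", "last_name", "first_name", "middle_initial",
    "address", "city", "state", "zip_code", "country", "phone", "email",
]


def parse_row(line, separator="|"):
    """Split one inline-content row into a dict keyed by column name."""
    values = line.split(separator)
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


row = parse_row(
    "2|4639587470586299|Whelply|Derrick|I.|2219 Dewing Avenue|Sooke|BC|17172|"
    "Canada|211-555-7669|DerrickWhelply@Sooke.org"
)
print(row["first_name"], row["credit_card"])
```

Empty fields, such as the missing last name in some of the sample rows, are kept as empty strings, so every row still has twelve values.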

Replacing actual data with realistic values

Procedure

  1. Double-click tDataMasking to display the Basic settings view and define the component properties.
    Configuration of the tDataMasking component.
  2. If required, click Sync columns to retrieve the schema defined in the input component.
  3. Click the Edit schema button to open the schema dialog box.
    tDataMasking proposes one predefined read-only column, ORIGINAL_MARK, as shown in the capture below.
    An example of input and output schemas.
    This column uses true or false to indicate whether the output record is an original or a substitute record, respectively.
  4. Move any of the input columns to the output schema if you want to show them in the results, then click OK and accept the prompt to propagate the changes.
  5. In the Modifications table, click the [+] button to add four rows, and perform the following actions:
    • In the Input Column, select the columns whose content you want to substitute.
    • In the Category column, select from the list the category to which the masking function you want to use belongs.
    • In the Function column, select from the list the function you want to use to mask data.
    • When available, in the Parameter column, select from the list the method to be used by the function to mask data.
    • When available, in the Parameter column, enter a value, a pattern or a path to be used by the function to mask data.
    In this example, the Job will generate inauthentic credit card numbers, replace the first three letters of first names, replace last names with names from a local file and replace the local part in email addresses with X characters.
  6. Click the Advanced settings tab and select the Output the input row check box.
    The Job will add the original data rows to the substitute data.
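
The four modifications in this example, together with the Output the input row option, can be approximated outside Talend to see what each one does to a value. The Python sketch below is not the tDataMasking implementation. The function names, the column names, the list of replacement names (standing in for the local file), and the choice to keep the original length of credit card numbers are all assumptions made for illustration.

```python
import random
import string

# Hypothetical replacement names, standing in for the local file of names.
REPLACEMENT_LAST_NAMES = ["Alder", "Birch", "Cedar"]


def mask_credit_card(number, rng):
    """Return random digits of the same length (assumption: length is kept)."""
    return "".join(rng.choice(string.digits) for _ in number)


def replace_first_chars(value, count, rng):
    """Replace the first `count` characters with random uppercase letters."""
    n = min(count, len(value))
    prefix = "".join(rng.choice(string.ascii_uppercase) for _ in range(n))
    return prefix + value[n:]


def mask_email_local_part(email, mask_char="X"):
    """Replace everything before the last @ sign with mask characters."""
    local, sep, domain = email.rpartition("@")
    if not sep:
        return email  # no @ sign: left unchanged in this sketch
    return mask_char * len(local) + sep + domain


def mask_row(row, rng, output_input_row=True):
    """Return the original row (marked True) followed by its substitute (marked False)."""
    rows = []
    if output_input_row:
        rows.append({**row, "ORIGINAL_MARK": True})
    substitute = dict(row)
    substitute["credit_card"] = mask_credit_card(row["credit_card"], rng)
    substitute["first_name"] = replace_first_chars(row["first_name"], 3, rng)
    substitute["last_name"] = rng.choice(REPLACEMENT_LAST_NAMES)
    substitute["email"] = mask_email_local_part(row["email"])
    substitute["ORIGINAL_MARK"] = False
    rows.append(substitute)
    return rows


rng = random.Random(42)  # seeded only so the sketch is repeatable
sample = {
    "credit_card": "4639587470586299",
    "last_name": "Whelply",
    "first_name": "Derrick",
    "email": "DerrickWhelply@Sooke.org",
}
for out in mask_row(sample, rng):
    print(out)
```

Each input row yields two output rows here: the untouched original and its substitute. With `output_input_row=False`, only the substitute row is returned.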

Configuring the output component and executing the Job

Procedure

  1. Double-click the tFileOutputExcel component to display the Basic settings view and define the component properties.
    Configuration of the tFileOutputExcel component.
  2. Set the destination file name as well as the sheet name and then select the Define all columns auto size check box.
  3. Save your Job and press F6 to execute it.
    The tDataMasking component substitutes data in the selected columns and writes the result in an output file.
  4. Right-click the output component and select Data Viewer to display the original and substituted data.
    Example of the Data Viewer display.
    tDataMasking outputs the original and substitute rows, marked with true and false respectively in the ORIGINAL_MARK column. It generates inauthentic credit card numbers, replaces the first three letters of first names, replaces last names with names from a local file, and replaces the part before the @ sign in email addresses with the value defined in the component basic settings.
    Sensitive personal information in the input data has been "hidden", but the data still looks real and consistent. The substitute data remains usable for purposes other than production.
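
One way to review such output is to pair each original row with its substitute and list the columns that differ. The Python sketch below assumes the rows have been loaded as dictionaries, that ORIGINAL_MARK holds booleans, and that an unmasked `id` column links the two rows. The column names and the two rows are hand-written illustrations, not actual output of the Job.

```python
def split_by_mark(rows):
    """Index original and substitute rows by id, using ORIGINAL_MARK."""
    originals = {r["id"]: r for r in rows if r["ORIGINAL_MARK"]}
    substitutes = {r["id"]: r for r in rows if not r["ORIGINAL_MARK"]}
    return originals, substitutes


def changed_columns(original, substitute):
    """Return the sorted names of the columns whose values differ."""
    return sorted(
        key for key in original
        if key != "ORIGINAL_MARK" and original[key] != substitute[key]
    )


# Hand-written illustrative rows (hypothetical values).
rows = [
    {"id": "2", "first_name": "Derrick", "country": "Canada",
     "email": "DerrickWhelply@Sooke.org", "ORIGINAL_MARK": True},
    {"id": "2", "first_name": "QZBrick", "country": "Canada",
     "email": "XXXXXXXXXXXXXX@Sooke.org", "ORIGINAL_MARK": False},
]

originals, substitutes = split_by_mark(rows)
for row_id, original in originals.items():
    print(row_id, changed_columns(original, substitutes[row_id]))
```

Columns that were not selected in the Modifications table, such as `country` in this illustration, should not appear in the list of changed columns.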
