Generating test customer data and processing it
Procedure
- Click Connections > Add connection.
- In the panel that opens, select the type of connection you want to create.
Example
data generator
- Select your engine in the Engine list.
Note:
- It is recommended to use the Remote Engine Gen2 rather than the Cloud Engine for Design for advanced processing of data.
- If no Remote Engine Gen2 has been created from Talend Management Console, or if one exists but appears as unavailable (meaning it is not up and running), you cannot select a Connection type in the list or save the new connection.
- The list of available connection types depends on the engine you have selected.
- Select the type of connection you want to create. Here, select Data generator.
- Click Add dataset and fill in the dataset properties as described in Data generator properties.
- In the Add a new dataset panel, name your dataset.
Example
customer generated data
- Fill in the properties to generate the test customer data of your choice. In this example:
- In the Rows field, type in 100 as you want to generate 100 test records.
- Click Add field, type in firstname in the Name field of the element, select First Name in the Type list and type in 0 in the Blank % field as you want to generate random first names with no empty fields.
- Click Add field, type in lastname in the Name field of the element, select Last Name in the Type list and type in 0 in the Blank % field as you want to generate random last names with no empty fields.
- Click Add field, type in age in the Name field of the element, select Age in the Type list, type in 18 in the Min field and 99 in the Max field and type in 0 in the Blank % field, as you want to generate ages between 18 and 99 with no empty fields.
- Click Add field, type in hair_color in the Name field of the element, select Random within list in the Type list and type in 0 in the Blank % field. Then add the elements of the random list you want to create, here different hair color values with their weights.
- Type in brown in the first Element field and 0.4 in the Weight field, type in blond in the second Element field and 0.2 in the Weight field, and type in red in the third Element field and 0.4 in the Weight field, as you want to generate hair color fields that contain 40% of brown hair, 20% of blond hair and 40% of red hair.
- Click Add field, type in email in the Name field of the element, select Email in the Type list and type in 20 in the Blank % field as you want to generate random emails with 20% of empty values.
- Click Add field, type in phone in the Name field of the element, select Phone number (ext) in the Type list and type in 0 in the Blank % field as you want to generate random phone numbers with no empty values.
- Click Validate to save your dataset. In the dataset detailed view, you can view the generated data that corresponds to the criteria you have defined.
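To make the effect of these settings concrete, the following is a minimal Python sketch of the kind of data they describe. It is not how the Data generator connector is implemented: the name pools, the `maybe_blank` and `generate_customers` helpers, the inclusive age bounds and the 10-digit phone format are all assumptions made for illustration.

```python
import random

# Hypothetical stand-in pools; the real generator draws from its own data.
FIRST_NAMES = ["Ada", "Grace", "Alan", "Linus"]
LAST_NAMES = ["Lovelace", "Hopper", "Turing", "Torvalds"]

# Elements and weights as entered in the Random within list field.
HAIR_COLORS = ["brown", "blond", "red"]
HAIR_WEIGHTS = [0.4, 0.2, 0.4]


def maybe_blank(value, blank_pct, rng):
    """Return None for roughly blank_pct percent of calls, else the value."""
    return None if rng.random() * 100 < blank_pct else value


def generate_customers(rows=100, seed=None):
    """Build `rows` records shaped like the dataset defined in this example."""
    rng = random.Random(seed)
    records = []
    for _ in range(rows):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        email = f"{first}.{last}@example.com".lower()
        records.append({
            "firstname": first,                                   # Blank % = 0
            "lastname": last,                                     # Blank % = 0
            "age": rng.randint(18, 99),                           # assumed inclusive bounds
            "hair_color": rng.choices(HAIR_COLORS, weights=HAIR_WEIGHTS, k=1)[0],
            "email": maybe_blank(email, 20, rng),                 # Blank % = 20
            "phone": "".join(str(rng.randint(0, 9)) for _ in range(10)),  # assumed format
        })
    return records


customers = generate_customers(rows=100, seed=42)
```

Because the weights and the Blank % value are probabilities, the proportions in a 100-row sample will only approximate 40/20/40 for hair colors and 20% for empty emails.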
- Add two Test datasets that will be used as destinations in your pipeline. Fill in the connection properties as described in Test connection properties.
- Click Add pipeline on the Pipelines page. Your new pipeline opens.
- Give the pipeline a meaningful name.
Example
Clean, format & sort customer generated data
- Click ADD SOURCE and select your source dataset, customer generated data, in the panel that opens.
- Click and add a Field concatenator processor to the pipeline. Give it a meaningful name, concatenate names for example and use the Concatenate with value/another field function to concatenate the firstname and lastname fields together.
- Click Save to save your configuration.
All first and last names are now combined with a space as a separator.
- Click and add a Data cleansing processor to the pipeline. Give it a meaningful name, fill empty emails with N/A for example and use the Fill empty cells with text function to fill the email empty values with the N/A text.
- Click Save to save your configuration.
All the empty values in the email fields are now replaced with N/A.
- Click and add a Phones processor to the pipeline. Give it a meaningful name, format customer phones for example and use the Format phone number function to format the generated phone number fields using the correct American standard syntax.
- Click Save to save your configuration.
All the phone number values are now formatted.
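As a rough illustration of what the three processors configured so far do to each record, here is a minimal Python sketch. It does not reproduce the processors' actual behavior: the function names, the `fullname` output field and the `(XXX) XXX-XXXX` phone layout are assumptions chosen for the example.

```python
import re


def concatenate_names(record, separator=" "):
    """Join firstname and lastname; 'fullname' is a hypothetical output field."""
    out = dict(record)
    out["fullname"] = f"{record['firstname']}{separator}{record['lastname']}"
    return out


def fill_empty_email(record, placeholder="N/A"):
    """Replace a missing or empty email with the placeholder text."""
    out = dict(record)
    if not out.get("email"):
        out["email"] = placeholder
    return out


def format_us_phone(record):
    """Format a 10-digit number as (XXX) XXX-XXXX (assumed target layout)."""
    out = dict(record)
    digits = re.sub(r"\D", "", out.get("phone") or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    if len(digits) == 10:
        out["phone"] = f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return out  # numbers of any other length are left unchanged


def process(record):
    """Apply the three steps in the same order as in the pipeline."""
    for step in (concatenate_names, fill_empty_email, format_us_phone):
        record = step(record)
    return record


sample = {"firstname": "Ada", "lastname": "Lovelace",
          "email": None, "phone": "5551234567"}
processed = process(sample)
```

Each step returns a copy of the record, so the input stays untouched and the steps can be reordered or tested independently.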
- Click and add a Filter processor to the pipeline. Give it a meaningful name, sort customers by age for example, and use the <= Operator with the 35 value to split the customers based on their age (35 years old or younger, or older than 35).
- Click Save to save your configuration.
In this preview, 10 records match the criteria you have defined (35 years old or younger).
- Click the ADD DESTINATION item after the Filter processor and select the dataset that will hold the data that matches the filter criteria. Rename it if needed.
- Click the button on the Filter processor and select the dataset that will hold your rejected data. Rename it if needed.
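The filter and its two destinations can be pictured with the short Python sketch below. It only illustrates the split logic under the assumption that records failing the `<= 35` test go to the reject output; `split_by_age` and the sample records are hypothetical.

```python
def split_by_age(records, threshold=35):
    """Split records into (matched, rejected) using age <= threshold."""
    matched = [r for r in records if r["age"] <= threshold]
    rejected = [r for r in records if r["age"] > threshold]
    return matched, rejected


# Hypothetical sample records standing in for the generated customers.
sample_customers = [
    {"firstname": "Ada", "age": 28},
    {"firstname": "Grace", "age": 35},
    {"firstname": "Alan", "age": 41},
]

matched, rejected = split_by_age(sample_customers)
```

Note that with the `<=` operator a customer who is exactly 35 lands in the matching output, not in the rejected one.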
- On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
- Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
Results
Your pipeline is being executed, the 100 generated test records are being processed, and
the output flows are sent to the Test datasets you have defined. You can see in the logs
that the data is split between customers who are 35 years old or younger and customers
who are older than 35.