tRuleSurvivorship Standard properties
These properties are used to configure tRuleSurvivorship running in the Standard Job framework.
The Standard tRuleSurvivorship component belongs to the Data Quality family.
The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, and in Talend Data Fabric.
Basic settings
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields. This component provides two read-only columns:
When a survivor record is created, the CONFLICT column does not show the conflicting columns if the conflicts have been resolved by the conflict rules. |
|
Built-In: You create and store the schema locally for this component only. |
|
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. |
Group identifier |
Select the column whose content indicates the required group identifiers from the input schema. |
Group size |
Select the column whose content indicates the required group size from the input schema. |
Rule package name |
Type in the name of the rule package you want to create with this component. |
Generate rules and survivorship flow |
Once you have defined all of the rules of a rule package or modified some of them with this component, click the icon to generate this rule package into the Survivorship Rules node of Rules Management under Metadata in the Repository of the Integration perspective of your Studio.
Note: This step is necessary to validate these changes and take them into account at runtime. If a rule package of the same name already exists in the Repository, these changes overwrite it once validated; if you do not generate the package, the version in the Repository takes priority during execution.
Warning: In a rule package, two rules cannot use the same name.
|
Rule table |
Complete this table to create a complete survivor validation flow. Each rule is defined as an execution step, so the rules in this table, read from top to bottom, form a sequence and thus a flow. The columns of this table are:
Order: From the list, select the execution order of the rules you are creating so as to define a survivor validation flow. The types of order may be:
Rule Name: Type in the name of each rule you are creating. This column is only available to the Sequential rules as they define the steps of the survivor validation flow. Do not use special characters in rule names, otherwise the Job may not run correctly. Rule names are case insensitive.
Reference column: Select the column you need to apply a given rule on. They are the columns you have defined in the schema of this component. This column is not available to the Multi-target rules as they define only the Target column.
Function: Select the type of validation operation to be performed on a given Reference column. The available types include:
Value: Enter the expression of interest corresponding to the Match regex or the Expression function you have selected in the Function column.
Target column: When a step is executed, it validates a record field value from a given Reference column and selects the corresponding value as the best from a given Target column. Select this Target column from the schema columns of this component.
Ignore blanks: Select the check boxes which correspond to the names of the columns for which you want the blank value to be ignored. |
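The relationship between Reference column, Function, Value and Target column can be illustrated with a small Java sketch. It is not the component's implementation: the class `RegexRuleSketch`, the method `applyStep`, the sample column names `phone` and `name`, and the choice to require a full regex match and to keep the first matching record are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical illustration of a single "Match regex" step on one group.
class RegexRuleSketch {

    // The first record whose Reference column value fully matches the regex
    // supplies the surviving value of the Target column (assumed behavior).
    static Optional<String> applyStep(List<Map<String, String>> group,
                                      String referenceColumn,
                                      String regex,
                                      String targetColumn) {
        Pattern pattern = Pattern.compile(regex);
        for (Map<String, String> record : group) {
            String ref = record.get(referenceColumn);
            if (ref != null && pattern.matcher(ref).matches()) {
                return Optional.ofNullable(record.get(targetColumn));
            }
        }
        return Optional.empty(); // no record was validated by this step
    }

    // Builds a sample record with two hypothetical columns.
    static Map<String, String> record(String phone, String name) {
        Map<String, String> r = new HashMap<>();
        r.put("phone", phone);
        r.put("name", name);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, String>> group = new ArrayList<>();
        group.add(record("12345", "A. Smith"));
        group.add(record("+33 1 23 45", "Alice Smith"));
        // Reference column: phone, Value: \+33.*, Target column: name
        System.out.println(applyStep(group, "phone", "\\+33.*", "name"));
    }
}
```

In this sketch, a step that validates no record returns an empty result, leaving the target value to be decided by a later step in the sequence.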
Define conflict rule |
Select this check box to be able to create rules to resolve conflicts in the Conflict rule table. |
Conflict rule table |
Complete this table to create rules to resolve conflicts. The columns of this table are:
Rule name: Type in the name of each rule you are creating. Do not use special characters in rule names, otherwise the Job may not run correctly.
Conflicting column: When a step is executed, it validates a record field value from a given Reference column and selects the corresponding value as the best from a given Conflicting column. Select this Conflicting column from the schema columns of this component.
Function: Select the type of validation operation to be performed on a given Conflicting column. The available types include those in the Rule table and the following ones:
Value: Enter the expression of interest corresponding to the Match regex or the Expression function you have selected in the Function column.
Reference column: Select the column you need to apply a given conflicting rule on. They are the columns you have defined in the schema of this component.
Ignore blanks: Select the check boxes which correspond to the names of the columns for which you want the blank value to be ignored.
Disable: Select the check box to disable the corresponding rule. |
Advanced settings
Input data generated with t-Swoosh algorithm | Select this check box if the input data is generated by the tMatchGroup component using the t-Swoosh algorithm. Otherwise, clear the check box. This check box is available if you have installed the R2021-06 Studio monthly update or a later one provided by Talend. |
Ignore the new master record from tMatchGroup | This check box is displayed when Input data generated with t-Swoosh algorithm is selected. The new master record is produced by tMatchGroup with the t-Swoosh algorithm; it does not come from the original input data. If you need this master record, clear the check box. When a group contains only one record, that record is the master record and is not ignored, even if the check box is selected. This check box is available if you have installed the R2021-06 Studio monthly update or a later one provided by Talend. |
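The effect of the Ignore the new master record from tMatchGroup check box on one group can be read as a simple filter. The Java sketch below is only an interpretation of the description above, not the component's code; the class `MasterFilterSketch`, its `Row` type and the `master` flag are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: which rows of one group remain candidates
// when the generated master record is ignored.
class MasterFilterSketch {

    static final class Row {
        final boolean master; // true for the master record of the group
        final String value;   // any payload column
        Row(boolean master, String value) {
            this.master = master;
            this.value = value;
        }
    }

    // Drops the master row, except when the group holds a single record:
    // that record is the master and is always kept.
    static List<Row> candidates(List<Row> group) {
        if (group.size() <= 1) {
            return new ArrayList<>(group);
        }
        List<Row> kept = new ArrayList<>();
        for (Row row : group) {
            if (!row.master) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Row> group = new ArrayList<>();
        group.add(new Row(true, "merged master"));
        group.add(new Row(false, "original 1"));
        group.add(new Row(false, "original 2"));
        System.out.println(candidates(group).size());
    }
}
```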
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at the Job level as well as at each component level. |
Global Variables
Global Variables |
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box. A Flow variable functions during the execution of a component while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see Talend Studio User Guide. |
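As a minimal sketch of reading this After variable in Java code, the example below looks the value up under a key built from the component's unique name followed by `_ERROR_MESSAGE`. The component name `tRuleSurvivorship_1`, the class `ErrorMessageLookup` and the local `HashMap` standing in for the Job's `globalMap` are assumptions made so that the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper showing how an After variable can be looked up.
class ErrorMessageLookup {

    // Reads <component unique name>_ERROR_MESSAGE from the given map;
    // returns null when the component reported no error.
    static String errorMessage(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_ERROR_MESSAGE");
        return value == null ? null : (String) value;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap available inside a Job (assumption).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tRuleSurvivorship_1_ERROR_MESSAGE", "example failure text");

        String message = errorMessage(globalMap, "tRuleSurvivorship_1");
        if (message != null) {
            System.out.println("tRuleSurvivorship reported: " + message);
        }
    }
}
```

Inside a Job, the equivalent expression in a Java component would typically be `((String)globalMap.get("tRuleSurvivorship_1_ERROR_MESSAGE"))`, with the component name adjusted to the one shown in your design workspace.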
Usage
Usage rule |
This component is usually used as an intermediate component, and it requires an input component and an output component. Because it processes grouped data and requires a group identifier column and a group size column, it works naturally downstream of components such as tMatchGroup. It also requires that the input data are sorted by the group identifier and that the first row of a group contains the group size. When you export a Job using tRuleSurvivorship, you need to select the Export dependencies check box in order to export the generated survivor validation rules together with the Job. For further information about how to export a Job, see Talend Studio User Guide. |
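The two input requirements stated above can be checked before the data reaches the component. The Java sketch below works on rows reduced to a group identifier and a group size; the class `GroupOrderCheck`, its `Row` type and the method `isValidInput` are hypothetical names. It verifies that the rows of each group are contiguous, which is what sorting by group identifier guarantees, and that the size carried by the first row of a group equals the number of rows in that group. It does not check any particular sort order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-check of the input layout, not part of Talend.
class GroupOrderCheck {

    static final class Row {
        final String gid;  // value of the Group identifier column
        final int grpSize; // value of the Group size column
        Row(String gid, int grpSize) {
            this.gid = gid;
            this.grpSize = grpSize;
        }
    }

    // True when every group is contiguous and the size on the first row
    // of each group equals the number of rows in that group.
    static boolean isValidInput(List<Row> rows) {
        Set<String> seen = new HashSet<>();
        int i = 0;
        while (i < rows.size()) {
            Row first = rows.get(i);
            if (!seen.add(first.gid)) {
                return false; // group id reappears: rows are not grouped
            }
            int count = 0;
            while (i < rows.size() && rows.get(i).gid.equals(first.gid)) {
                count++;
                i++;
            }
            if (count != first.grpSize) {
                return false; // declared size differs from actual row count
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("g1", 2)); // first row of g1 carries the size
        rows.add(new Row("g1", 0)); // size on later rows is not read
        rows.add(new Row("g2", 1));
        System.out.println(isValidInput(rows));
    }
}
```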