Setting global parameters
Global parameters are default placeholders (for example, 'production_date') that can be overwritten with runtime values (for example, '2020-04-09'). If a daily dataflow runs identically except for one variable (for example, the current date), that variable can be updated as needed.
Adding global parameters
To add a global parameter, click the gear icon in the upper right of the Prepare canvas in the designer tab. Select Add Parameter, enter a name for the key and a value, and select OK.
It is good practice to enclose the parameter value in single quotes both when it is entered and when it is referenced in the expression builder (e.g., '$activityDt', '$startDt', '$endDt'). In Pig script, these parameters take the format $<identifier>.
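The placeholder idea can be illustrated outside the product. The following is a minimal Python sketch, not the product's own substitution mechanism: the function name render_expression is hypothetical, and Python's string.Template is used only because it happens to share the $<identifier> placeholder format.

```python
from string import Template


def render_expression(expression, parameters):
    """Replace each $<identifier> placeholder with its parameter value.

    Illustration only: the product resolves parameters itself at run time.
    """
    # Template.substitute raises KeyError if a placeholder has no value.
    return Template(expression).substitute(parameters)


# Parameter name and value taken from the example in the text above.
parameters = {"production_date": "2020-04-09"}

# The placeholder is enclosed in single quotes, as recommended.
rendered = render_expression("'$production_date'", parameters)
print(rendered)
```

Because the placeholder sits inside single quotes, the substituted value arrives in the expression as a quoted string literal.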
Defining a global parameter
The following example shows a simple global parameter setting. Below, the parameter name 'date_today' is defined with the value '2019-01-30' and can be updated with a new date at any time for new runs.
Once a dataflow includes a source entity and controller, the global parameter value can be entered through the Custom Expression Builder.
Open the Custom Expression Builder to the right of the source field to insert and validate the global parameter.
The defined parameter appears in the Parameters tab of the expression builder.
In the Expression field, replace the expression with the defined parameter (for example, change HIRE_DATE to '$date_today').
The expression populates the IN field.
After saving and validating, execute the dataflow.
Validation will require that the connector is deleted and reconnected. A good practice is to rename the transformed field.
An alert gives the option to Override Global Parameters upon execution.
In this example, the target entity field 'HIRE_DATE_new' should be populated with the defined global parameter value ('2019-01-30'), so the global parameter is not overridden.
Upon execution, the value defined by the global parameter is populated in every record's 'HIRE_DATE_new' field.
A sample of the transformed Target Entity is shown below.
Setting a global parameter for comparative queries
The following example shows how to embed a dataflow query. This example determines whether 'HIRE_DATE' falls between '2010-01-01' and '2016-01-01'. The parameter value ('2016-01-01') is first specified by selecting the gear icon in the upper right of the screen and defining the global parameter for subsequent inclusion in a custom expression. This value can be updated at any time.
The following dataflow is designed with a target entity containing a field 'EMPLOYEE_ID' and a boolean value (TRUE/FALSE) using the Transform package.
The expression uses the Pig function DateIsBetweenInclusive to return a Boolean value indicating whether the hire date falls between '2010-01-01' and the parameter ('$param'), which has been defined (for this run) as '2016-01-01'. Both dates could have been defined as global parameters, and the parameters can be given any user-defined names, such as 'startdate' and 'enddate' or 'date1' and 'date2'.
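The inclusive comparison can be sketched in Python to show which values yield true or false. This is an assumption-laden illustration, not the UDF's implementation: the function name, the argument order, and the yyyy-MM-dd string format are assumptions, and the sample hire dates are invented for the demonstration.

```python
from datetime import date


def date_is_between_inclusive(value, start, end):
    """Return True if value lies in [start, end], both ends included.

    Assumes ISO yyyy-MM-dd strings; the real UDF's signature may differ.
    """
    d = date.fromisoformat(value)
    return date.fromisoformat(start) <= d <= date.fromisoformat(end)


start = "2010-01-01"
param = "2016-01-01"  # value of '$param' for this run

# Hypothetical hire dates: inside the range, on the boundary, outside.
for hire_date in ("2012-06-15", "2016-01-01", "2018-03-02"):
    print(hire_date, date_is_between_inclusive(hire_date, start, param))
```

In this sketch a hire date equal to either boundary counts as inside the range, which is what "inclusive" implies.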
Open the expression builder from the field of interest.
Build the expression. Do the following:
- Select the DateIsBetweenInclusive UDF from the Function tab.
- HIRE_DATE will already be populated in the field.
- Select '$param' from the Parameters tab.
- Adjust the syntax, validate the expression, and select OK.
Save, Validate, and Execute the dataflow.
A request prompts the user to select Click to fix and then ACCEPT the field format change, because HIRE_DATE has switched from string to boolean.
Users are given the option to override the defined global parameter. If the global parameter value is not being replaced with runtime data, do not override the set parameter. If runtime data is replacing the default global parameter, check Override Global Parameters. In this case, Override Global Parameters is not checked, so the set parameter value is not replaced. (The Global Parameters tab is behind the Partition tab when clicking on the gear icon in the upper right of the canvas.)
The target entity displays true if the employee was hired in the queried time interval and false if their hire date is outside of the date boundaries. Sample data for the created entity can be viewed by selecting the sample icon above the fields on the target entity tile on the Prepare canvas after execution.
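The override choice described above can be sketched as a small Python function. This is only a model of the behavior as described in this section: the function name resolve_parameters and the runtime value '2018-01-01' are hypothetical.

```python
def resolve_parameters(defaults, runtime_values, override):
    """Return the parameter values a run would use.

    Models the Override Global Parameters checkbox: runtime values
    replace the defaults only when override is True.
    """
    resolved = dict(defaults)  # copy so the defaults stay untouched
    if override:
        resolved.update(runtime_values)
    return resolved


defaults = {"param": "2016-01-01"}  # defined global parameter
runtime = {"param": "2018-01-01"}   # hypothetical runtime value

print(resolve_parameters(defaults, runtime, override=False))
print(resolve_parameters(defaults, runtime, override=True))
```

With override left unchecked, as in this example, the run keeps the defined value '2016-01-01'.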
Specifying YARN queue for prepare jobs with global parameters
Users can specify a queue name to submit and run jobs through a global parameter queue specification. A user may want to assign execution of a dataflow to an explicit YARN queue to allocate cluster resources and capacity among users and groups.