List of functions
These tables list all the functions available in Talend Data Preparation and their effects.
Applying any of the functions listed below, with the exception of Math functions, will return the input value if the data is considered invalid or if the data type is incompatible with the selected function, no matter if the Create new column check box is selected or not.
In the case of Math functions, empty values are ignored, while invalid values will lead to an empty result when the function is applied.
The following functions, that rely on matching algorithms, do not support Asian characters:
- Standardize value (fuzzy matching)
- Find and group similar text
Functions that can be applied on rows
Name | Category | Description |
---|---|---|
Delete row | data cleansing | Deletes this row |
Make as header | data cleansing | Cells of this row will become columns names, the row will be deleted |
Functions that can be applied on columns
Name | Category | Description |
---|---|---|
Negate value | boolean | Reverse the boolean value of cells from this column |
Change data type | column metadata | Change type of this column (number, text, date, etc.) |
Change semantic domain | column metadata | Change semantic domain of this column (city, zipcode, last name, etc.) |
Create new column | column metadata | Copy a column or create a brand new one Information noteNote: When used in Sequence
mode, this function is not compatible with Spark Jobs,
and HDFS or S3 exports.
|
Delete column | column metadata | Delete the selected columns |
Duplicate column | column metadata | Create an exact copy of this column |
Rename column | column metadata | Rename this column |
Concatenate columns | columns | Merge the content of two or more columns into a new one |
Concatenate with | columns | Merge the content of this column with another one, and displays it in a new column |
Reorder columns | columns | Change column order |
Swap columns | columns | Swap the values with an other column |
Convert country names and codes | conversions | Convert a country name to an ISO-3166 country code (alpha-2, alpha-3 or numeric) or vice versa. This function supports English, French and Japanese names and codes. |
Convert distance | conversions | Convert distance from one unit to another |
Convert duration | conversions | Converts duration from one unit to another |
Convert temperature | conversions | Converts temperature measurement units |
Clear on matching value | data cleansing | Clears cells that match the value |
Clear the cells with invalid values | data cleansing | Clears cells that contain a value recognized as invalid |
Coalesce columns | data cleansing | Retrieves the first not null value for each row among the selected columns |
Delete empty rows | data cleansing | Removes rows where the cell in each column is empty |
Delete the rows that match | data cleansing | Deletes rows where a cell in this column has a specific value |
Delete the rows with empty cell | data cleansing | Deletes rows that have empty cells |
Delete the rows with invalid cell | data cleansing | Deletes rows which contain an invalid cell |
Delete the rows with negative values | data cleansing | Rows with a negative value in this column will be deleted |
Fill cells with value | data cleansing | Fills cells from this column with a given value |
Fill empty cells from above | data cleansing | Copies the value from the nearest non-empty cell
above. When there is a series of empty cells, they all fill with the same
value that was originally picked from above. Information noteNote: This function is not compatible with Spark Jobs, and
HDFS or S3 exports.
|
Fill empty cells with text | data cleansing | Fills empty cells from this column with a given value |
Fill empty cells with value | data cleansing | Fills cells from this column with a given value |
Standardize value (fuzzy matching) | data cleansing | Replaces invalid values with the closest correct value
from a dictionary Information noteNote: This function does not support Asian
characters.
|
Hash data | data masking | Hashes the content of a column using the SHA-256 algorithm. |
Mask data (obfuscation) | data masking | Masks data according to the domain information of the column (anonymisation) |
Lookup | data blending | Blends columns from another dataset into this one |
Calculate time since | dates | Calculates elapsed time since a date in the desired unit (year, month, day, hour, minute, second) |
Calculate timestamp to date | dates | Given a timestamp (ellapsed time in second since Unix time, that is to say 01/01/1970), create a new column with the date |
Change date format | dates | Changes the date format to use in a date column |
Compare dates | dates | Compares this column to another column or a constant |
Convert dates | dates | Converts dates from one calendar to another |
Extract date parts | dates | Creates columns with year, month, quarter, day, hour, minute, second, etc. |
Modify dates | dates | Adds or substract time unit amount |
Deduplicate rows with identical values | deduplication | Deletes rows that are partly or entirely duplicate and
keep the first one of the columns Information noteNote: This function is not compatible with Spark Jobs, and
HDFS or S3 exports.
|
Delete these filtered rows | filtered | Deletes only the rows that match the current filters |
Keep these filtered rows | filtered | Keeps only the rows that match the current filters |
Add, multiply, substract or divide | math | Performs an operation/calculation on this column with another one or with a fixed value: Add/sum (+), multiply (x), subtract (-), or divide(/) |
Base 10 Logarithm | math | Computes the base 10 logarithm from a column |
Calculate absolute value | math | Calculates the absolute value for all the numeric values in this column. |
Cosine | math | Computes the trigonometric Cosine from a column |
Exponential | math | Exponential of a column number |
Max | math | Max with another column or a constant |
Max of columns | math | Displays the max value between the records of two or more columns |
Mean of columns | math | Calculates the mean value of the records of two or more columns |
Median of columns | math | Calculates the median value of the records of two or more columns |
Min | math | Min with another column or a constant |
Min of columns | math | Displays the min value between the records of two or more columns |
Modulo | math | Calculates the remainder of a division operation |
Multiply columns | math | Multiplies the values of two or more columns and display the result in a new column |
Natural logarithm | math | Computes the natural logarithm from a column |
Negate | math | Negates a column number |
Power | math | Powers with another column or a constant |
Sine | math | Computes the trigonometric Sine from a column |
Square root | math | Squares the root of a column number |
Sum columns | math | Caculates the sum of the records of two or more columns and display the result in a new column |
Tangent | math | Computes the trigonometric Tangent from a column |
Compare numbers | numbers | Compares this column to another column or a constant |
Format numbers | numbers | Allows to format number (decimal, integer & scientific) in a specific format or pattern |
Generate sequence | numbers | Creates a sequence of positive or negative integers in
a column, with the step of your choice Information noteNote: This function
is not compatible with Spark Jobs, and HDFS or S3
exports.
|
Remove fractional part | numbers | Rounds towards zero. (3.74 -> 3) and (-3.74 -> -3) |
Remove negative values | numbers | Replaces negative integers or decimal values with an empty values. |
Round value using ceil mode | numbers | Rounds up value to the nearest number, depending on the precision you set. (3.14 -> 4 if Precision is set to 0, and 3.14 -> 3.2 if Precision is set to 1) |
Round value using down mode | numbers | Rounds towards zero. (3.74 -> 3 and -3.74 -> -3 for a Precision set to 0) |
Round value using floor mode | numbers | Rounds down value to the nearest number, depending on the precision you set. (3.74 -> 3 if Precision is set to 0, and 3.74 -> 3.7 if Precision is set to 1) |
Round value using halfUp mode | numbers | Rounds value to the closest number, depending on the precision you set. (3.14 -> 3 and 3.74 -> 4 for a Precision set to 0) |
Extract phone number information | phones | Extracts additional information from phone numbers, such as phone type, country or carrier name. Each field is extracted in a new column. |
Format phone number | phones | Formats a phone number to standard formats |
Extract email parts | split | Extracts local and domain parts from an email |
Extract full name parts | split | Extracts information from a full name, including the title, first name, nickname, middle name, last name and suffix |
Extract number | split | Extracts number from the input |
Extract values by regex | split | Extracts string tokens based on regex groups |
Extract values by semantic type | split | Extracts different information in separated columns according to predefined or custom semantic types |
Extract URL parts | split | Extracts protocol, host, port, query, etc... from an URL in separated columns |
Split the text in parts | split | Splits column from separators |
Calculate length | strings | Extracts the number of digits from a value (23562 -> 5) |
Change to lower case | strings | Converts all of the cell text in this column to lower case |
Change to title case | strings | Converts the text content from this column to title case (i.e. "this is an example" -> "This Is an Example") |
Change to upper case | strings | Converts all of the cell text in this column to UPPER case (capitalize) |
Contains text | strings | Checks if the cell contains the specified value |
Extract a value by index | strings | Extracts part of the text (substring) to a new column |
Magic fill | strings | Allows you to define a pattern via examples and apply
the transformation to the whole column Information noteNote: This function is not compatible with Spark Jobs, and
HDFS or S3 exports.
|
Match similar text | strings | Creates a new column with true or false depending on whether the value is less than or equals the Levenshtein distance of a given value |
Matches pattern | strings | Creates a new column with true or false depending on whether the value matches a given pattern |
Remove consecutive characters | strings | Removes consecutive repeated characters |
Remove part of the text | strings | Removes specified text from cells in this column |
Remove trailing and leading characters | strings | Removes trailing and leading spaces or other specified characters (i.e. trim) |
Search and replace | strings | Replaces cells or parts of cells which contain a specific value |
Add extra characters | strings advanced | Adds extra characters (padding) on the left or on the right of the original value to match an expected size |
Convert character width | strings advanced | Converts the character width to half or full width, or normalize strings |
Find and group similar text | strings advanced | Replaces all similar values with the right one (i.e.
cluster on fuzzy matching) Information noteNote: This function does not support Asian
characters.
|
Generate unique identifier (UUID) | strings advanced | Generates a unique identifier for each row of the column |
Remove all non alpha numeric characters | strings advanced | Cleans values that contain unexpected characters by
removing all characters that are not 0-9, Aa-Zz, and accented letters.
Whitespaces are kept. For example, T=+Âl**€en#d$ 20%1,7 will become TÂlend 2017. |
Remove all non numeric characters | strings advanced | Cleans numeric values by removing all characters that
are not 0-9. Decimal separators and whitespaces are kept. For example, T=+Âl**€en#d$ 20%1,7 will become 201,7. |
Simplify text (remove case, accent, etc.) | strings advanced | Simplifies the content of the column by putting it in
lower case, removing accents, normalizing using NFKD, splitting combined
characters and trimming the whole string. For example, François becomes francois. |
Functions that can be applied on the whole table
Name | Category | Description |
---|---|---|
Delete empty columns | data cleansing | Deletes all columns that contains only empty values |
Delete empty rows | data cleansing | Removes rows where the cell in each column is empty |
Change date format | dates | Changes the date format to use in a data column |
Remove duplicate rows | deduplication | Deletes all rows that are exact duplicate and keep
only one Information noteNote: This function is not compatible with Spark Jobs, and
HDFS or S3 exports.
|
Format numbers | numbers | Allows to format number (decimal, integer & scientific) in a specific format or pattern |
Format phone number | phones | Formats a phone number to standard formats |
Remove trailing and leading characters | strings | Removes trailing and leading spaces or other specified characters (i.e. trim) |
Search and replace | strings | Replaces cells or parts of cells which contain a specific value |