Regular Expressions
Regular expressions (or regex) are advanced search strings that allow you to match complex patterns.
In this documentation, the regular expression elements are classified by category.
All the examples listed are applied to the following two lines of text:
Comment from happy_user@company.com (04-Apr-2016):
I love working with Talend Data Preparation! It really helps me with all my daily tasks!
Regular Expressions Examples
Regular Expression | Matches |
---|---|
\bTa | The "Ta" at the start of "Talend" |
\bw\w* | working, with |
\w+n\b | Preparation |
Talend\s\w+\s\w+ | Talend Data Preparation |
task(s?) | tasks (it would also match "task") |
\w+@\w+\.com | happy_user@company.com |
\d{2}-.*-\d+ | 04-Apr-2016 |
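
The patterns in this table can be checked against the two sample lines with a short script. This is a minimal sketch that assumes Python's `re` module, which uses the same syntax for these elements; the engine behind Talend Data Preparation may differ in details. The helper `first_match` is a hypothetical name introduced only for this illustration.

```python
import re

# The two sample lines used throughout this page.
text = (
    "Comment from happy_user@company.com (04-Apr-2016):\n"
    "I love working with Talend Data Preparation! "
    "It really helps me with all my daily tasks!"
)

def first_match(pattern):
    """Return the text of the first match of pattern in the sample, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None

w_words = re.findall(r"\bw\w*", text)       # every word starting with "w"
n_words = re.findall(r"\w+n\b", text)       # every word ending with "n"
product = first_match(r"Talend\s\w+\s\w+")  # "Talend" plus the next two words
tasks = first_match(r"task(s?)")            # "task" with an optional "s"
email = first_match(r"\w+@\w+\.com")        # dot escaped to match a literal "."
date = first_match(r"\d{2}-.*-\d+")         # two digits, anything, more digits

print(w_words, n_words, product, tasks, email, date, sep="\n")
```

Note that `re.findall` reports "with" twice, because the word appears twice in the second line.
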
Anchors
Character | Matches | Example |
---|---|---|
^ | Start of string, or start of line in a multi-line pattern | ^Comment matches "Comment" at the beginning of the line. ^C.* matches the first line. |
$ | End of string, or end of line in a multi-line pattern | !$ matches the last exclamation mark. |
\b | Word boundary | \bwo matches the "wo" in "working". \bwo\w+ matches "working". ng\b matches the "ng" in "working". \w+ng\b matches "working". |
\B | Not word boundary | \Bh matches the final "h" in "with" but not the "h" in "helps" or "happy". h\B matches the first "h" in "helps" and "happy" but not the final one in "with". |
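
The anchor behaviour described in this table can be sketched as follows, again assuming Python's `re` module. In Python, the "multi-line pattern" mode corresponds to the `re.MULTILINE` flag; other engines switch it on differently. The variable names are illustrative only.

```python
import re

text = (
    "Comment from happy_user@company.com (04-Apr-2016):\n"
    "I love working with Talend Data Preparation! "
    "It really helps me with all my daily tasks!"
)

# Without flags, ^ only matches at the very start of the text.
starts_default = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches right after each line break.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

first_line = re.search(r"^C.*", text).group(0)  # . stops at the line break
last_bang = re.search(r"!$", text).start()      # index of the final "!"

working = re.findall(r"\bwo\w+", text)
ending_ng = re.findall(r"\w+ng\b", text)

# \Bh: an "h" that does NOT start a word; show the word it ends.
inner_h = [text[m.start() - 3:m.end()] for m in re.finditer(r"\Bh", text)]
# h\B: an "h" that does NOT end a word; show the word it starts.
leading_h = [text[m.start():m.start() + 5] for m in re.finditer(r"h\B", text)]

print(starts_default, starts_multiline, inner_h, leading_h, sep="\n")
```
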
Character Classes
Character | Matches | Example |
---|---|---|
. | Any character, except new line (\n) | . matches all the characters in the text, except for the line break between the two lines. |
\s | White space | Talend\sData matches "Talend Data". Data\s+Preparation matches "Data Preparation". |
\S | Not white space | \S matches all the characters in the sentence, except for the spaces. |
\d | Digit | \d{4} matches "2016". |
\D | Not digit | \D matches all the characters in the text except the digits. |
\w | Word character (letter, digit, or underscore) | T\w+ matches "Talend". |
\W | Not word character | company\Wcom matches "company.com". |
\n | New line | .*\n.* matches the whole text. |
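
The character classes above can be exercised on the sample text as follows. This is a sketch under the assumption that Python's `re` module is used; the variable names are not part of the original documentation.

```python
import re

text = (
    "Comment from happy_user@company.com (04-Apr-2016):\n"
    "I love working with Talend Data Preparation! "
    "It really helps me with all my daily tasks!"
)

any_chars = re.findall(r".", text)             # everything but the line break
non_space = "".join(re.findall(r"\S", text))   # whitespace removed
non_digits = "".join(re.findall(r"\D", text))  # digits removed

talend_data = re.search(r"Talend\sData", text).group(0)
year = re.findall(r"\d{4}", text)
word = re.search(r"T\w+", text).group(0)
domain = re.search(r"company\Wcom", text).group(0)

# .* cannot cross a line break, so \n is needed to span both lines.
whole = re.fullmatch(r".*\n.*", text)

print(len(text), len(any_chars), year, word, domain)
```
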
Escape Characters
Character | Matches |
---|---|
\. | . |
\\ | \ |
\+ | + |
\* | * |
\? | ? |
\$ | $ |
\[ | [ |
\] | ] |
\{ | { |
\} | } |
\( | ( |
\) | ) |
\| | | |
\/ | / |
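
To illustrate why escaping matters, the sketch below compares escaped and unescaped versions of the same pattern. It assumes Python's `re` module; `re.escape` is a Python convenience that escapes a literal string automatically and is not something the original documentation mentions.

```python
import re

text = (
    "Comment from happy_user@company.com (04-Apr-2016):\n"
    "I love working with Talend Data Preparation! "
    "It really helps me with all my daily tasks!"
)

# Unescaped, the dot is a wildcard and accepts any character.
wildcard_hit = re.search(r"company.com", "companyXcom")
# Escaped, it only accepts a literal ".".
literal_miss = re.search(r"company\.com", "companyXcom")

# Unescaped parentheses form a group and are not part of the match.
grouped = re.search(r"(04-Apr-2016)", text).group(0)
# Escaped parentheses match the literal "(" and ")" in the text.
escaped = re.search(r"\(04-Apr-2016\)", text).group(0)
# re.escape builds an escaped pattern from a literal string.
auto_escaped = re.search(re.escape("(04-Apr-2016)"), text).group(0)

print(grouped, escaped, auto_escaped)
```
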
Groups and Ranges
Character | Matches | Example |
---|---|---|
() | Group | m(e|y) matches "me" and "my". |
(a|b) | a or b | m(e|y) matches "me" (in "Comment"), "me" and "my". |
[abc] | Range (a or b or c) | m[ey] matches "me" (in "Comment"), "me" and "my". |
[a-q] | Letter from a to q | m[a-m] matches "mm" (in "Comment", because "m" itself is in the range a to m) and "me" but not "my". |
[0-7] | Digit from 0 to 7 | 201[0-5] does not match "2016" but would match all years between "2010" and "2015". |
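
The group and range examples can be run against the sample text as sketched below, assuming Python's `re` module. The extra pattern `m[a-l]` is not in the table; it is added here only to show how the range boundary changes the result.

```python
import re

text = (
    "Comment from happy_user@company.com (04-Apr-2016):\n"
    "I love working with Talend Data Preparation! "
    "It really helps me with all my daily tasks!"
)

# finditer + group(0) returns the whole match even when the pattern has a group.
alternation = [m.group(0) for m in re.finditer(r"m(e|y)", text)]
char_set = re.findall(r"m[ey]", text)

# "m" is inside a-m, so the first hit in "Comment" is "mm", not "me".
range_a_m = re.findall(r"m[a-m]", text)
# Stopping the range at "l" excludes "m", so "me" in "Comment" is found.
range_a_l = re.findall(r"m[a-l]", text)

# 201[0-5] accepts 2010 to 2015 only, so 2016 in the sample is not matched.
in_sample = re.search(r"201[0-5]", text)
years = [y for y in range(2008, 2018) if re.fullmatch(r"201[0-5]", str(y))]

print(alternation, char_set, range_a_m, range_a_l, years, sep="\n")
```
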
The text captured by a group can be reused in a replacement pattern with the $ symbol. When more than one group is captured, add a number to the $ symbol that corresponds to the order in which the groups were captured ($1 for the first group, $2 for the second, and so on).
For example, suppose you want to reformat the value Y16Q02, which is matched by the regular expression Y(\d{2})Q(\d{2}). The first group captures 16 and the second captures 02, so you can build the new value from the captured characters only. If you want the new value to be Quarter 02 of year 2016, use the replacement pattern Quarter $2 of year 20$1.
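
The Y16Q02 example can be reproduced with a short script. This sketch assumes Python's `re` module, which writes group references in the replacement as `\1` and `\2` rather than `$1` and `$2`; the logic is otherwise the same.

```python
import re

pattern = r"Y(\d{2})Q(\d{2})"
# Python spells the group references \1 and \2 instead of $1 and $2.
replacement = r"Quarter \2 of year 20\1"

captured = re.fullmatch(pattern, "Y16Q02").groups()  # (year, quarter)
result = re.sub(pattern, replacement, "Y16Q02")

print(captured)
print(result)
```

Group 1 holds the two-digit year and group 2 the quarter, so swapping their order in the replacement is what moves the quarter to the front.
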
Quantifiers
Character | Matches | Examples |
---|---|---|
* | 0 or more | work\w* matches "working" but also "work" and "works". |
+ | 1 or more | work\w+ matches "working" but also "works". However, it does not match "work". |
? | 0 or 1 | work(s?) matches "work" and "works". In "working", it matches only the "work" part, not the whole word. |
{3} | Exactly 3 | 20\d{2} matches "2016" and other numbers between "2000" and "2099". |
{3,} | 3 or more | 20\d{2,} matches "2016" and any other number made of "20" followed by two or more digits. |
{3,5} | 3, 4 or 5 | 20\d{1,2} matches "2016" and any other number made of "20" followed by one or two digits, such as "205" or "2099". |
[0-7] | Digit from 0 to 7 | 201[0-9] matches "2016" and all numbers from "2010" to "2019". |
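
The quantifier examples can be compared side by side with whole-string matching. This is a sketch assuming Python's `re` module; `re.fullmatch` is used so that a pattern counts as matching only when it covers the entire word or number. The helper `whole` is a hypothetical name for this illustration.

```python
import re

def whole(pattern, value):
    """True when pattern matches the entire value, not just a part of it."""
    return re.fullmatch(pattern, value) is not None

words = ["work", "works", "working"]
star = [w for w in words if whole(r"work\w*", w)]       # 0 or more
plus = [w for w in words if whole(r"work\w+", w)]       # 1 or more
optional = [w for w in words if whole(r"work(s?)", w)]  # 0 or 1

# An unanchored search still finds the "work" part inside "working".
partial = re.search(r"work(s?)", "working").group(0)

numbers = ["20", "205", "2016", "20165"]
exactly_two = [n for n in numbers if whole(r"20\d{2}", n)]
two_or_more = [n for n in numbers if whole(r"20\d{2,}", n)]
one_or_two = [n for n in numbers if whole(r"20\d{1,2}", n)]
# Without \d, the quantifier applies to the "0" only.
zeros_only = [n for n in numbers if whole(r"20{1,2}", n)]

print(star, plus, optional, partial)
print(exactly_two, two_or_more, one_or_two, zeros_only)
```
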