Word-based patterns

Talend Cloud Data Stewardship performs word-based pattern profiling and computes the word patterns of the data you load into any campaign. You can then use these patterns to filter tasks by the content and structure of their data before assigning or resolving them.

Word patterns are case-sensitive and are computed only for non-numeric fields, such as text, boolean, and semantic types. The following table lists the word patterns and their descriptions.

Pattern	Description
[Word]	Word starting with an uppercase character followed by lowercase characters
[WORD]	Word consisting only of uppercase characters
[word]	Word consisting only of lowercase characters
[Char]	Single uppercase character
[char]	Single lowercase character
[Ideogram]	One of the CJK Unified Ideographs
[IdeogramSeq]	Sequence of ideograms
[hiraSeq]	Sequence of Japanese Hiragana characters
[kataSeq]	Sequence of Japanese Katakana characters
[hangulSeq]	Sequence of Korean Hangul characters
[digit]	One of the Arabic numerals: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
[number]	Sequence of digits

The following examples illustrate how certain records would be interpreted in Talend Cloud Data Stewardship.

String	Pattern
A character is NOT a Word	[Char] [word] [word] [WORD] [char] [Word]
someWordsINwORDS	[word][Word][WORD][char][WORD]
Example123@domain.com	[Word][number]@[word].[word]
anotherExample8@domain.com	[word][Word][digit]@[word].[word]
袁 花木蘭88	[Ideogram] [IdeogramSeq][number]
Latin2中文	[Word][digit][IdeogramSeq]
Latin3フランス	[Word][digit][kataSeq]
Latin4とうきょう	[Word][digit][hiraSeq]
Latin5나는 한국 사람입니다	[Word][digit][hangulSeq] [hangulSeq] [hangulSeq]
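
The mapping from strings to patterns can be approximated outside the product. The following Python sketch is not Talend's implementation; it is a minimal illustration built only from the pattern table above. The function name word_pattern, the rule order, the restriction to ASCII Latin letters, and the Unicode ranges chosen for each script are all assumptions made for this example. Characters that match no rule, such as spaces, "@" and ".", are copied to the output unchanged.

```python
import re

# Assumed rules, tried in this order at each position of the input.
# Multi-character rules come before their single-character counterparts.
# Latin letters are limited to ASCII and each script to one Unicode block;
# both are simplifications for this sketch.
_TOKEN_RULES = [
    ("[Word]", r"[A-Z][a-z]+"),                # uppercase then lowercase
    ("[WORD]", r"[A-Z]{2,}"),                  # two or more uppercase
    ("[word]", r"[a-z]{2,}"),                  # two or more lowercase
    ("[Char]", r"[A-Z]"),                      # single uppercase
    ("[char]", r"[a-z]"),                      # single lowercase
    ("[IdeogramSeq]", r"[\u4e00-\u9fff]{2,}"), # CJK Unified Ideographs block
    ("[Ideogram]", r"[\u4e00-\u9fff]"),
    ("[hiraSeq]", r"[\u3040-\u309f]+"),        # Hiragana block
    ("[kataSeq]", r"[\u30a0-\u30ff]+"),        # Katakana block
    ("[hangulSeq]", r"[\uac00-\ud7a3]+"),      # Hangul syllables
    ("[number]", r"[0-9]{2,}"),                # two or more digits
    ("[digit]", r"[0-9]"),                     # single digit
]

_LABELS = [label for label, _ in _TOKEN_RULES]
# One capturing group per rule, so match.lastindex identifies the rule.
_TOKEN_RE = re.compile("|".join(f"({regex})" for _, regex in _TOKEN_RULES))


def word_pattern(value: str) -> str:
    """Replace each recognized run of characters with its pattern label."""
    return _TOKEN_RE.sub(lambda m: _LABELS[m.lastindex - 1], value)


if __name__ == "__main__":
    for sample in ("Example123@domain.com", "someWordsINwORDS", "Latin2中文"):
        print(sample, "->", word_pattern(sample))
```

Under these assumptions, word_pattern("Example123@domain.com") returns [Word][number]@[word].[word]. Inputs that the table does not cover, such as accented letters or an uppercase run directly followed by lowercase characters, may be classified differently by the product.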
