Word-based patterns
Talend Data Preparation allows you to analyze the distribution of word-based patterns in your data.
The word-based pattern indicators are case sensitive. The following table describes what each pattern shown in the profiling area corresponds to:
Pattern | Description |
---|---|
[Word] | Word starting with an uppercase character followed by lowercase characters |
[WORD] | Word consisting only of uppercase characters |
[word] | Word consisting only of lowercase characters |
[Char] | Single uppercase character |
[char] | Single lowercase character |
[Ideogram] | One of the CJK Unified Ideographs |
[IdeogramSeq] | Sequence of ideograms |
[hiraSeq] | Sequence of Japanese Hiragana characters |
[kataSeq] | Sequence of Japanese Katakana characters |
[hangulSeq] | Sequence of Korean Hangul characters |
[digit] | One of the Arabic numerals: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 |
[number] | Sequence of digits |
The following examples illustrate how certain records would be interpreted in Talend Data Preparation.
String | Pattern |
---|---|
A character is NOT a Word | [Char] [word] [word] [WORD] [char] [Word] |
someWordsINwORDS | [word][Word][WORD][char][WORD] |
Example123@domain.com | [Word][number]@[word].[word] |
anotherExample8@domain.com | [word][Word][digit]@[word].[word] |
袁 花木蘭88 | [Ideogram] [IdeogramSeq][number] |
Latin2中文 | [Word][digit][IdeogramSeq] |
Latin3フランス | [Word][digit][kataSeq] |
Latin4とうきょう | [Word][digit][hiraSeq] |
Latin5나는 한국 사람입니다 | [Word][digit][hangulSeq] |
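
The mapping in the tables above can be approximated outside the product with a small script. The following Python sketch is not Talend's implementation: the function name `word_pattern`, the rule order, and the Unicode ranges are assumptions chosen so that the Latin, digit, ideogram, Hiragana, and Katakana examples above are reproduced. In particular, it assumes that a "word" is two or more letters, that a "number" is two or more digits, and that a single Hiragana, Katakana, or Hangul character is still reported as a sequence.

```python
import re
from collections import Counter

# Hypothetical rule set: (pattern name, regular expression).
# Order matters: rules listed first win when several could match at a position.
_RULES = [
    ("Word", r"[A-Z][a-z]+"),                 # uppercase followed by lowercase
    ("WORD", r"[A-Z]{2,}"),                   # two or more uppercase letters
    ("word", r"[a-z]{2,}"),                   # two or more lowercase letters
    ("Char", r"[A-Z]"),                       # single uppercase letter
    ("char", r"[a-z]"),                       # single lowercase letter
    ("IdeogramSeq", r"[\u4e00-\u9fff]{2,}"),  # CJK Unified Ideographs block
    ("Ideogram", r"[\u4e00-\u9fff]"),
    ("hiraSeq", r"[\u3040-\u309f]+"),         # Hiragana block (assumed range)
    ("kataSeq", r"[\u30a0-\u30ff]+"),         # Katakana block (assumed range)
    ("hangulSeq", r"[\uac00-\ud7a3]+"),       # Hangul syllables (assumed range)
    ("number", r"[0-9]{2,}"),                 # two or more digits
    ("digit", r"[0-9]"),                      # single digit
]

# One alternation of named groups; the name of the matching group is the pattern.
_TOKEN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in _RULES))


def word_pattern(value: str) -> str:
    """Replace each recognized run with its pattern name; keep other characters."""
    return _TOKEN.sub(lambda m: f"[{m.lastgroup}]", value)


if __name__ == "__main__":
    samples = [
        "Example123@domain.com",
        "anotherExample8@domain.com",
        "Test42@domain.com",
    ]
    # Count how often each pattern occurs, similar in spirit to a pattern
    # distribution over a column.
    distribution = Counter(word_pattern(s) for s in samples)
    for pattern, count in distribution.most_common():
        print(count, pattern)
```

Characters that match no rule, such as spaces, `@`, and `.`, are copied through unchanged, which is why they appear literally in the resulting patterns. Because the rules are case sensitive, `Word`, `WORD`, and `word` produce three different patterns.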