Character-based patterns
Talend Data Preparation allows you to analyze the distribution of character-based patterns in your data.
Latin characters and Asian characters (Hiragana, Katakana, Kanji and Hangul) are represented by the following patterns:
| Character | Pattern |
|---|---|
| Latin numbers | 9 replaces all ASCII digits |
| Latin lowercase letters | a replaces all lowercase Latin characters |
| Latin uppercase letters | A replaces all uppercase Latin characters |
| Hiragana | H replaces all Hiragana characters |
| Katakana | K replaces all Katakana characters |
| Kanji | C replaces Chinese characters |
| Hangul | G replaces Hangul characters |
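
To illustrate the mapping in the table, here is a minimal Python sketch. It is not Talend's implementation: the function names `char_pattern`, `to_pattern` and `pattern_distribution` are hypothetical, the Unicode ranges are an assumption limited to ASCII and the basic Hiragana, Katakana, CJK Unified Ideographs and Hangul Syllables blocks, and leaving any other character (punctuation, spaces) unchanged is also an assumption.

```python
from collections import Counter


def char_pattern(ch: str) -> str:
    """Map one character to its pattern symbol (illustrative ranges only)."""
    if "0" <= ch <= "9":
        return "9"  # ASCII digits
    if "a" <= ch <= "z":
        return "a"  # ASCII lowercase letters
    if "A" <= ch <= "Z":
        return "A"  # ASCII uppercase letters
    code = ord(ch)
    if 0x3041 <= code <= 0x3096:
        return "H"  # Hiragana letters
    if 0x30A1 <= code <= 0x30FA:
        return "K"  # Katakana letters
    if 0x4E00 <= code <= 0x9FFF:
        return "C"  # CJK Unified Ideographs (Kanji / Chinese characters)
    if 0xAC00 <= code <= 0xD7A3:
        return "G"  # Hangul syllables
    return ch  # assumption: any other character is kept as-is


def to_pattern(value: str) -> str:
    """Replace every character of a value with its pattern symbol."""
    return "".join(char_pattern(ch) for ch in value)


def pattern_distribution(values):
    """Count how many values share each pattern."""
    return Counter(to_pattern(v) for v in values)
```

For example, `to_pattern("Ab12")` yields `Aa99`, and `pattern_distribution(["AB-12", "CD-34", "ひらがな"])` counts the pattern `AA-99` twice and `HHHH` once.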