East Asia pattern frequency indicators
- The East Asia Pattern Frequency indicator computes the number of records for each distinct pattern and returns the most frequent patterns.
- The East Asia Pattern Low Frequency indicator computes the number of records for each distinct pattern and returns the least frequent patterns.
These two indicators are available only with the Java engine. They are useful when you want to identify patterns in Asian data.
The two indicators above build patterns by converting Asian characters to letters such as H, K, C and G, according to the rules in the following table:
Character type | Usage |
---|---|
Latin numbers | 9 replaces all ASCII digits |
Latin lowercase letters | a replaces all lowercase Latin characters |
Latin uppercase letters | A replaces all uppercase Latin characters |
Full-width Latin numbers | 9 replaces all full-width digits |
Full-width Latin lowercase letters | a replaces all full-width lowercase Latin characters |
Full-width Latin uppercase letters | A replaces all full-width uppercase Latin characters |
Hiragana | H replaces all Hiragana characters |
Half-width Katakana | k replaces all half-width Katakana characters |
Full-width Katakana | K replaces all full-width Katakana characters |
Katakana | K replaces all Katakana characters |
Kanji | C replaces Chinese characters |
Hangul | G replaces Hangul characters |
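The conversion rules in the table can be sketched in a few lines of Python. This is only an illustration and not the code of the Java engine: the function name `east_asia_pattern`, the `RULES` list and the Unicode ranges chosen for each character type are assumptions, as is the choice to leave any other character (punctuation, spaces) unchanged.

```python
# Illustrative sketch only. The Unicode ranges are assumptions made for
# this example; the actual indicators may use different ranges.
RULES = [
    (0x0030, 0x0039, "9"),  # Latin (ASCII) digits
    (0x0061, 0x007A, "a"),  # Latin lowercase letters
    (0x0041, 0x005A, "A"),  # Latin uppercase letters
    (0xFF10, 0xFF19, "9"),  # full-width digits
    (0xFF41, 0xFF5A, "a"),  # full-width lowercase letters
    (0xFF21, 0xFF3A, "A"),  # full-width uppercase letters
    (0x3041, 0x309F, "H"),  # Hiragana
    (0xFF66, 0xFF9F, "k"),  # half-width Katakana
    (0x30A0, 0x30FF, "K"),  # full-width Katakana
    (0x4E00, 0x9FFF, "C"),  # Kanji (CJK unified ideographs)
    (0xAC00, 0xD7A3, "G"),  # Hangul syllables
]


def east_asia_pattern(text: str) -> str:
    """Return the pattern of `text` by replacing each character with its symbol."""
    symbols = []
    for ch in text:
        code_point = ord(ch)
        for low, high, symbol in RULES:
            if low <= code_point <= high:
                symbols.append(symbol)
                break
        else:
            # Assumption: characters outside every range are kept as-is.
            symbols.append(ch)
    return "".join(symbols)


print(east_asia_pattern("東京都渋谷区1-2-3"))  # CCCCCC9-9-9
print(east_asia_pattern("さくらハウス"))        # HHHKKK
```

With these assumed ranges, an address made of six Kanji characters followed by `1-2-3` yields the pattern `CCCCCC9-9-9`, so two addresses with the same layout share one pattern.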
Below is an example of a column analysis using the East Asia Pattern Frequency and East Asia Pattern Low Frequency indicators on an address column.
The analysis results of the East Asia Pattern Low Frequency indicator will look like the following:
These results give the number of records for each of the least frequent patterns. Some patterns contain both characters and numbers, while others contain only characters. The patterns also differ in length, which shows that the address data is not consistent and that you may need to correct and clean it.
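The counting step behind both indicators can be approximated as follows. This is a minimal sketch under stated assumptions: the input is a list of patterns that have already been computed, the sample values are hypothetical and are not taken from the analysis above, and the helper names `most_frequent` and `least_frequent` are invented for this example.

```python
from collections import Counter


def most_frequent(patterns, limit):
    """Return up to `limit` (pattern, count) pairs, highest count first."""
    counts = Counter(patterns)
    # Sort by descending count, then by pattern so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:limit]


def least_frequent(patterns, limit):
    """Return up to `limit` (pattern, count) pairs, lowest count first."""
    counts = Counter(patterns)
    # Sort by ascending count, then by pattern so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:limit]


# Hypothetical patterns for an address column.
sample = ["CCC9-9-9", "CCC9-9-9", "CCC9-9-9", "CCCKKK", "CC99", "CCCKKK"]

print(most_frequent(sample, 2))   # [('CCC9-9-9', 3), ('CCCKKK', 2)]
print(least_frequent(sample, 2))  # [('CC99', 1), ('CCCKKK', 2)]
```

In this sketch, patterns that appear only once or twice surface at the top of the low-frequency list, which is where inconsistent values are most likely to be found.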
East Asia pattern frequency indicators and database compatibility
The following table lists, for each indicator, the data types supported with the Java and SQL analysis engines in any database:
Indicator | Supported data types with the Java analysis engine | Supported data types with the SQL analysis engine |
---|---|---|
East Asia Pattern Frequency | | None |
East Asia Pattern Low Frequency | | None |