Word-based pattern indicators

Word-based pattern indicators include case-sensitive and case-insensitive indicators.

Word-based pattern indicators count the number of records for each distinct pattern and are available only with the Java engine.

You can use these indicators with the String data type only.
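A minimal Python sketch of the counting described above. It assumes each record's string value has already been converted to its word pattern, one pattern per record. The helper names `pattern_frequency` and `pattern_low_frequency` and the `top_n` parameter are hypothetical and are not part of any Talend API.

```python
from collections import Counter

# Hypothetical helpers; not a Talend API.
def pattern_frequency(patterns, top_n=10):
    """Return up to top_n (pattern, record_count) pairs, most frequent first."""
    return Counter(patterns).most_common(top_n)

def pattern_low_frequency(patterns, top_n=10):
    """Return up to top_n (pattern, record_count) pairs, least frequent first."""
    counts = Counter(patterns)
    # sorted() is stable, so patterns with equal counts keep first-seen order.
    return sorted(counts.items(), key=lambda item: item[1])[:top_n]

# One pattern per record (hypothetical sample data).
patterns = [
    "[Word][number]@[word].[word]",
    "[word][Word][digit]@[word].[word]",
    "[Word][number]@[word].[word]",
]
print(pattern_frequency(patterns, top_n=1))      # [('[Word][number]@[word].[word]', 2)]
print(pattern_low_frequency(patterns, top_n=1))  # [('[word][Word][digit]@[word].[word]', 1)]
```

The two helpers differ only in sort direction: the Frequency variant surfaces the dominant formats in a column, while the Low Frequency variant surfaces rare formats, which are often the ones worth inspecting.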

Case-sensitive indicators

Indicator	Purpose
CS Word Pattern Frequency	Evaluates the most frequent word patterns.
CS Word Pattern Low Frequency	Evaluates the least frequent word patterns.

Patterns focus on words and are case-sensitive:

Pattern	Description
[Word]	Word starting with an uppercase character followed by lowercase characters
[WORD]	Word with uppercase characters
[word]	Word with lowercase characters
[Char]	Single uppercase character
[char]	Single lowercase character
[Ideogram]	One of the CJK Unified Ideographs
[IdeogramSeq]	Sequence of ideograms
[hiraSeq]	Sequence of Japanese Hiragana characters
[kataSeq]	Sequence of Japanese Katakana characters
[hangulSeq]	Sequence of Korean Hangul characters
[digit]	One of the Arabic numerals: 0,1,2,3,4,5,6,7,8,9
[number]	Sequence of digits

When you use the CS Word Pattern Frequency and CS Word Pattern Low Frequency indicators, each string below is replaced with the pattern shown:

String	Pattern
A character is NOT a Word	[Char] [word] [word] [WORD] [char] [Word]
someWordsINwORDS	[word][Word][WORD][char][WORD]
Example123@domain.com	[Word][number]@[word].[word]
anotherExample8@domain.com	[word][Word][digit]@[word].[word]
袁 花木蘭88	[Ideogram] [IdeogramSeq][number]
Latin2中文	[Word][digit][IdeogramSeq]
Latin3フランス	[Word][digit][kataSeq]
Latin4とうきょう	[Word][digit][hiraSeq]
Latin5나는 한국 사람입니다	[Word][digit][hangulSeq]
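The replacement rules above can be approximated with a small tokenizer. The following Python sketch is an assumption inferred from the two tables, not Talend's implementation: the rule order, the restriction of Latin letters to A-Z and a-z, the Unicode ranges chosen for each script, and the function name `cs_word_pattern` are all hypothetical. Characters that match no rule, such as spaces, `@`, and `.`, are copied unchanged.

```python
import re

# Hypothetical rules inferred from the tables above; at each position the
# first rule that matches wins. Latin letters are limited to A-Z and a-z.
_CS_RULES = [
    (r"[A-Z][a-z]+", "[Word]"),                 # one uppercase letter, then lowercase
    (r"[A-Z]{2,}", "[WORD]"),                   # two or more uppercase letters
    (r"[A-Z]", "[Char]"),                       # single uppercase letter
    (r"[a-z]{2,}", "[word]"),                   # two or more lowercase letters
    (r"[a-z]", "[char]"),                       # single lowercase letter
    (r"[0-9]{2,}", "[number]"),                 # two or more digits
    (r"[0-9]", "[digit]"),                      # single digit
    (r"[\u4e00-\u9fff]{2,}", "[IdeogramSeq]"),  # CJK Unified Ideographs (basic block)
    (r"[\u4e00-\u9fff]", "[Ideogram]"),
    (r"[\u3040-\u309f]+", "[hiraSeq]"),         # Hiragana block
    (r"[\u30a0-\u30ff]+", "[kataSeq]"),         # Katakana block
    (r"[\uac00-\ud7a3]+", "[hangulSeq]"),       # Hangul syllables block
]
_CS_COMPILED = [(re.compile(p), label) for p, label in _CS_RULES]

def cs_word_pattern(text: str) -> str:
    """Replace each token with its case-sensitive pattern; copy other characters."""
    out = []
    pos = 0
    while pos < len(text):
        for regex, label in _CS_COMPILED:
            m = regex.match(text, pos)
            if m:
                out.append(label)
                pos = m.end()
                break
        else:
            # Spaces, punctuation such as "@" and ".", and anything unrecognized
            out.append(text[pos])
            pos += 1
    return "".join(out)

print(cs_word_pattern("someWordsINwORDS"))       # [word][Word][WORD][char][WORD]
print(cs_word_pattern("Example123@domain.com"))  # [Word][number]@[word].[word]
```

The order of the first two rules matters: trying `[Word]` before `[WORD]` keeps `Words` as one token, while `IN` in `INwORDS` falls through to `[WORD]` and the following `w` becomes `[char]`.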

Case-insensitive indicators

Indicator	Purpose
CI Word Pattern Frequency	Evaluates the most frequent word patterns.
CI Word Pattern Low Frequency	Evaluates the least frequent word patterns.

Patterns focus on words and are case-insensitive:

Pattern	Description
[word]	Word with lowercase or uppercase characters
[char]	Single lowercase or uppercase character
[Ideogram]	One of the CJK Unified Ideographs
[IdeogramSeq]	Sequence of ideograms
[hiraSeq]	Sequence of Japanese Hiragana characters
[kataSeq]	Sequence of Japanese Katakana characters
[hangulSeq]	Sequence of Korean Hangul characters
[digit]	One of the Arabic numerals: 0,1,2,3,4,5,6,7,8,9
[number]	Sequence of digits
[alnum]	Alphanumeric value consisting of letters and Arabic numerals

When you use the CI Word Pattern Frequency and CI Word Pattern Low Frequency indicators, each string below is replaced with the pattern shown:

String	Pattern
A character is NOT a Word	[char] [word] [word] [word] [char] [word]
someWordsINwORDS	[word]
Example123@domain.com	[alnum]@[word].[word]
anotherExample8@domain.com	[alnum]@[word].[word]
袁 花木蘭88	[Ideogram] [IdeogramSeq][number]
Latin2中文	[word][digit][IdeogramSeq]
Latin3フランス	[word][digit][kataSeq]
Latin4とうきょう	[word][digit][hiraSeq]
Latin5나는 한국 사람입니다	[word][digit][hangulSeq]
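A case-insensitive variant of the same kind of tokenizer only needs to merge the letter classes and add `[alnum]`. In this Python sketch, the adjacency condition on `[alnum]` is an assumption chosen because it reproduces both `Example123@domain.com` giving `[alnum]@[word].[word]` and `Latin2中文` giving `[word][digit][IdeogramSeq]` from the table; Talend does not document such a rule here. The function name `ci_word_pattern` and the Unicode ranges are hypothetical.

```python
import re

# Hypothetical [alnum] rule: an ASCII run containing at least one letter and
# one digit that is not directly adjacent to another Unicode letter or digit.
_ALNUM = r"(?<!\w)(?=[A-Za-z0-9]*[A-Za-z])(?=[A-Za-z0-9]*[0-9])[A-Za-z0-9]+(?!\w)"

# Rules tried in order at each position; the first match wins.
_CI_RULES = [
    (_ALNUM, "[alnum]"),
    (r"[A-Za-z]{2,}", "[word]"),                # two or more letters, any case
    (r"[A-Za-z]", "[char]"),                    # single letter, any case
    (r"[0-9]{2,}", "[number]"),                 # two or more digits
    (r"[0-9]", "[digit]"),                      # single digit
    (r"[\u4e00-\u9fff]{2,}", "[IdeogramSeq]"),  # CJK Unified Ideographs (basic block)
    (r"[\u4e00-\u9fff]", "[Ideogram]"),
    (r"[\u3040-\u309f]+", "[hiraSeq]"),         # Hiragana block
    (r"[\u30a0-\u30ff]+", "[kataSeq]"),         # Katakana block
    (r"[\uac00-\ud7a3]+", "[hangulSeq]"),       # Hangul syllables block
]
_CI_COMPILED = [(re.compile(p), label) for p, label in _CI_RULES]

def ci_word_pattern(text: str) -> str:
    """Replace each token with its case-insensitive pattern; copy other characters."""
    out = []
    pos = 0
    while pos < len(text):
        for regex, label in _CI_COMPILED:
            # match(text, pos) keeps the full string, so the lookbehind in
            # _ALNUM can still see the character before pos.
            m = regex.match(text, pos)
            if m:
                out.append(label)
                pos = m.end()
                break
        else:
            # Spaces, punctuation such as "@" and ".", and anything unrecognized
            out.append(text[pos])
            pos += 1
    return "".join(out)

print(ci_word_pattern("Example123@domain.com"))  # [alnum]@[word].[word]
print(ci_word_pattern("Latin2中文"))             # [word][digit][IdeogramSeq]
```

Because `(?<!\w)` and `(?!\w)` use Python's Unicode-aware `\w`, kana, Hangul syllables, and CJK ideographs count as adjacent letters, which is what keeps `Latin3フランス` out of `[alnum]` in this sketch.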

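The following table shows, for any database, the data types and analysis engines with which you can select each indicator:

Indicator	Number (Java)	Number (SQL)	Text (Java)	Text (SQL)	Date (Java)	Date (SQL)	Others (Java)	Others (SQL)
CS Word Pattern Frequency	No	No	Yes	No	No	No	No	No
CS Word Pattern Low Frequency	No	No	Yes	No	No	No	No	No
CI Word Pattern Frequency	No	No	Yes	No	No	No	No	No
CI Word Pattern Low Frequency	No	No	Yes	No	No	No	No	No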
