Text standardization components
tJapaneseNumberNormalize | Converts Japanese numerals (kansūji) to Arabic numerals. |
tJapaneseTokenize | Splits Japanese text into tokens. |
tJapaneseTransliterate | Converts textual data in Japanese to kana and Latin scripts. |
tStem | Standardizes data in columns before this data is matched. |
tTransliterate | Converts strings from many languages of the world to a standard set of characters (Universal Coded Character Set, UCS). |
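
To make the kind of normalization described for tJapaneseNumberNormalize concrete, here is a minimal Python sketch of converting kansūji to Arabic numerals. It is an illustration only and not the component's actual implementation: the function names `kansuji_to_int` and `normalize_kansuji` are hypothetical, and the sketch covers only the common digits and the units 十, 百, 千, 万, 億 and 兆.

```python
import re

# Hypothetical helper tables for this sketch (not taken from the component).
DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}

KANSUJI_RUN = re.compile("[" + "".join(DIGITS) + "".join(SMALL_UNITS)
                         + "".join(LARGE_UNITS) + "]+")


def kansuji_to_int(text: str) -> int:
    """Convert a string made only of kansūji characters to an integer."""
    total = 0    # sum of completed groups (万, 億, 兆)
    section = 0  # value accumulated inside the current group (below 10,000)
    digit = 0    # pending digit(s) not yet multiplied by a unit
    for ch in text:
        if ch in DIGITS:
            # Appending also handles positional notation such as 二〇二四.
            digit = digit * 10 + DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A unit with no preceding digit counts once (十 alone is 10).
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            group = section + digit
            total += (group or 1) * LARGE_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"not a kansūji character: {ch!r}")
    return total + section + digit


def normalize_kansuji(text: str) -> str:
    """Replace every run of kansūji characters in text with Arabic digits."""
    return KANSUJI_RUN.sub(lambda m: str(kansuji_to_int(m.group())), text)


if __name__ == "__main__":
    print(normalize_kansuji("価格は三百二十五円です"))
```

Note that this naive approach rewrites every run of numeral characters, including those that belong to ordinary words (for example the 一 in 一番), so a production normalizer would need tokenization or context checks first.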