Data Classification Learning Methodology

Talend Data Catalog provides a machine learning and data class inference system that learns from the activities you perform and keeps learning as users accept and reject inferred data classes. It works as follows:

  • Automatic Data Classification uses Sample and Profiling data to assign "class" values (formerly called semantic types) to data columns, identifying what kind of data each column contains.
  • You can instruct Talend Data Catalog to classify an object, model, or folder for the first time or again.
  • You can accept or reject inferred data classes or add existing or new classes. You can specify/accept multiple data classes per column.
  • The application remembers your data classification decisions and uses them to improve classification suggestions in the future.

Any learning algorithm for data classification is data-driven. Therefore, Talend Data Catalog captures as much information associated with the classes as possible. The sensitivity of the data classification algorithm is controlled by the matching ratio, which you can adjust with the "learning" index according to the predefined weight.
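
The interplay between a matching ratio and accept/reject decisions can be illustrated with a minimal sketch. This is not the Talend Data Catalog implementation or API: the class `ToyClassifier`, its regular-expression patterns, the `threshold` value, and the `learning_weight` parameter are all hypothetical names chosen for illustration. The sketch assumes that a data class is suggested when the share of matching sample values, plus a per-class adjustment learned from user decisions, reaches the threshold.

```python
import re
from collections import defaultdict


class ToyClassifier:
    """Illustrative only: NOT the Talend Data Catalog algorithm."""

    def __init__(self, patterns, threshold=0.8, learning_weight=0.1):
        # Hypothetical data classes, each defined by a regular expression.
        self.patterns = {name: re.compile(p) for name, p in patterns.items()}
        self.threshold = threshold              # assumed minimum matching ratio
        self.learning_weight = learning_weight  # assumed size of one feedback step
        self.bias = defaultdict(float)          # per-class adjustment from decisions

    def matching_ratio(self, data_class, samples):
        """Fraction of sample values that fully match the class pattern."""
        if not samples:
            return 0.0
        pattern = self.patterns[data_class]
        hits = sum(1 for value in samples if pattern.fullmatch(value))
        return hits / len(samples)

    def suggest(self, samples):
        """Return the classes whose adjusted ratio reaches the threshold."""
        return sorted(
            name
            for name in self.patterns
            if self.matching_ratio(name, samples) + self.bias[name] >= self.threshold
        )

    def accept(self, data_class):
        """Remember an accepted class: make it easier to suggest next time."""
        self.bias[data_class] += self.learning_weight

    def reject(self, data_class):
        """Remember a rejected class: make it harder to suggest next time."""
        self.bias[data_class] -= self.learning_weight


classifier = ToyClassifier({
    "email": r"[^@\s]+@[^@\s]+\.[a-z]+",
    "us_zip": r"\d{5}",
})
column_samples = ["a@x.com", "b@y.org", "c@z.net", "not-an-email"]

before = classifier.suggest(column_samples)  # ratio 0.75 is below 0.8
classifier.accept("email")                   # user adds the class manually
after = classifier.suggest(column_samples)   # adjusted ratio is now 0.85
```

In this sketch, three of the four sample values match the `email` pattern, so the class is not suggested at first. After one accepted decision the adjusted ratio crosses the threshold and the class is suggested. A real system would likely track decisions at a finer granularity than one global adjustment per class.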
