Data Profiling and Data Quality
What is Talend Data Quality?
Talend Data Quality is available in Talend Studio through the following perspectives:
- The Profiling and Data Explorer perspectives, where you can analyze data and browse and query analysis results.
- The Integration perspective, where you have access to hundreds of components for all data integration needs, including a set of components and routines dedicated to data quality. This enables you to embed data cleansing capabilities in your data transformation and integration processes.
For detailed information about the components specific to data quality, see Data Quality components.
This feature is not shipped with Talend Studio by default. You need to install it using the Feature Manager. For more information, see Installing features using the Feature Manager.
Core features
Metadata repository
Using Talend Data Quality, you can connect to data sources to analyze their structure (catalogs, schemas, and tables) and store the description of their metadata in the metadata repository. You can then use this metadata to set up metrics and indicators.
For more information, see Creating connections to data sources.
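To make the idea of structural metadata concrete, the following minimal sketch reads table and column descriptions from a database and keeps them in a dictionary. It uses Python's standard sqlite3 module and an in-memory database purely for illustration; the function name describe_structure and the sample tables are assumptions, and this is not how Talend Studio retrieves or stores metadata.

```python
import sqlite3


def describe_structure(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for a SQLite connection."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    structure = {}
    for table in tables:
        # PRAGMA table_info returns one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        structure[table] = [(col[1], col[2]) for col in columns]
    return structure


# Hypothetical sample schema, created only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, country TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

metadata = describe_structure(conn)
for table, columns in metadata.items():
    print(table, columns)
```

A description like this is the kind of information that metrics and indicators are then defined against, for example by choosing the email column of the customers table for analysis.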
Another feature of interest is the report database, where you can keep a history of created reports and share results among team members. For more information, see Managing the report database.
Patterns and indicators
Two types of patterns are available:
- Regular expressions, which are predefined regular patterns.
- SQL patterns, which are the patterns you add using LIKE clauses.
For more information about patterns, see Patterns.
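The difference between the two pattern types can be sketched outside the Studio. The example below counts how many sample values match a regular expression and how many match a SQL LIKE clause. The sample data, the deliberately simple email expression, and the contacts table are all assumptions made for illustration; they are not patterns shipped with Talend.

```python
import re
import sqlite3

# Hypothetical sample column, including a null and two malformed values.
emails = ["ann@example.com", "bob@example", "carol@test.org", "dave@@site.com", None]

# Regular-expression pattern: a simplified email shape (assumption).
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$")
regex_matches = sum(1 for value in emails if value is not None and EMAIL_RE.match(value))

# SQL pattern: the same column tested with a LIKE clause.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (email TEXT)")
conn.executemany("INSERT INTO contacts VALUES (?)", [(value,) for value in emails])
like_matches = conn.execute(
    "SELECT COUNT(*) FROM contacts WHERE email LIKE '%@%.%'"
).fetchone()[0]

print("regex matches:", regex_matches)
print("LIKE matches:", like_matches)
```

In this sketch the LIKE clause is looser than the regular expression: it accepts any value containing an "@" followed later by a ".", so it also counts "dave@@site.com", which the regular expression rejects.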
Two types of indicators are available:
- System indicators, a list of predefined indicators.
- User-defined indicators, a list of those defined by the user.
For more information about indicators, see Indicators.
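As a rough illustration of the two indicator types, the sketch below computes a few simple statistics of the kind a predefined indicator might report, alongside one custom measure standing in for a user-defined indicator. All function names, the sample column, and the blank_ratio measure are hypothetical; in the Studio, indicators are configured through the Profiling perspective, not written this way.

```python
# Hypothetical sample column with a null and an empty string.
values = ["FR", "US", None, "FR", "", "DE"]


def row_count(column):
    """Total number of rows, nulls included."""
    return len(column)


def null_count(column):
    """Number of null (None) values."""
    return sum(1 for v in column if v is None)


def distinct_count(column):
    """Number of distinct non-null values."""
    return len({v for v in column if v is not None})


def blank_ratio(column):
    """Stand-in for a user-defined indicator: share of empty strings."""
    return sum(1 for v in column if v == "") / len(column)


# Predefined-style indicators and the custom one, applied uniformly.
indicators = {
    "row_count": row_count,
    "null_count": null_count,
    "distinct_count": distinct_count,
    "blank_ratio": blank_ratio,
}
results = {name: compute(values) for name, compute in indicators.items()}
print(results)
```

Keeping every indicator behind the same "column in, number out" shape is one possible design choice; it lets predefined and custom measures be applied and reported in the same way.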