Data quality data mart
The data quality data mart contains the analyses and reports executed in Talend Studio. The data is stored as a star schema, which consists of fact tables and a number of associated dimension tables.
You can use the Physical Data Model (PDM) of Talend Data Quality to create your own custom reports with the JasperReports reporting tool, and then use them when creating user-specified reports in Talend Studio.
You may also connect this data mart to your own reporting tools, such as Tableau Software, and find the data quality information in your own business intelligence environment.
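As a rough illustration of what a reporting tool does against a star schema, the sketch below joins a fact table to a dimension table. Only the table names TDQ_INDICATOR_VALUE and TDQ_ANALYSIS come from this page. The column names (analysis_key, analysis_label, indicator_label, int_value), the sample rows, and the use of an in-memory SQLite database are assumptions made for the example and do not reflect the actual data mart columns.

```python
import sqlite3


def build_demo_mart():
    """Create an in-memory stand-in for two data mart tables.

    All column names and rows here are hypothetical; check the PDM
    for the real column definitions.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TDQ_ANALYSIS (
            analysis_key   INTEGER PRIMARY KEY,  -- hypothetical surrogate key
            analysis_label TEXT                  -- hypothetical column
        );
        CREATE TABLE TDQ_INDICATOR_VALUE (
            analysis_key    INTEGER REFERENCES TDQ_ANALYSIS(analysis_key),
            indicator_label TEXT,                -- hypothetical column
            int_value       INTEGER              -- hypothetical column
        );
        INSERT INTO TDQ_ANALYSIS VALUES
            (1, 'customer_columns'),
            (2, 'order_columns');
        INSERT INTO TDQ_INDICATOR_VALUE VALUES
            (1, 'Row Count', 1000),
            (1, 'Null Count', 25),
            (2, 'Row Count', 500),
            (2, 'Null Count', 0);
    """)
    return conn


def null_counts_by_analysis(conn):
    """Join the fact table to the dimension table and list null counts."""
    return conn.execute("""
        SELECT a.analysis_label, f.int_value
        FROM TDQ_INDICATOR_VALUE AS f
        JOIN TDQ_ANALYSIS AS a ON a.analysis_key = f.analysis_key
        WHERE f.indicator_label = 'Null Count'
        ORDER BY a.analysis_label
    """).fetchall()
```

Against the real data mart, the same pattern would apply with the connection replaced by one to the database that hosts the data mart and the column names taken from the PDM.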
The physical design of Talend Data Quality includes fact and dimension tables.
Fact tables:
- TDQ_INDICATOR_VALUE: indicator value
- TDQ_OVERVIEW_INDVALUE: overview analyses
- TDQ_MATCH_INDVALUE: comparison analyses
- TDQ_SET_INDVALUE: column set analyses
- TDQ_MATCHING_INDVALUE: match analyses
- TDQ_GROUP_STATISTICS: table storing the group statistics of the match analysis
- TDQ_BLOCKING_KEY: table storing the blocking key definition of the match analysis
- TDQ_MATCHING_KEY: table storing the matching key definition of the match analysis
Fact tables may contain columns with the following values: NULL (TALEND), N/A (TDQ) and EMPTY (TDQ). The NULL (TALEND) value indicates that the analyzed data is null. The N/A (TDQ) value indicates that a value in the column would be meaningless in the data quality context. The EMPTY (TDQ) value indicates that the analyzed data is empty (in most databases, an empty string is different from a null value).
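When reading these columns from your own reporting code, it can help to tell the three marker values apart from ordinary data. The sketch below assumes the markers are stored as the literal strings shown above. The names CellState, classify and decode are hypothetical helpers, not part of Talend.

```python
from enum import Enum


class CellState(Enum):
    """The three marker values described for data mart columns."""
    NULL = "NULL (TALEND)"        # the analyzed data was null
    NOT_APPLICABLE = "N/A (TDQ)"  # a value has no meaning in this column
    EMPTY = "EMPTY (TDQ)"         # the analyzed data was an empty string


# Lookup table from the stored marker string to its CellState.
_BY_MARKER = {state.value: state for state in CellState}


def classify(cell):
    """Return the CellState for a marker string, or None for ordinary data."""
    return _BY_MARKER.get(cell)


def decode(cell):
    """Map the null and empty markers back to Python values.

    N/A (TDQ) and ordinary values are returned unchanged, so callers
    can still tell "not applicable" apart from real data.
    """
    state = classify(cell)
    if state is CellState.NULL:
        return None
    if state is CellState.EMPTY:
        return ""
    return cell
```

Keeping N/A (TDQ) distinct from NULL (TALEND) matters in reports: the first says the column does not apply, the second says the source data itself was null.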
Dimension tables:
- TDQ_ANALYSIS: the analysis instance in a report (meaning that the pair of the report and analysis ids forms the functional key).
Because the data in dimension tables changes slowly, historical data is tracked by creating multiple records in the dimension tables, each with a separate key. A new record is inserted each time a change is made. For more information, see Slowly changing dimension.
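The insert-on-change behaviour described above can be sketched as follows. This is a minimal illustration on an in-memory SQLite database, not the data mart's actual implementation. The table demo_analysis_dim, its columns, and the function names are all hypothetical.

```python
import sqlite3


def create_dimension(conn):
    """Create a hypothetical dimension table with a surrogate key."""
    conn.execute("""
        CREATE TABLE demo_analysis_dim (
            surrogate_key INTEGER PRIMARY KEY AUTOINCREMENT,
            analysis_id   TEXT NOT NULL,  -- functional identifier
            label         TEXT NOT NULL   -- attribute that may change
        )
    """)


def record_version(conn, analysis_id, label):
    """Insert a new row only when the label changed; return the current key.

    Existing rows are never updated, so earlier versions stay available
    under their own surrogate keys.
    """
    latest = conn.execute(
        "SELECT surrogate_key, label FROM demo_analysis_dim "
        "WHERE analysis_id = ? ORDER BY surrogate_key DESC LIMIT 1",
        (analysis_id,),
    ).fetchone()
    if latest is not None and latest[1] == label:
        return latest[0]  # nothing changed, reuse the current record
    cursor = conn.execute(
        "INSERT INTO demo_analysis_dim (analysis_id, label) VALUES (?, ?)",
        (analysis_id, label),
    )
    return cursor.lastrowid


def history(conn, analysis_id):
    """Return every stored version of one analysis, oldest first."""
    return conn.execute(
        "SELECT surrogate_key, label FROM demo_analysis_dim "
        "WHERE analysis_id = ? ORDER BY surrogate_key",
        (analysis_id,),
    ).fetchall()
```

Fact rows that reference the older surrogate key keep pointing at the attributes that were valid when they were written, which is what makes historical reports reproducible.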
Dimension tables may contain columns with the following values: NULL (TALEND), N/A (TDQ) and EMPTY (TDQ). The NULL (TALEND) value indicates that the analyzed data is null. The N/A (TDQ) value indicates that a value in the column would be meaningless in the data quality context. The EMPTY (TDQ) value indicates that the analyzed data is empty (in most databases, an empty string is different from a null value).
The figure below shows the physical design of the PDM of Talend Data Quality, including how the tables are connected to each other.
The three figures that follow show the parts of the PDM that concern the comparison analyses, the overview analyses and the analyses of a set of columns.