Assessing data quality

After opening a dataset, you can review several parts of the overview to learn more about its overall quality, its schema, its quality statistics, and the semantic type of each column.

One of the following subscriptions is required:

Qlik Talend Cloud Enterprise
Qlik Talend Cloud Premium
Qlik Cloud Analytics Premium
Qlik Cloud Analytics Enterprise
Qlik Sense Enterprise SaaS

Quality indicators of the dataset

A Qlik Cloud Analytics connection is required to compute the quality and profiling of your datasets. For more information, see Data quality for connection-based datasets.

When you open the overview of a dataset that has just been registered, most of the information is grayed out. To calculate the data quality for the first time, click the Compute button. If the quality has already been computed but you want to make sure that the data is up to date, click the Refresh button.

Each compute or refresh in pushdown mode incurs costs in your cloud data warehouse (Snowflake or Databricks). For more information, see Data quality for connection-based datasets.

There are two main sections where the quality is displayed.

The Data quality area, which includes a quality bar with three colors and their respective percentages:
- Invalid (red): Shows the percentage of values in the sample that are considered invalid.
- Empty or null (black): Indicates the percentage of values in the sample that are empty or null.
- Valid (green): Displays the percentage of valid values in the sample. The percentage does not take empty values into account.
The Schema area, which shows the fields of the dataset, the data type or semantic type applied to each field, and a quality bar for each field.
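As a rough illustration of how the three percentages in a quality bar relate to each other, the following minimal Python sketch classifies each value of a sample as invalid, empty, or valid. It is not Qlik's implementation. The function names (`is_empty`, `quality_breakdown`) and the validation rule are hypothetical, and the sketch assumes that empty values are counted separately and never as valid, so that the three percentages add up to 100.

```python
from typing import Callable, Dict, Iterable, Optional


def is_empty(value: Optional[str]) -> bool:
    """Assumption: None and blank strings both count as 'empty or null'."""
    return value is None or value.strip() == ""


def quality_breakdown(
    values: Iterable[Optional[str]],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Return the invalid / empty / valid percentages for a sample of values."""
    sample = list(values)
    total = len(sample)
    if total == 0:
        return {"invalid": 0.0, "empty": 0.0, "valid": 0.0}

    empty = sum(1 for v in sample if is_empty(v))
    # Only non-empty values are tested against the validation rule.
    valid = sum(1 for v in sample if not is_empty(v) and is_valid(v))
    invalid = total - empty - valid

    return {
        "invalid": 100 * invalid / total,
        "empty": 100 * empty / total,
        "valid": 100 * valid / total,
    }


# Hypothetical sample of an "email" field with a deliberately naive rule.
sample = ["a@example.com", "", None, "not-an-email", "b@example.com"]
result = quality_breakdown(sample, lambda v: "@" in v)
print(result)
```

In this sample of five values, two are empty, two pass the rule, and one fails it, which gives 40% empty, 40% valid, and 20% invalid.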

For connection-based datasets, if the schema and quality of the dataset fail to be retrieved, check whether the connection you have set up in the Qlik Analytics Services hub has the Role field properly filled in, and whether the role itself grants the necessary permissions on the database table.

Semantic types discovery

Each column of a dataset is automatically assigned a semantic type to better describe its content. Behind the scenes, a data discovery operation determines which type to assign.

You can also create semantic types and manage the values in each semantic type.

For more information, see Managing semantic types.

Sampling modes for compute modes

The sampling mode used for data samples depends on the compute mode selected:

Pull-up mode: A head sample is used, meaning the first rows of the dataset are taken as the sample.
Pushdown mode: A random sample is used, ensuring a more distributed representation of the dataset. This mode is currently supported only for Databricks and Snowflake.

Understanding the sampling mode helps in interpreting the data quality metrics accurately based on the compute mode.
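To see why the sampling mode matters when reading quality metrics, the sketch below contrasts a head sample with a random sample on a made-up dataset. It is an illustration only and does not reflect how Qlik or the data warehouse performs sampling. The helper names (`head_sample`, `random_sample`), the sample size, and the dataset are all assumptions.

```python
import random
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def head_sample(rows: Iterable[T], n: int) -> List[T]:
    """Head sample: take the first n rows in their stored order."""
    return list(islice(rows, n))


def random_sample(rows: Iterable[T], n: int, seed: Optional[int] = None) -> List[T]:
    """Random sample: draw up to n rows uniformly, without replacement."""
    pool = list(rows)
    rng = random.Random(seed)
    return rng.sample(pool, min(n, len(pool)))


# Hypothetical dataset: the first 100 rows are filled in, the last 900 are empty.
rows = ["ok"] * 100 + [""] * 900

head = head_sample(rows, 100)
rand = random_sample(rows, 100, seed=42)

head_empty_pct = 100 * head.count("") / len(head)
rand_empty_pct = 100 * rand.count("") / len(rand)

print(f"Head sample:   {head_empty_pct:.1f}% empty")
print(f"Random sample: {rand_empty_pct:.1f}% empty")
```

Because the head sample only sees the first rows, it reports no empty values here, while the random sample reflects the empty rows spread through the rest of the dataset. The same effect can make quality percentages differ between the two compute modes for the same data.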
