Data quality for file-based datasets
To benefit from semantic type discovery and data quality readings on your file-based datasets, you need to upload your files to your Catalog.
Currently, the supported file types for the quality calculation are CSV, TXT, QVD, XLS, and XLSX. If your Excel file contains multiple sheets, the quality calculation is performed on the first sheet only.
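As an illustration, the supported file types listed above can be checked before uploading with a small helper. This is a minimal sketch and not part of the product: the function name `is_supported_for_quality` is hypothetical, and it assumes the file type can be inferred from the file extension alone.

```python
from pathlib import Path

# File types listed above as supported for the quality calculation.
SUPPORTED_EXTENSIONS = {".csv", ".txt", ".qvd", ".xls", ".xlsx"}


def is_supported_for_quality(filename: str) -> bool:
    """Return True if the file extension is in the supported list.

    Hypothetical helper: the comparison is case-insensitive and looks
    only at the final extension, not at the file content.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("sales.CSV", "customers.xlsx", "events.parquet"):
        print(name, is_supported_for_quality(name))
```

Checking the extension locally only filters out obviously unsupported files; the actual validation happens when the file is uploaded.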
Creating file-based datasets
To create datasets from files, and later access their schema and quality in the dataset overview and data product overview, you need to upload the files in Qlik Talend Data Integration.
- In Qlik Talend Data Integration > Catalog, click Create, and then Dataset.
- Click Upload data file.
- Browse to the file you want to upload, select the space in which you want to upload it, and click Upload.
If you click Upload and analyze, both a dataset and an analytics app will be created from this file.
The new dataset is added to the Catalog, and you can access quality indicators and more details about its content. This configuration also makes it possible to use the file-based dataset as a source for analytics apps.
Since the Catalog can be accessed from both the Qlik Talend Data Integration hub and the Qlik Analytics Services hub, you can open your datasets in your preferred location, and the right connection will be used depending on the context.
Quality compute
Using the Compute or Refresh button on the Overview of your dataset triggers a quality calculation on a sample of 1,000 rows of the dataset. This operation happens in pullup mode for file-based datasets.
A sample of 100 rows is retrieved and displayed as a preview with up-to-date semantic types, and validity and completeness statistics. This sample is then stored in MongoDB.
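To give an idea of what a completeness statistic on a sample represents, here is a minimal sketch that computes, for each column of a CSV file, the share of non-empty values among the first 1,000 rows. It is an illustration only and not the product's implementation: the function name `completeness_by_column`, the choice to sample the first rows, and the rule that a blank or whitespace-only cell counts as empty are all assumptions.

```python
import csv
import io
from itertools import islice
from typing import Dict

SAMPLE_SIZE = 1000  # sample size used for the quality calculation, as stated above


def completeness_by_column(csv_text: str, sample_size: int = SAMPLE_SIZE) -> Dict[str, float]:
    """Return the ratio of non-empty values per column over the sampled rows.

    Assumption: the sample is simply the first `sample_size` data rows,
    and a cell is "empty" when it is missing or contains only whitespace.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(islice(reader, sample_size))
    if not rows:
        return {}
    result = {}
    for column in reader.fieldnames or []:
        # Short rows yield None for missing cells, so normalize to "".
        filled = sum(1 for row in rows if (row.get(column) or "").strip())
        result[column] = filled / len(rows)
    return result


if __name__ == "__main__":
    sample = "id,name\n1,Ann\n2,\n3,Bob\n4,\n"
    print(completeness_by_column(sample))
```

Validity would be computed in a similar way, by counting the values that match the semantic type detected for each column instead of the non-empty ones.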