
Data quality for Snowflake datasets

To benefit from semantic type discovery and data quality readings on your Snowflake datasets, you must first set up your data connections correctly in the context of data products.

Snowflake connection settings

To create datasets from Snowflake, and later access their schema and quality in the dataset overview and data product overview, you need to set up the same connection in both the Qlik Talend Data Integration hub and the Qlik Analytics Services hub.

Let's say you want to take data stored in a Snowflake database, add it to your Catalog as datasets, and group those datasets in a data product that you will use for an analytics app.

  1. In Qlik Talend Data Integration, click Add new and then Data connection.

  2. Configure your access to the Snowflake database using the credentials of a user that has WRITE permissions and access to the tables you want to import.

  3. In Qlik Analytics Services, click Add new, and then Data connection.

  4. Configure your access to the same Snowflake database as previously, ideally using the credentials of the same user, or at least of a user that has READ permissions on the tables.

  5. In the Role field, enter the name of an existing role in the Snowflake database that has the following privileges on the relevant objects (see the sample grant script after this procedure).

    • USAGE on WAREHOUSE

    • USAGE on DATABASE

    • USAGE on SCHEMA

    • CREATE TABLE on SCHEMA

    • CREATE FUNCTION on SCHEMA

    • CREATE VIEW on SCHEMA

    • SELECT on TABLE

  6. Back on the Qlik Talend Data Integration homepage, click Add new and then Create data project.

  7. Use your Snowflake connection from step 2 as the source for your project and start building your pipeline. See Creating a data pipeline for more information.

  8. At any point in your pipeline, select a data task, go to Settings, and then open the Catalog tab, where you can select the Publish to Catalog checkbox.

    This means that this version of the dataset will be published to the Catalog when the data project is prepared and run. You can also select this option at the project level.

  9. Run your data project.
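
For reference, here is a minimal sketch of the Snowflake statements that create a role matching the privileges listed in step 5. All object names (QLIK_ROLE, MY_WH, MY_DB, MY_SCHEMA, MY_USER) are placeholders to adapt to your environment, and granting SELECT on all tables in the schema is only one way to cover the tables you want to import.

    USE ROLE SECURITYADMIN;
    CREATE ROLE IF NOT EXISTS QLIK_ROLE;

    -- Privileges required by the connection (see step 5)
    GRANT USAGE ON WAREHOUSE MY_WH TO ROLE QLIK_ROLE;
    GRANT USAGE ON DATABASE MY_DB TO ROLE QLIK_ROLE;
    GRANT USAGE ON SCHEMA MY_DB.MY_SCHEMA TO ROLE QLIK_ROLE;
    GRANT CREATE TABLE ON SCHEMA MY_DB.MY_SCHEMA TO ROLE QLIK_ROLE;
    GRANT CREATE FUNCTION ON SCHEMA MY_DB.MY_SCHEMA TO ROLE QLIK_ROLE;
    GRANT CREATE VIEW ON SCHEMA MY_DB.MY_SCHEMA TO ROLE QLIK_ROLE;
    GRANT SELECT ON ALL TABLES IN SCHEMA MY_DB.MY_SCHEMA TO ROLE QLIK_ROLE;

    -- Grant the role to the user whose credentials the connection uses
    GRANT ROLE QLIK_ROLE TO USER MY_USER;

    -- Verify the resulting privileges
    SHOW GRANTS TO ROLE QLIK_ROLE;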

After running your data project, the new datasets are added to the Catalog, where you can access quality indicators and more details about their content. This configuration also makes it possible to use the Snowflake datasets as sources for analytics apps.

You can add as many datasets as necessary before building your data product. Since the Catalog can be accessed from both the Qlik Talend Data Integration hub and the Qlik Analytics Services hub, you can open your datasets in your preferred location, and the right connection is used depending on the context.

Quality compute in pushdown

Using the Compute or Refresh button on the Overview of your dataset triggers a quality calculation on a sample of 1,000 rows of the database. This operation happens in pushdown, on the Snowflake side.

A sample of 100 rows is then sent back to Qlik Cloud, where you can display it as a preview with up-to-date semantic types as well as validity and completeness statistics. This sample is then stored in MongoDB.
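
As an illustration only, and not a description of Qlik's actual implementation, the kind of fixed-size sampling and completeness statistics described above can be computed entirely on the Snowflake side. The table and column names below are placeholders.

    -- Pushdown-style sampling: the query runs in Snowflake,
    -- and only the sampled rows leave the warehouse
    SELECT *
    FROM MY_DB.MY_SCHEMA.MY_TABLE
    SAMPLE ROW (1000 ROWS);

    -- A completeness-style statistic: share of non-null values in a column
    SELECT COUNT(SOME_COLUMN) / COUNT(*) AS completeness
    FROM MY_DB.MY_SCHEMA.MY_TABLE
    SAMPLE ROW (1000 ROWS);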

The following diagram summarizes the data quality processing operation.

[Figure: architecture diagram of the Snowflake pushdown quality computation]
