Working with Apache Parquet files

Apache Parquet is a columnar storage format, highly efficient for storing and querying large datasets. In Qlik Sense, you can read data from Parquet files, and store tables as Parquet files.

Parquet allows for efficient querying of specific columns in a table rather than reading the entire table. This makes it well-suited for use with big data processing. Also, Parquet supports efficient compression and encoding of data. This can further reduce storage space and improve query performance.

All existing apps created in a Qlik Sense version before August 2023 must be manually updated to enable Parquet support. This is required both for deployments that are upgraded to August 2023, and when importing existing apps to a new deployment. For more information about updating the apps, see Enable Parquet file support for existing apps in Qlik Sense.

Creating Parquet files

You can create Parquet files using the Store command in the load script. In the script, state that a previously read table, or part of it, is to be exported to an explicitly named file at a location of your choice. You can also nest the data that you store in Parquet files.
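As a minimal sketch of the Store command described above, the following script stores a whole table and then a subset of its fields as Parquet files. The table name Orders, its fields, the file names, and the lib://DataFiles connection are assumptions made for illustration only.

```qlik
// Load a small inline table; table and field names are illustrative only.
Orders:
LOAD * INLINE [
OrderID, Customer, Amount
1, A&G, 120
2, Boston Co, 75
];

// Store the whole table as a Parquet file.
// The lib://DataFiles connection name is an assumption.
STORE Orders INTO [lib://DataFiles/orders.parquet] (parquet);

// Store only selected fields of the table.
STORE OrderID, Amount FROM Orders INTO [lib://DataFiles/order_amounts.parquet] (parquet);
```

The second statement shows how to export only part of a table by listing the fields to include.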

For more information, see Store.

Reading data from Parquet files

You can read data from a Parquet file just like any other data file supported by Qlik Sense. You can do this in Data manager, in Data load editor, or when you add data to a new app.

For more information, see Loading data from files.

You can also load data from a Parquet file in the data load script with the LOAD command. For example:

LOAD * from xyz.parquet (parquet);

For more information, see Load.
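Instead of loading every field with an asterisk, you can list the fields you want. The sketch below assumes a file named orders.parquet with the fields OrderID and Amount in a lib://DataFiles connection; all of these names are hypothetical.

```qlik
// Load only two named fields from the Parquet file.
// File, connection, and field names are hypothetical.
Orders:
LOAD
    OrderID,
    Amount
FROM [lib://DataFiles/orders.parquet] (parquet);
```

Listing fields explicitly keeps unneeded fields out of the data model.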

Loading data from nested Parquet files

If a Parquet file contains nested data, it needs to be loaded in multiple load statements, each specifying which subset should be loaded into each table. The Table is specifier is used to provide a path to the group node in the schema to be loaded.

Only nodes that match the Table is path are loaded.

Data nodes are loaded into a file without nesting.

Group nodes group their fields by adding the group name to the field name. For example, a group containing field1 and field2 is loaded as group.field1 and group.field2.

List nodes generate key fields that are used to link the tables, for example %Key_group.list. Any group or data node inside the list needs to be loaded in a separate load statement. A key field to the parent list is also added.
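As a rough illustration of the naming rule for group nodes, the sketch below loads two grouped fields and renames them. The file customers.parquet, the group address, and its fields street and city are hypothetical and are not part of the example file used on this page. Because the loaded field names contain a period, they are enclosed in square brackets.

```qlik
// Hypothetical file with a group node "address" holding "street" and "city".
// Group fields arrive as address.street and address.city, so square brackets
// are used to reference them in the script.
Customers:
LOAD
    customer,
    [address.street] as Street,
    [address.city] as City
FROM [lib://DataFiles/customers.parquet] (parquet);
```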

The following examples show the same nested Parquet file, created in the example in Storing nested data in Parquet files, loaded into an app using Data manager and Data load editor (with both the default script from Select data and a custom script).

Example: Data manager

If you load company.parquet in Data manager and apply all recommended associations, you end up with the following data model:

Figure: Data model for loading company.parquet with Data manager, showing the tables company:salesrep.salesrep, company, company:headquarter.headquarter, and company:headquarter.headquarter.city:region.region.

Example: Data load editor (Select data)

If you load the data using Select data in Data load editor, you end up with the following script:

LOAD company, contact, "%Key_company:headquarter", "%Key_company:salesrep"
FROM [lib://AttachedFiles/company.parquet] (parquet);

LOAD country, city, "%Key_city:region", "%Key_company:headquarter"
FROM [lib://AttachedFiles/company.parquet] (parquet, table is [company:headquarter.headquarter]);

LOAD region, "%Key_city:region"
FROM [lib://AttachedFiles/company.parquet] (parquet, table is [company:headquarter.headquarter.city:region.region]);

LOAD salesrep, "%Key_company:salesrep"
FROM [lib://AttachedFiles/company.parquet] (parquet, table is [company:salesrep.salesrep]);

The data model looks like this in Data model viewer.

Figure: Data model for loading company.parquet with Select data in Data load editor, showing the tables salesrep, company, headquarter, and region.

Example: Data load editor (Custom load script)

If you use a custom load script, you have more control over how the fields and tables are loaded from company.parquet. The following load script loads the tables and fields from company.parquet:

// Root table. The Lookup calls below refer to it by the name company.
company:
LOAD * FROM [lib://AttachedFiles/company.parquet] (parquet);

// Replace the generated key with the company value it points to.
salesrep:
LOAD *, Lookup('company', '%Key_company:salesrep', [%Key_company:salesrep], 'company') as company;
LOAD * FROM [lib://AttachedFiles/company.parquet] (parquet, table is [company:salesrep.salesrep]);
DROP FIELD [%Key_company:salesrep];

headquarter:
LOAD *, Lookup('company', '%Key_company:headquarter', [%Key_company:headquarter], 'company') as company;
LOAD * FROM [lib://AttachedFiles/company.parquet] (parquet, table is [company:headquarter.headquarter]);
DROP FIELD [%Key_company:headquarter];

// Regions link to headquarter through city instead of the generated key.
region:
LOAD *, Lookup('city', '%Key_city:region', [%Key_city:region], 'headquarter') as city;
LOAD * FROM [lib://AttachedFiles/company.parquet] (parquet, table is [company:headquarter.headquarter.city:region.region]);
DROP FIELD [%Key_city:region];

This results in the following data model, which is identical to the original data model before the data was stored in the Parquet file.

Figure: Data model for loading company.parquet with a custom script in Data load editor, showing the tables headquarter, region, salesrep, and company, mirroring the exact data model from the source app.

Limitations

Parquet files have the following limitations:

Parquet files that contain an int96 timestamp field may not be loaded correctly.

Int96 is a deprecated data type that contains a timestamp without time zone information. An attempt is made to read the field as UTC, but because vendor implementations differ, success is not guaranteed.

Verify the loaded data and adjust it to the correct timezone with an offset if required.
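One possible way to apply such an offset in the load script is sketched below. It assumes a hypothetical file events.parquet with a timestamp field named EventTime that was read as UTC, and a target time zone two hours ahead of UTC. None of these names or values come from this page.

```qlik
// File, connection, and field names are hypothetical.
// Qlik Sense stores timestamps as numbers where 1 equals one day,
// so adding 2/24 shifts the value forward by two hours.
Events:
LOAD
    *,
    Timestamp(EventTime + 2/24) as EventTimeLocal
FROM [lib://DataFiles/events.parquet] (parquet);
```

A fixed offset does not account for daylight saving time. If that matters for your data, the ConvertToLocalTime function may be a better fit, for example ConvertToLocalTime(EventTime, 'Paris').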
