
Index the data

The index is the main element of Qlik Big Data Index. Before you can index the data, you need to describe the data with a schema and connect to it with the appropriate connector.

Indexlets are persisted data tables and symbol tables that represent the big data. They are created during indexing, and are used whenever a client requests access to the big data.

Traditionally, symbol tables contain distinct field values while data tables contain bit-stuffed references to the symbol tables. This provides a compact data model, but is not suited for calculations with big data tables.
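The traditional layout can be illustrated with a minimal sketch. It is not Qlik's implementation: the function names, the sample column, and the use of a single Python integer as the packed buffer are assumptions made for illustration only.

```python
def build_symbol_table(values):
    """Return (symbols, refs): distinct values in first-seen order,
    plus one reference (an index into symbols) per input row."""
    symbols = []
    positions = {}
    refs = []
    for value in values:
        if value not in positions:
            positions[value] = len(symbols)
            symbols.append(value)
        refs.append(positions[value])
    return symbols, refs


def bits_per_ref(symbol_count):
    """Smallest bit width that can address every symbol (at least 1)."""
    return max(1, (symbol_count - 1).bit_length())


def pack(refs, width):
    """Bit-stuff the references into one integer, `width` bits each."""
    packed = 0
    for i, ref in enumerate(refs):
        packed |= ref << (i * width)
    return packed


def unpack(packed, width, count):
    """Recover the list of references from the packed integer."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


# Hypothetical sample column with three distinct values.
column = ["DE", "SE", "DE", "US", "SE", "DE"]
symbols, refs = build_symbol_table(column)
width = bits_per_ref(len(symbols))
packed = pack(refs, width)
restored = [symbols[r] for r in unpack(packed, width, len(refs))]
```

Each row costs only `width` bits, which is what makes the model compact. Finding every row that holds a given value still means scanning all references, which is the limitation for large tables.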

In the indexlet concept of Qlik Big Data Index, symbol tables are created per indexlet table, with the data table represented by indices from symbols to records and vice versa. This bi-directional index makes it possible to navigate from any column value to all rows with that value, as well as from any row to find all the column values for that row. This allows for distributed computing of big data aggregations.
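As a rough sketch of the idea, assuming each indexlet is modelled as a plain Python structure, a bi-directional index might look like the following. The function names, the sample rows, and the `sum_where` aggregation are hypothetical and do not reflect the product's internal format.

```python
from collections import defaultdict


def build_bidirectional_index(rows):
    """Build both directions for one indexlet table.

    symbol_to_rows[column][value] -> list of row ids holding that value
    row_to_symbols[row_id]        -> dict of column -> value for that row
    """
    symbol_to_rows = defaultdict(lambda: defaultdict(list))
    row_to_symbols = []
    for row_id, row in enumerate(rows):
        row_to_symbols.append(dict(row))
        for column, value in row.items():
            symbol_to_rows[column][value].append(row_id)
    return symbol_to_rows, row_to_symbols


def sum_where(index, column, value, measure):
    """Partial aggregate for one indexlet: sum `measure` over the rows
    where `column` equals `value`, without scanning the other rows."""
    symbol_to_rows, row_to_symbols = index
    row_ids = symbol_to_rows[column].get(value, [])
    return sum(row_to_symbols[row_id][measure] for row_id in row_ids)


# Two hypothetical indexlets, each covering a slice of the same table.
indexlet_a = build_bidirectional_index([
    {"country": "DE", "amount": 10},
    {"country": "SE", "amount": 5},
    {"country": "DE", "amount": 7},
])
indexlet_b = build_bidirectional_index([
    {"country": "DE", "amount": 3},
    {"country": "US", "amount": 8},
])

# Each indexlet computes its partial sum independently; the results are
# then combined, which is the property that allows distribution.
total_de = sum(sum_where(ix, "country", "DE", "amount")
               for ix in (indexlet_a, indexlet_b))
```

Because every indexlet carries its own symbol tables and indices, a partial result can be computed where the indexlet lives and only the small partial values need to be merged.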

You can use the Qlik Big Data Index management console to configure the indexing cluster and create the index. The management console is available on port 8080.

You can also make the management console available on port 80 of the external IP of your qlik-nginx-ingress-controller node by enabling nginx ingress. Example of an address from kubectl get svc:

tpch-nginx-ingress-controller LoadBalancer 10.100.80.40 aaef3704a8c3a11e999d20a615424c7a-885528821.eu-west-1.elb.amazonaws.com 80:30545/TCP,443:31620/TCP
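To read the address out of such a line, a small helper along these lines could be used. It is only a sketch: the function names are hypothetical, the field order assumes the default `kubectl get svc` columns (name, type, cluster IP, external IP, ports), and the `http` scheme is an assumption.

```python
def parse_service_line(line):
    """Split one data row of `kubectl get svc` output into named fields."""
    fields = line.split()
    name, service_type, cluster_ip, external = fields[:4]
    ports = {}
    for mapping in fields[4].split(","):
        spec, protocol = mapping.split("/")
        if ":" in spec:
            # LoadBalancer and NodePort rows list port:nodePort.
            port, node_port = spec.split(":")
            ports[int(port)] = (int(node_port), protocol)
        else:
            # ClusterIP rows list only the service port.
            ports[int(spec)] = (None, protocol)
    return {
        "name": name,
        "type": service_type,
        "cluster_ip": cluster_ip,
        "external": external,
        "ports": ports,
    }


def console_url(service):
    """Build the management console address on port 80 (scheme assumed)."""
    if 80 not in service["ports"]:
        raise ValueError("service does not expose port 80")
    return f"http://{service['external']}/"


line = (
    "tpch-nginx-ingress-controller LoadBalancer 10.100.80.40 "
    "aaef3704a8c3a11e999d20a615424c7a-885528821.eu-west-1.elb.amazonaws.com "
    "80:30545/TCP,443:31620/TCP"
)
service = parse_service_line(line)
url = console_url(service)
```

In the example line, the fourth field is the external address and `80:30545/TCP` shows that service port 80 is mapped to node port 30545.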

Preparing data source files

You need to prepare the data source files in Parquet format, and place them in a shared folder that all nodes can access.

Creating the index using the management console

You can use the Qlik Big Data Index management console to configure the indexing cluster and create the index.

Creating the index using scripts

You can also create the index by executing supplied shell scripts in the cluster.

Appending data to the index cluster

You can append data from the data sources to the index cluster using the Data Append REST API...

Managing the index cluster

Manage cluster settings and indexing services...
