Creating the index using the management console

The index is the main element of Qlik Big Data Index, containing indexlets that are persisted data and symbol tables that represent the big data. You can use the Qlik Big Data Index management console to configure the indexing cluster and create the index.

Information note: You can work with only a single dataset per deployment when using the management console. To index a new dataset, you need to set up a new deployment.

Before you create the index, you need to prepare the dataset.

Starting the management console

The management console is available on port 8080 of the nginx-ingress-controller node.

You can also make the management console available on port 80 of the external IP of your qlik-nginx-ingress-controller node by enabling nginx ingress. Example of an address from kubectl get svc:

tpch-nginx-ingress-controller LoadBalancer 10.100.80.40 aaef3704a8c3a11e999d20a615424c7a-885528821.eu-west-1.elb.amazonaws.com 80:30545/TCP,443:31620/TCP
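As a minimal sketch, the external address and service ports can be extracted from such a row with a few lines of Python. The function names below are hypothetical, the column order (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S)) is assumed from the example row above, and serving the console over plain HTTP on port 80 is an assumption rather than something this page confirms.

```python
from typing import List, Tuple


def parse_svc_line(line: str) -> Tuple[str, List[int]]:
    """Return (external host, service ports) from one `kubectl get svc` row.

    Assumes the columns NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S), in that order.
    """
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("expected at least 5 whitespace-separated columns")
    external_host, port_column = fields[3], fields[4]
    service_ports = []
    for mapping in port_column.split(","):
        # Each mapping looks like "80:30545/TCP" (service port:node port/protocol).
        service_ports.append(int(mapping.split(":")[0].split("/")[0]))
    return external_host, service_ports


def console_url(line: str) -> str:
    """Build the port 80 address of the console (plain HTTP is an assumption)."""
    host, ports = parse_svc_line(line)
    if 80 not in ports:
        raise ValueError("service does not expose port 80")
    return f"http://{host}:80"
```

Passing the example row above to console_url returns an address built from the load balancer host name in its fourth column.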

Configuring the index settings

Click Configure in the management console to select a dataset and configure the indexing services.

  1. Define the dataset with the following settings:

    Setting Description
    Dataset name

    Name of the dataset to process. The default name is tpch, but you can change it.

    Source data folder

    Path to the folder where your dataset is located.

    This must be set to a shared path that can be accessed from all nodes. The default setting is /home/data.

    Enable cloud input

    You can enable data source input from cloud shared storage. This requires that you enabled cloud input during deployment.

    Source cloud path

    Path to the cloud bucket where the data source is located.

    Field mappings file

    Full path to the field mappings file that defines the table and field names to be mapped for attribute to attribute (A2A) associations in the schema.

    Information note: This setting is optional, and should be left empty if you stored a field mappings file in the source data cloud root path. In this case, the file is automatically picked up during data sync and used to create associations during schema scanning.

    Click Next when you are ready.

  2. Configure indexing services.

    Setting Description
    Output root folder

    Root folder where all output, such as schemas, should be created. This must be set to a shared path that can be accessed across all nodes. The default setting is /home/output.

    Enable cloud output

    You can enable data source output to a cloud shared storage. This requires that you enabled cloud output during deployment.

    Source cloud path

    Path to the cloud bucket to use for output data (indexlets, symbol tables and symbol positions).

    Symbol server async threads

    The number of parallel threads that the symbol server can handle. The default setting is 1.

    We recommend that you set this to the number of cores.

    Create column index threads

    Setting that affects how much memory is consumed when creating symbols. The default setting is 1.

    We recommend that you set this to a value less than both of the following:

    • A third of the memory size, in GB, of the machine.
    • The value of the symbol_server_async_threads setting.
    Symbol output folder

    Folder for symbol output.

    This setting is optional. If you do not specify a folder, it will be created automatically in the output root folder.

    Symbol positions output folder

    Folder for symbol positions output.

    This setting is optional. If you do not specify a folder, it will be created automatically in the output root folder.

    Index output folder

    Folder for index creation output.

    This setting is optional. If you do not specify a folder, it will be created automatically in the output root folder.

    Logging settings folder

    Folder for storing log files of index creation.

    This setting is optional. If you do not specify a folder, it will be created automatically in the output root folder.

    Click Configure when you are ready.

When you have configured the index settings, you can create the index.
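The thread recommendations in the indexing services settings can be worked through as a small calculation. This is only a sketch of the stated guidance: the function name and the key create_column_index_threads are hypothetical labels for the Create column index threads setting, and reading "less than" as strictly less than both limits is an assumption.

```python
import math


def recommended_thread_settings(cores: int, memory_gb: float) -> dict:
    """Derive thread settings from the recommendations on this page.

    symbol_server_async_threads: the number of cores.
    create_column_index_threads (hypothetical key name): the largest whole
    number strictly below both a third of the memory in GB and the
    symbol_server_async_threads value, never lower than the default of 1.
    """
    if cores < 1 or memory_gb <= 0:
        raise ValueError("cores must be >= 1 and memory_gb must be > 0")
    symbol_threads = cores
    limit = min(memory_gb / 3, symbol_threads)
    # ceil(limit) - 1 is the largest integer strictly less than limit.
    column_threads = max(1, math.ceil(limit) - 1)
    return {
        "symbol_server_async_threads": symbol_threads,
        "create_column_index_threads": column_threads,
    }
```

For a machine with 16 cores and 32 GB of memory, a third of the memory is about 10.7, so the sketch suggests 16 symbol server threads and 10 column index threads.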

Creating associations

There are a few different ways to add associations in the schema. This must be done before you run Start indexing.

  • We recommend that you save the field mapping structure between tables in a JSON file named field_mappings.json and upload it to the source data cloud root path. The file is automatically picked up during data sync and used in the schema scanning process to generate associations. In this case, leave the field_mappings_file parameter empty in the configuration.

    Field mapping

  • You can also upload the field mapping file to a local folder in the RestAPI pod, and set the field_mappings_file parameter to the full path of that file.
  • If you want to add associations manually, use Edit schema after data sync is completed.

Creating the index

Do the following:

  1. Start the indexing cluster services by clicking Start in Indexing services.

    The following services are started on the Indexing manager, Symbol server and Indexer server nodes. If you have several nodes of the same type, for example symbol servers, you will see one instance of the symbol service for each node.

    Information note: Do not start QSL services until you have completed indexing as described in this procedure.
    Indexing services
    Service Default port Node
    Indexing Manager Service 55020 Indexing manager
    Indexing Registry Service 50057 Indexing manager
    Persistence Service 55010 Indexing manager
    Index Maintenance Service 55050 Indexing manager
    Symbol Service 55030 (first node) Symbol server (there can be multiple nodes)
    Indexer Service 55040 (first node) Indexer server (there can be multiple nodes)
  2. Scan the data and generate a schema by clicking Start scanning.

  3. Review the schema by clicking Edit schema.

    The data scan generates attribute to attribute (A2A) associations in the schema automatically if:

    • multiple tables have fields with exactly the same names
    • a field mappings file that defines the table and field names to be mapped is set in the dataset settings.

    You can add or modify A2A associations before you create the index.

    • The Fields tab defines the table and field structure of the dataset.

      You can add or delete tables and fields.

    • The Associations tab defines the field mapping between tables. Each row defines an association of a pair of fields from two different tables. You can add and remove associations.

    See Schema configuration sample file.

  4. Click Start syncing to copy source data to all nodes. Wait until the sync is complete, that is, when the number of synced files is the same as the total number of files.
  5. Register the schema by clicking Start indexing.

    The index is created when all tasks are completed:

    • Create symbol table
    • Create column index
    • Create associations
  6. Start the QSL cluster services by clicking Start in QSL services.

    The following services are started.

    QSL services
    Service Default port Node
    QSL registry service 44000 QSL executor registry
    QSL worker service 55001 (first node) QSL worker (there can be multiple nodes)
    QSL manager service 55000 QSL manager

When QSL services have started, the index is ready to use in a Qlik Sense app.
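To illustrate the name-based rule from step 3, where fields with exactly the same name in multiple tables are associated, the following sketch pairs up identically named fields across tables. It is not the product's scanning logic. The function name, the dictionary layout of the schema, and the table and field names in the usage note are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_name_based_associations(schema):
    """Pair fields that share exactly the same name across different tables.

    schema maps a table name to a list of its field names (hypothetical layout).
    Returns a list of ((table_a, field), (table_b, field)) tuples, one per
    pair of tables that both contain the field.
    """
    tables_by_field = defaultdict(set)
    for table, fields in schema.items():
        for field in fields:
            tables_by_field[field].add(table)

    associations = []
    for field in sorted(tables_by_field):
        # Every pair of distinct tables sharing the field yields one association.
        for table_a, table_b in combinations(sorted(tables_by_field[field]), 2):
            associations.append(((table_a, field), (table_b, field)))
    return associations
```

With a schema such as {"orders": ["custkey"], "customer": ["custkey", "name"]}, the sketch returns a single association between customer.custkey and orders.custkey, which corresponds to one row of field pairs in the Associations tab.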

Troubleshooting

Troubleshooting may be needed when Qlik Big Data Index indexing does not behave as expected (for example, if the system responds with an error message that needs further investigation or does not respond at all when an error occurs).

You can view all service logs under the Logs tab. You can export the log files by clicking the Export button.
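When a service does not respond at all, one quick check outside the management console is whether its default port accepts TCP connections. The sketch below uses only the Python standard library. The port numbers come from the service tables on this page, while the dictionary and function names are hypothetical and the host you pass in depends on your deployment.

```python
import socket

# Default ports as listed in the service tables on this page. The Symbol,
# Indexer and QSL worker ports apply to the first node of each type only.
DEFAULT_PORTS = {
    "Indexing Manager Service": 55020,
    "Indexing Registry Service": 50057,
    "Persistence Service": 55010,
    "Index Maintenance Service": 55050,
    "Symbol Service": 55030,
    "Indexer Service": 55040,
    "QSL registry service": 44000,
    "QSL worker service": 55001,
    "QSL manager service": 55000,
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, is_port_open(host, DEFAULT_PORTS["Indexing Manager Service"]) with the address of your Indexing manager node reports whether that service is reachable. An open port shows only that something is listening, so the service logs remain the place to investigate errors.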
