Appending data to the Qlik Associative Big Data Index index cluster

You can append data from the data sources to the Qlik Associative Big Data Index index cluster using the Data Append REST API. The deployment contains shell scripts that simplify startup of the required services.

When you append data in Qlik Associative Big Data Index, Qlik Sense Engine Service detects it automatically. This happens within five minutes after the new data becomes available to the QSL Manager. The new data is applied the next time you make a selection in Qlik Sense.

Setting QSL Manager data update frequency

If you want the QSL Manager to check for new data more frequently, you can edit the idxltUpdateCheckFreqMS setting in qsl_registry_config.json. The default value is 60000 milliseconds (one minute). Do the following:
  1. If QSL services are running, stop QSL services.
  2. Delete qsl_registry_config.json from /output/config/qsl_processor.
  3. Edit /dist/runtime/config/template/qsl_registry_config.json and change the idxltUpdateCheckFreqMS setting.
  4. Restart QSL services.

When QSL services are restarted, an updated copy of qsl_registry_config.json is generated in /output/config/qsl_processor.
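
Step 3 can be scripted if you change the setting often. The following is a minimal sketch, not a supported tool: set_update_check_freq is a hypothetical helper name, and it assumes the setting is stored in the template as a plain JSON number, for example "idxltUpdateCheckFreqMS": 60000. Check the layout of your own file before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the idxltUpdateCheckFreqMS value in a
# qsl_registry_config.json template.
# Assumption: the setting is a plain JSON number, e.g.
#   "idxltUpdateCheckFreqMS": 60000
set_update_check_freq() {
    config_file="$1"
    new_value_ms="$2"

    # Accept digits only (a whole number of milliseconds).
    case "$new_value_ms" in
        ''|*[!0-9]*)
            echo "value must be a whole number of milliseconds" >&2
            return 1
            ;;
    esac

    # Stop if the setting is not present, rather than silently doing nothing.
    if ! grep -q '"idxltUpdateCheckFreqMS"' "$config_file"; then
        echo "idxltUpdateCheckFreqMS not found in $config_file" >&2
        return 1
    fi

    # Write to a temporary file first, then replace the original.
    tmp_file="${config_file}.tmp"
    sed -E 's/("idxltUpdateCheckFreqMS"[[:space:]]*:[[:space:]]*)[0-9]+/\1'"$new_value_ms"'/' \
        "$config_file" > "$tmp_file" && mv "$tmp_file" "$config_file"
}

# Usage, with QSL services stopped (steps 2 and 3):
#   rm /output/config/qsl_processor/qsl_registry_config.json
#   set_update_check_freq /dist/runtime/config/template/qsl_registry_config.json 30000
```

The helper only edits the template. You still need to stop QSL services first and restart them afterwards so that the copy in /output/config/qsl_processor is regenerated.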

Preparing to append data

Before you can perform incremental updates, the following services need to be configured and running:

  • Index Maintenance Service

    The Index Maintenance Service is started along with other indexing services when start_indexing_env.sh is executed, and configured when you register the schema. You can check that it is running with the task_manager.sh script.

  • Data append REST gateway

    You also need to start the data append REST gateway before you can append data.

Starting the data append REST gateway

The Qlik Associative Big Data Index REST API provides a gateway to access the public RPCs of the Index Maintenance Service. Before you can notify the indexing cluster that appended data exists, you need to start the gateway with the start_gateway.sh script.

The start_gateway.sh script does not have any mandatory options. If you execute it without options, it uses the settings from the indexing configuration files. If you specify any options, they override the corresponding configuration file settings. For example, if you do not specify a port when executing the script, port 8080 is opened for the REST API.

Example: Basic call

./start_gateway.sh

Short option   Long option       Description
-h             --help            Print help for the script.
-b             --binaryfolder
-u             --useip
-p             --port            Specify the port to start gateway services on. The default setting is 8080.
-c             --clusterconfig
-k             --killexisting    Kill any existing gateway process before starting the service.
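
Example: Call with options

The options in the table can be combined. The sketch below wraps a restart on a chosen port in a small function. restart_gateway is a hypothetical name, and the sketch assumes that --port takes its value as the next argument; run ./start_gateway.sh --help to confirm the exact syntax in your deployment.

```shell
#!/bin/sh
# Hypothetical wrapper: kill any existing gateway process, then start the
# gateway on the given port (8080, the documented default, if none is given).
# Assumption: --port takes its value as a separate argument.
restart_gateway() {
    port="${1:-8080}"
    ./start_gateway.sh --killexisting --port "$port"
}

# Usage, from the folder that contains start_gateway.sh:
#   restart_gateway 9090
```

If you start the gateway on a port other than 8080, use that port in the Data Append REST API calls as well.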

Updating the index with appended data

If you want to notify the indexing cluster that there are new parquet files available in the data source, you need to execute two Data Append REST API calls. In the examples here we use curl as a client.

In this example, we add a parquet file stored as /mnt/efs/<user>/data/factor_1/customer.table/customer.parquet/part-00001-af10af29-14b6-425b-8f0f-9ff241684652-c000.snappy.parquet, using the schema tpch1.

  1. Add the parquet file.

    curl -k -X POST -H "Content-Type: application/json" -d "{\"file_path\":\"/mnt/efs/<user>/data/factor_1/customer.table/customer.parquet/part-00001-af10af29-14b6-425b-8f0f-9ff241684652-c000.snappy.parquet\"}" "https://localhost:8080/v1/bdi/singlefileadd"

    Repeat this call if you have more files available.

  2. Trigger an index update.

    curl -k -X POST -H "Content-Type: application/json" -d "{\"schema_name\":\"tpch1\"}" "https://localhost:8080/v1/bdi/triggersingleupdate"

You can add Data Append REST API endpoint calls like these in your data pipeline flow, or execute them manually.
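
For a pipeline step, the two calls can be combined so that several files are added before a single index update is triggered. The following is a sketch under stated assumptions: append_parquet_files and GATEWAY_URL are illustrative names that are not part of the product, the gateway is assumed to be reachable at https://localhost:8080 as in the examples above, and file paths are assumed to contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Sketch: add several parquet files, then trigger one index update.
# GATEWAY_URL and append_parquet_files are illustrative names.
GATEWAY_URL="${GATEWAY_URL:-https://localhost:8080}"

append_parquet_files() {
    schema_name="$1"
    shift

    for file_path in "$@"; do
        # Same request as step 1. --fail makes curl return non-zero on an
        # HTTP error, so the update is not triggered after a failed add.
        curl -k --fail -X POST -H "Content-Type: application/json" \
            -d "{\"file_path\":\"${file_path}\"}" \
            "${GATEWAY_URL}/v1/bdi/singlefileadd" || return 1
    done

    # Same request as step 2, sent once after all files have been added.
    curl -k --fail -X POST -H "Content-Type: application/json" \
        -d "{\"schema_name\":\"${schema_name}\"}" \
        "${GATEWAY_URL}/v1/bdi/triggersingleupdate"
}

# Usage:
#   append_parquet_files tpch1 /path/to/first.parquet /path/to/second.parquet
```

Stopping at the first failed call is a design choice for this sketch. A pipeline that prefers to continue past a failed file would need to collect the errors and decide separately whether to trigger the update.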

For more information about Data Append REST API, see QABDI REST API for index maintenance.
