
Prerequisites

Before you begin to work with Databricks (Cloud Storage) as a target in Qlik Replicate, make sure that the following prerequisites have been met:

General prerequisites

Required driver

When Replicate Server is running on Windows or Linux, download and install Simba Spark ODBC Driver 2.6.22 on the Qlik Replicate Server machine.

Replicate on Linux

When Replicate Server is running on Linux, you must also add the following section to the /etc/odbcinst.ini file:

[Simba Spark ODBC Driver]
Description=Amazon Hive ODBC Driver (64-bit)
Driver=/opt/simba/spark/lib/64/libsparkodbc_sb64.so
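
One way to confirm that the section was added correctly is to parse the file and compare the Driver entry with the path shown above. The following is a minimal sketch, not part of Replicate; the function name check_odbcinst is hypothetical, and it assumes the file is plain INI text that Python's configparser can read.

```python
import configparser

# Section name and driver path as given in the prerequisite above.
REQUIRED_SECTION = "Simba Spark ODBC Driver"
REQUIRED_DRIVER = "/opt/simba/spark/lib/64/libsparkodbc_sb64.so"


def check_odbcinst(text: str) -> list[str]:
    """Return a list of problems found in odbcinst.ini content (empty if none)."""
    # strict=False tolerates duplicate sections; interpolation is disabled
    # so that any '%' characters in the file are read literally.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    if not parser.has_section(REQUIRED_SECTION):
        return [f"missing section [{REQUIRED_SECTION}]"]
    # Option names are matched case-insensitively by configparser.
    driver = parser.get(REQUIRED_SECTION, "Driver", fallback=None)
    if driver != REQUIRED_DRIVER:
        return [f"unexpected Driver value: {driver!r}"]
    return []
```

To use it, read /etc/odbcinst.ini into a string and pass the string to check_odbcinst; an empty list means the section and driver path match the values above.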

AWS prerequisites

Permissions

The following permissions are required:

  • The "Bucket" specified in the Databricks on AWS endpoint's Storage settings must have write access to the specified storage target folder.
  • Databricks table permissions: Replicate requires permissions to perform the following operations on Databricks tables: CREATE, DROP, TRUNCATE, DESCRIBE, and ALTER table. ALTER table may also include RENAME table and ADD column.
  • In order for Replicate to connect to a Databricks cluster via ODBC, the user specified in the endpoint settings must be granted "Can Attach To" permission.
  • The S3 storage bucket (or the directory under the bucket) must be mounted on the Databricks File System (DBFS).

    For information on how to set this up, refer to https://docs.databricks.com/data/data-sources/aws/amazon-s3.html

  • You must provide Replicate with a valid security token for access to Databricks.
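
To check the driver, the cluster permission, and the security token independently of Replicate, it can help to open an ODBC connection by hand. The sketch below only assembles a DSN-less connection string. The key names (Host, Port, HTTPPath, SSL, ThriftTransport, AuthMech, UID, PWD) are those commonly documented for the Simba Spark ODBC Driver with Databricks and should be verified against the driver's own guide; the function name and all example values are hypothetical. Replicate builds its own connection from the endpoint settings, so this is not how Replicate itself connects.

```python
def build_connection_string(host: str, http_path: str, token: str) -> str:
    """Assemble a DSN-less ODBC connection string for a Databricks cluster."""
    parts = {
        "Driver": "Simba Spark ODBC Driver",  # section name from odbcinst.ini
        "Host": host,                          # workspace host name (assumed input)
        "Port": "443",
        "HTTPPath": http_path,                 # cluster HTTP path (assumed input)
        "SSL": "1",
        "ThriftTransport": "2",                # HTTP transport
        "AuthMech": "3",                       # user name and password authentication
        "UID": "token",                        # the literal word "token" for token access
        "PWD": token,                          # the Databricks security token
    }
    # Dictionaries preserve insertion order, so the keys appear as listed.
    return ";".join(f"{key}={value}" for key, value in parts.items())
```

The resulting string could then be passed to an ODBC client of your choice, for example the third-party pyodbc package, to run a simple query against the cluster.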

Microsoft Azure prerequisites

Permissions

  • The Azure Data Lake Storage (ADLS) Gen2 file system or Blob storage location (whichever you are using) must be accessible from the Qlik Replicate machine.
  • The "Storage account" (when using Blob storage) or "Azure Active Directory application ID" (when using ADLS) specified in the Microsoft Azure Databricks endpoint's Storage settings must have write access to the specified Blob/ADLS storage target folder.
  • Databricks table permissions: Replicate requires permissions to perform the following operations on Databricks tables: CREATE, DROP, DESCRIBE, and ALTER table. ALTER table may also include RENAME table and ADD column.
  • In the Access Control (IAM) settings for the ADLS Gen2 file system, assign the “Storage Blob Data Contributor” role to Replicate (AD App ID). It may take a few minutes for the role to take effect.
  • In order for Replicate to connect to a Databricks cluster via JDBC/ODBC, you must have "Can Attach To" permission.

General

  • The Blob storage container or ADLS Data Lake Store (according to your selected storage type) must be mounted on the Databricks File System (DBFS).

  • You must provide Replicate with a valid security token for access to Databricks.
  • When configuring a new cluster with Azure Data Lake Storage (ADLS) Gen2, the line "spark.hadoop.hive.server2.enable.doAs false" must be added to the "Spark Config" section.
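
The Spark Config requirement above can be checked before a cluster is started. The following is a minimal sketch that assumes the Spark Config text consists of one "key value" pair per line, separated by a space; the function names are hypothetical and the code does not talk to Databricks.

```python
# Setting required for clusters that use ADLS Gen2, as stated above.
REQUIRED_KEY = "spark.hadoop.hive.server2.enable.doAs"
REQUIRED_VALUE = "false"


def parse_spark_config(text: str) -> dict[str, str]:
    """Parse 'key value' lines into a dictionary, skipping blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only; the rest of the line is the value.
        key, _, value = line.partition(" ")
        settings[key] = value.strip()
    return settings


def has_required_setting(text: str) -> bool:
    """Return True if the required doAs line is present and set to false."""
    value = parse_spark_config(text).get(REQUIRED_KEY, "")
    return value.lower() == REQUIRED_VALUE
```

Paste the contents of the cluster's "Spark Config" section into a string and pass it to has_required_setting; a result of False means the line still needs to be added.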

Supported blob storage types

The following blob storage types are supported:

  • Standard storage with block blobs
  • Premium storage with block blobs only

Google Cloud prerequisites

Permissions

The following permissions are required:

  • The Google service account should be granted the following permissions on the bucket:

    • storage.buckets.get

    • storage.objects.get

    • storage.objects.list

    • storage.objects.create

    • storage.objects.delete

  • The Google service account should be granted the storage.buckets.list permission. This permission is required in order to browse for a bucket in the endpoint settings.

  • Grant the user account permission to perform the following operations on Databricks tables: CREATE, DROP, DESCRIBE, and ALTER table.

  • To enable Replicate to connect to the Databricks cluster via ODBC, the user account must be granted "Can Attach To" permission.

    For more information, see the Databricks online help.

  • You must provide Replicate with a valid security token for Databricks access.

  • To access the storage directories from the Databricks cluster, you need to add a configuration for that storage account and its key. For an explanation of how to do this, see the instructions for accessing a GCS bucket directly in the Databricks online help.
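
The bucket permissions listed above can be compared against what the service account actually holds. The sketch below is a plain set comparison under the assumption that you already have the list of granted permissions, for example from the IAM permission test offered by the Google Cloud Storage client library; the function name missing_permissions is hypothetical. The storage.buckets.list permission is left out because it is only needed for browsing buckets in the endpoint settings.

```python
# Bucket-level permissions required for the Google service account.
REQUIRED_BUCKET_PERMISSIONS = frozenset({
    "storage.buckets.get",
    "storage.objects.get",
    "storage.objects.list",
    "storage.objects.create",
    "storage.objects.delete",
})


def missing_permissions(granted):
    """Return the required bucket permissions absent from `granted`, sorted."""
    return sorted(REQUIRED_BUCKET_PERMISSIONS - set(granted))
```

An empty result means every required bucket permission is present; otherwise the returned names are the ones still to be granted.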
