Prerequisites
Before you begin to work with Databricks (Cloud Storage) as a target in Qlik Replicate, make sure that the following prerequisites have been met:
General prerequisites
Required driver
When Replicate Server is running on Windows or Linux, download and install Simba Spark ODBC Driver 2.8.2 on the Qlik Replicate Server machine.
Replicate on Linux
When Replicate server is running on Linux, you also need to add the following section to the /etc/odbcinst.ini file:
[Simba Spark ODBC Driver]
Description=Amazon Hive ODBC Driver (64-bit)
Driver=/opt/simba/spark/lib/64/libsparkodbc_sb64.so
AWS prerequisites
Permissions
The following permissions are required:
- You must have write access to the specified storage target folder in the "Bucket" specified in the Databricks on AWS endpoint's Storage settings.
- Databricks table permissions: Replicate requires permissions to perform the following operations on Databricks tables: CREATE, DROP, TRUNCATE, DESCRIBE, and ALTER table. ALTER table may also include RENAME table and ADD column.
- In order for Replicate to connect to a Databricks cluster via ODBC, the user specified in the endpoint settings must be granted "Can Attach To" permission.
- The S3 storage bucket (or the directory under the bucket) must be mounted on the Databricks File System (DBFS). For information on how to set this up, refer to https://docs.databricks.com/data/data-sources/aws/amazon-s3.html
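The mount step above is performed from a Databricks notebook, not from the Replicate machine. A minimal sketch, assuming the cluster's instance profile already grants access to the bucket; the bucket name and mount point below are placeholders, not values from this guide:

```python
# Run in a Databricks notebook (dbutils is only available there).
# Assumes the cluster's instance profile grants access to the bucket.
# "my-replicate-bucket" and "replicate-target" are placeholder names.
aws_bucket_name = "my-replicate-bucket"
mount_name = "replicate-target"

dbutils.fs.mount(
    source=f"s3a://{aws_bucket_name}",
    mount_point=f"/mnt/{mount_name}",
)

# Verify the mount by listing its contents.
display(dbutils.fs.ls(f"/mnt/{mount_name}"))
```

The resulting mount point (here /mnt/replicate-target) is typically what you would then reference as the target folder in the endpoint's Storage settings.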
- You must provide Replicate with a valid security token for access to Databricks.
Microsoft Azure prerequisites
Permissions
- The Azure Data Lake Storage (ADLS) Gen2 file system or Blob storage location (whichever you are using) must be accessible from the Qlik Replicate machine.
- The "Storage account" (when using Blob storage) or "Azure Active Directory application ID" (when using ADLS) specified in the Microsoft Azure Databricks endpoint's Storage settings must have write access to the specified Blob/ADLS storage target folder.
- Databricks table permissions: Replicate requires permissions to perform the following operations on Databricks tables: CREATE, DROP, DESCRIBE, and ALTER table. ALTER table may also include RENAME table and ADD column.
- In the Access Control (IAM) settings for the ADLS Gen2 file system, assign the "Storage Blob Data Contributor" role to Replicate (AD App ID). It may take a few minutes for the role to take effect.
- In order for Replicate to connect to a Databricks cluster via JDBC/ODBC, the user specified in the endpoint settings must be granted the "Can Attach To" permission.
General
- The Blob storage container or ADLS Data Lake Store (according to your selected storage type) must be mounted on the Databricks File System (DBFS).
Information note: For information on how to set this up with Blob storage, see Azure storage. For information on how to set this up with ADLS Gen2 storage, see Azure Data Lake Gen2 storage.
- You must provide Replicate with a valid security token for access to Databricks.
- When configuring a new cluster with Azure Data Lake Storage (ADLS) Gen2, the line "spark.hadoop.hive.server2.enable.doAs false" must be added to the "Spark Config" section.
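To illustrate the DBFS mount requirement above, the following is a sketch of mounting an ADLS Gen2 file system from a Databricks notebook using an Azure AD application (service principal). All angle-bracketed values are placeholders for your own environment, and the client secret is read from a Databricks secret scope rather than hard-coded:

```python
# Run in a Databricks notebook. All <angle-bracketed> values are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<directory-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/replicate-target",
    extra_configs=configs,
)
```

The application ID used here should be the same "Azure Active Directory application ID" specified in the endpoint's Storage settings, so that the mounted path and the endpoint use the same identity.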
Supported blob storage types
The following blob storage types are supported:
- Standard storage with block blobs
- Premium storage with block blobs only
Google Cloud prerequisites
Permissions
The following permissions are required:
- The Google service account should be granted the following permissions on the bucket:
  - storage.buckets.get
  - storage.objects.get
  - storage.objects.list
  - storage.objects.create
  - storage.objects.delete
- The Google service account should be granted the storage.buckets.list permission. This permission is required in order to browse for a bucket in the endpoint settings.
- Grant the user account permission to perform the following operations on Databricks tables: CREATE, DROP, DESCRIBE, and ALTER table.
- To enable Replicate to connect to the Databricks cluster via ODBC, the user account must be granted the "Can Attach To" permission. For more information, see the Databricks online help.
- You must provide Replicate with a valid security token for Databricks access.
- To access the storage directories from the Databricks cluster, you need to add a configuration for that storage account and its key. For an explanation of how to do this, see the instructions for accessing a GCS bucket directly in the Databricks online help.
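As an illustration of the storage-account configuration mentioned above, the cluster's Spark Config section might contain entries along the following lines. The key names follow the Databricks documentation for direct GCS access; the angle-bracketed values and secret-scope references are placeholders for your own service account details:

```
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email <client-email>
spark.hadoop.fs.gs.project.id <project-id>
spark.hadoop.fs.gs.auth.service.account.private.key {{secrets/<scope>/<private-key-name>}}
spark.hadoop.fs.gs.auth.service.account.private.key.id {{secrets/<scope>/<private-key-id-name>}}
```

Storing the private key in a Databricks secret scope, as sketched here, avoids exposing it in plain text in the cluster configuration.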