
Create an Apache Spark connection

To access data stored in an Apache Spark database, you need to know the server and database names that you want to connect to, and you must have access credentials. Once you have created a connection to an Apache Spark database, you can select data from the available tables and load that data into your app or document.

In Qlik Sense, you connect to an Apache Spark database through the Add data dialog or the Data load editor.

In QlikView, you connect to an Apache Spark database through the Edit Script dialog.

Setting up the database properties

Database properties that can be configured
Database property | Description | Required
Spark Server Type | The type of Spark server: Shark Server, Shark Server 2, or Spark Thrift Server. | Yes
Host | The IP address or host name of the Apache Spark server. | Yes
Port | Server port for the Apache Spark database. | Yes
Database | The name of the Apache Spark database. | Yes
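
The four required database properties can be checked before you open the connection dialog. The following is a minimal sketch, not part of any Qlik API: the class name `SparkDatabaseProperties` and its `validate` method are hypothetical, and the host, port, and database values in the usage note are placeholders.

```python
from dataclasses import dataclass

# Server types listed in the database properties table.
SERVER_TYPES = ("Shark Server", "Shark Server 2", "Spark Thrift Server")


@dataclass(frozen=True)
class SparkDatabaseProperties:
    """Hypothetical container for the four required database properties."""
    server_type: str
    host: str
    port: int
    database: str

    def validate(self) -> list:
        """Return a list of problems; an empty list means every field is usable."""
        problems = []
        if self.server_type not in SERVER_TYPES:
            problems.append("Spark Server Type must be one of: " + ", ".join(SERVER_TYPES))
        if not self.host.strip():
            problems.append("Host is required")
        # 1-65535 is the valid TCP port range, not a Qlik-specific limit.
        if not 1 <= self.port <= 65535:
            problems.append("Port must be between 1 and 65535")
        if not self.database.strip():
            problems.append("Database is required")
        return problems
```

For example, `SparkDatabaseProperties("Spark Thrift Server", "spark.example.com", 10000, "default").validate()` returns an empty list, while a blank host or an unknown server type adds one message per problem.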

Authenticating the driver

Information note: The Apache Spark connector does not support NTLM authentication. For that reason, the Windows integrated security option is not available.

You have the following options to authenticate the driver:

  • Apache Spark credentials

  • Azure OAuth

Information note: When using Direct Access gateway, Azure OAuth and Microsoft Entra ID authentication require Direct Access gateway 1.6.6 or later.

Apache Spark credentials authentication

Authentication properties that can be configured
Authentication property | Description | Required
Authentication mechanism | Select SQL Server Authentication. | Yes
User name | User name for the Apache Spark connection. | Yes
Password | Password for the Apache Spark connection. | Yes

Azure OAuth authentication

You can authenticate using OAuth 2.0. To complete authentication, you need an authentication code, which you obtain during the steps below.

Do the following:

  1. Under Authentication, select Azure OAuth.

  2. Fill in Tenant ID, Client ID, and Client Secret.

  3. Click the Authenticate button in the Credentials section under Account properties. A new window opens.

  4. Copy the Authentication code. Go back to the connection creation dialog.

  5. Under Complete authentication with the code provided by the source, paste the authentication code. Click Verify.

Information note: When using OAuth authentication, if you edit any of the connection properties after a connection has been authenticated, you must explicitly re-authenticate. Re-authentication is not initiated automatically. If you do not re-authenticate in this situation, the connection will stop working.

Authentication properties that can be configured
Authentication property | Description | Required
Authentication Mechanism | Select Azure OAuth. When using Direct Access gateway, Azure OAuth authentication requires Direct Access gateway 1.6.6 or later. | Yes
Tenant ID | The Azure AD tenant to use for authentication. It is also referred to as the directory ID. | Yes
Client ID | The client ID when configuring the Azure AD OAuth authorization server. | Yes
Client Secret | The client secret when configuring the Azure AD OAuth authorization server. This must be entered every time the connection is re-authenticated. | Yes

Apache Spark configuration for OAuth

Your Apache Spark database must be configured to use OAuth. The process is described in Microsoft documentation:

Quickstart: Register an application with the Microsoft identity platform

When configuring the OAuth authorization server, the redirect_uri must be set to: https://connector.qlik.com/auth/oauth/v2.htm
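
To show where the redirect URI fits, the sketch below builds an authorization request URL for the Microsoft identity platform v2.0 endpoint. It is illustrative only: the connector constructs its own request when you click Authenticate, the function name `build_authorize_url` is hypothetical, and the `scope` and `state` values you pass are placeholders that depend on your own app registration.

```python
from urllib.parse import urlencode

# Redirect URI that must be registered on the OAuth authorization server.
QLIK_REDIRECT_URI = "https://connector.qlik.com/auth/oauth/v2.htm"


def build_authorize_url(tenant_id: str, client_id: str, scope: str, state: str) -> str:
    """Build an authorization-code request URL for the given tenant and client.

    tenant_id and client_id correspond to the Tenant ID and Client ID
    connection properties; scope and state are assumptions for illustration.
    """
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",          # authorization code flow
        "redirect_uri": QLIK_REDIRECT_URI,
        "scope": scope,
        "state": state,
    })
    return f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/authorize?{query}"
```

If the redirect URI registered for the application does not match the value sent in the request, the authorization server rejects the request, which is why the value above must be entered exactly.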

Qlik Sense: Apache Spark authentication properties

Authentication properties that can be configured
Property | Description
Mechanism | Authentication with user name only, with user name and password, or with no authentication. If the Spark Server Type is Shark Server, you must select No Authentication. If the Spark Server Type is Spark Thrift Server, most configurations require User Name authentication. Selecting User Name and Password or User Name gives you the option to set up Account properties.
Username | User name for the Apache Spark connection.
Password | Password for the Apache Spark connection.
Name | Name of the Apache Spark connection. The default name will be used if you do not enter a name.
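
The mechanism rules in the table above can be expressed as a small check. This is a sketch under stated assumptions: the function names `check_mechanism` and `account_properties_available` are hypothetical, and the mechanism labels are taken from the table text rather than from any connector API.

```python
# Mechanism labels as they appear in the authentication properties table.
MECHANISMS = ("No Authentication", "User Name", "User Name and Password")


def check_mechanism(server_type: str, mechanism: str) -> tuple:
    """Return (ok, message) for a server type and mechanism combination."""
    if mechanism not in MECHANISMS:
        raise ValueError(f"Unknown mechanism: {mechanism!r}")
    # Hard rule from the table: Shark Server only works without authentication.
    if server_type == "Shark Server" and mechanism != "No Authentication":
        return False, "Shark Server requires No Authentication"
    # Soft rule: most, but not all, Spark Thrift Server setups use User Name.
    if server_type == "Spark Thrift Server" and mechanism != "User Name":
        return True, "Most Spark Thrift Server configurations require User Name"
    return True, ""


def account_properties_available(mechanism: str) -> bool:
    """Account properties can be set up only when a user name is involved."""
    return mechanism in ("User Name", "User Name and Password")
```

The Spark Thrift Server case returns a warning rather than an error because the table says "most configurations", not all of them.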

QlikView: Apache Spark authentication properties

Authentication properties that can be configured
Property | Description
Mechanism | Authentication with user name only, with user name and password, or with no authentication. If the Spark Server Type is Shark Server, you must select No Authentication. If the Spark Server Type is Spark Thrift Server, most configurations require User Name authentication.
Username | User name for the Apache Spark connection.
Password | Password for the Apache Spark connection.

Account properties

Credentials

Credentials are used to prove that a user is allowed to access the data in a connection.

There are two types of credentials that can be used when making a connection in Qlik Sense SaaS. If you leave the User defined credentials check box deselected, then only one set of credentials will be used for the connection. These credentials belong to the connection and will be used by anyone who can access it. For example, if the connection is in a shared space, every user in the space will be able to use these credentials. This one-to-one mapping is the default setting.

If you select User defined credentials, then every user who wants to access this connection will need to input their own credentials before selecting tables or loading data. These credentials belong to a user, not a connection. User defined credentials can be saved and used in multiple connections of the same connector type.

In the Data load editor, you can click the key icon underneath the connection to edit your credentials. In spaces or Data manager, you can edit credentials by right-clicking the connection and selecting Edit Credentials.

See which authentication type applies on each connector's page.

Account properties that can be configured
Account property | Description
User defined credentials | Select this check box if you want users that access this connection to have to input their own credentials. Deselect this check box if credentials can be shared with anyone who has access to this connection.
New credentials | Drop-down menu item that appears if User defined credentials is selected.
Existing credentials | Drop-down menu item that appears if User defined credentials is selected.
User | User name for the connection.
Password | Password for the connection.
Credentials name | Name given to a set of user defined credentials.

Setting SSL options

SSL options that can be configured
Property | Description | Required
Trusted Certificate | The full path to the SSL certificate if it is not stored in the standard system location. | Yes, if the certificate is not stored in the standard system location.
Allow Self-signed Server Certificate | Accept an SSL certificate from the server that is self-signed and not verified by a trusted authority. | No
Allow Common Name Host Name Mismatch | Allow a mismatch between the SSL certificate's common name and the name provided in the Host name field. | No

Information note: SSL is enabled by default.
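
To illustrate what the three SSL options mean in practice, the sketch below maps them onto Python's standard `ssl` module. This is an analogy, not the connector's implementation: the function name `build_ssl_context` and its parameters are hypothetical, and because Python has no switch that accepts only self-signed certificates, that option is approximated here by turning certificate verification off entirely.

```python
import ssl
from typing import Optional


def build_ssl_context(trusted_certificate: Optional[str] = None,
                      allow_self_signed: bool = False,
                      allow_host_name_mismatch: bool = False) -> ssl.SSLContext:
    """Create a client-side SSL context that mirrors the three SSL options."""
    # Trusted Certificate: use the given file, or the system store when None.
    context = ssl.create_default_context(cafile=trusted_certificate)
    if allow_host_name_mismatch or allow_self_signed:
        # Python requires host name checking to be off before verification
        # can be relaxed, so both options disable it.
        context.check_hostname = False
    if allow_self_signed:
        # Approximation: skips chain verification for every certificate,
        # which is broader than accepting self-signed certificates only.
        context.verify_mode = ssl.CERT_NONE
    return context
```

With no arguments, the context verifies the certificate chain and the host name, which corresponds to leaving both "Allow" options deselected.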

Miscellaneous properties

Miscellaneous properties and options that can be configured
Property | Description
Query timeout | Amount of time before a data load query times out. Can be set from 30 seconds to 65535 seconds. Default is 30 seconds.
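
The timeout limits above can be enforced with a simple range check before a value is entered. A minimal sketch, assuming a hypothetical helper named `resolve_query_timeout`; only the three numeric limits come from the table.

```python
from typing import Optional

QUERY_TIMEOUT_MIN = 30        # seconds, lower limit from the table
QUERY_TIMEOUT_MAX = 65535     # seconds, upper limit from the table
QUERY_TIMEOUT_DEFAULT = 30    # seconds, used when nothing is specified


def resolve_query_timeout(seconds: Optional[int] = None) -> int:
    """Return the timeout to use, falling back to the default when unset."""
    if seconds is None:
        return QUERY_TIMEOUT_DEFAULT
    if not QUERY_TIMEOUT_MIN <= seconds <= QUERY_TIMEOUT_MAX:
        raise ValueError(
            f"Query timeout must be between {QUERY_TIMEOUT_MIN} "
            f"and {QUERY_TIMEOUT_MAX} seconds, got {seconds}"
        )
    return seconds
```

Because the default equals the minimum, long-running load queries usually need an explicit, larger value.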

Load optimization settings

Load properties that can be configured
Property | Description | Required
Max String Length | Maximum length of string fields. This can be set from 256 to 16384 characters. The default value is 4096. Setting this value close to the maximum length of the strings in your data may improve load times, as it limits the need to allocate unnecessary resources. If a string is longer than the set value, it is truncated, and the excess characters are not loaded. | No
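
The truncation behavior of Max String Length can be sketched as follows. The helper names `load_string` and `suggest_max_string_length` are hypothetical, and the second helper reflects one reading of the tuning advice: pick a limit near the longest string you expect, clamped to the allowed range.

```python
MAX_STRING_LENGTH_MIN = 256
MAX_STRING_LENGTH_MAX = 16384
MAX_STRING_LENGTH_DEFAULT = 4096


def load_string(value: str, max_string_length: int = MAX_STRING_LENGTH_DEFAULT) -> str:
    """Return the value as it would be loaded: cut off at the configured limit."""
    if not MAX_STRING_LENGTH_MIN <= max_string_length <= MAX_STRING_LENGTH_MAX:
        raise ValueError("Max String Length must be between 256 and 16384")
    # Characters beyond the limit are dropped, not loaded.
    return value[:max_string_length]


def suggest_max_string_length(sample_values) -> int:
    """Suggest a limit near the longest sampled string, within the allowed range."""
    longest = max((len(v) for v in sample_values), default=0)
    return min(max(longest, MAX_STRING_LENGTH_MIN), MAX_STRING_LENGTH_MAX)
```

A 5000-character value loaded with the default setting keeps only its first 4096 characters, so check the longest field in your data before lowering the limit.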

Advanced options

Information note: This section is for advanced users who want to add additional connection parameters that are not displayed above.

Advanced options that can be configured
Option | Description | Required
Name | Name of the property. You can add additional properties by clicking the plus sign. | No
Value | Value of the property. | No
Thrift | Can be set to Binary, SASL, or HTTP. The default is SASL. | Yes
Information note: When you connect to an Apache Spark database with the Data load editor or the Edit Script dialog, Test Connection enables you to test the connection before you attempt to create it.
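
The advanced options amount to a set of name and value pairs plus the required Thrift setting. The sketch below models them as a dictionary; the function name `build_advanced_options` and the dictionary layout are assumptions made for illustration, not the connector's internal format.

```python
from typing import Dict, Optional

# Transport values and default from the advanced options table.
THRIFT_TRANSPORTS = ("Binary", "SASL", "HTTP")
THRIFT_DEFAULT = "SASL"


def build_advanced_options(thrift: str = THRIFT_DEFAULT,
                           extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Combine the Thrift setting with any additional name/value properties."""
    if thrift not in THRIFT_TRANSPORTS:
        raise ValueError("Thrift must be one of: " + ", ".join(THRIFT_TRANSPORTS))
    options = {"Thrift": thrift}
    for name, value in (extra or {}).items():
        # Each added property needs a non-empty name to be meaningful.
        if not name.strip():
            raise ValueError("Property name must not be empty")
        options[name] = value
    return options
```

Calling `build_advanced_options()` with no arguments yields only the Thrift entry set to SASL, which matches the documented default.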
