Standalone
- In the Endpoint field, enter the URL of your Azure Databricks workspace. This URL can be found in the Overview blade of your Databricks workspace page in the Azure portal. For example, this URL could look like https://westeurope.azuredatabricks.net.
- In the Cluster ID field, enter the ID of the Databricks cluster to be used. This ID is the value of the spark.databricks.clusterUsageTags.clusterId property of your Spark cluster. You can find this property in the properties list on the Environment tab of the Spark UI view of your cluster. You can also find this ID in the URL of your Databricks cluster, immediately after cluster/.
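  If you can run a notebook on this cluster, you can also read this property directly. A minimal Python sketch (Databricks notebooks predefine the spark session object):

      # Run in a notebook attached to the target cluster; the `spark`
      # session object is provided automatically by Databricks.
      cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
      print(cluster_id)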
- Click the [...] button next to the Token field to enter the authentication token generated for your Databricks user account. You can generate or find this token on the User settings page of your Databricks workspace. For further information, see Token management from the Azure documentation.
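  Before configuring the Studio, you can check that the endpoint and the token work together by calling the Databricks REST API yourself. A minimal Python sketch, using placeholder values for the endpoint and the token:

      import requests

      endpoint = "https://westeurope.azuredatabricks.net"  # your workspace URL
      token = "<your-personal-access-token>"               # placeholder value

      # List the clusters visible to this token (Clusters API 2.0); an
      # HTTP 403 response usually means the token is invalid or expired.
      response = requests.get(
          endpoint + "/api/2.0/clusters/list",
          headers={"Authorization": "Bearer " + token},
      )
      response.raise_for_status()
      print([c["cluster_id"] for c in response.json().get("clusters", [])])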
- In the DBFS dependencies folder field, enter the directory that is used to store your Job related dependencies in the Databricks Filesystem (DBFS) at runtime, putting a slash (/) at the end of this directory. For example, enter /jars/ to store the dependencies in a folder named jars. This folder is created on the fly if it does not already exist.
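  The Studio creates this folder for you, but you can also create or verify it yourself through the DBFS REST API. A minimal Python sketch, with the same placeholder endpoint and token as above:

      import requests

      endpoint = "https://westeurope.azuredatabricks.net"  # your workspace URL
      token = "<your-personal-access-token>"               # placeholder value

      # Create the dependencies folder on DBFS (mkdirs is idempotent, so
      # the call also succeeds if the folder already exists).
      requests.post(
          endpoint + "/api/2.0/dbfs/mkdirs",
          headers={"Authorization": "Bearer " + token},
          json={"path": "/jars"},
      ).raise_for_status()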
- Poll interval when retrieving Job status (in ms): enter, without quotation marks, the time interval (in milliseconds) at which the Studio asks Spark for the status of your Job. This status could be, for example, Pending or Running. The default value is 300000, meaning 5 minutes. This interval is recommended by Databricks to correctly retrieve the Job status.
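  Conceptually, this polling behaves like the following Python sketch against the Databricks Jobs Runs API; the run_id value is a hypothetical placeholder for an actual run:

      import time
      import requests

      endpoint = "https://westeurope.azuredatabricks.net"  # your workspace URL
      token = "<your-personal-access-token>"               # placeholder value
      run_id = 42                                          # hypothetical run ID
      poll_interval_ms = 300000                            # the Studio default

      # Poll until the run leaves the PENDING/RUNNING life cycle states.
      while True:
          state = requests.get(
              endpoint + "/api/2.0/jobs/runs/get",
              headers={"Authorization": "Bearer " + token},
              params={"run_id": run_id},
          ).json()["state"]["life_cycle_state"]
          print(state)
          if state not in ("PENDING", "RUNNING"):
              break
          time.sleep(poll_interval_ms / 1000.0)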
- Use transient cluster: select this check box to leverage transient Databricks clusters.
  The custom properties you defined in the Advanced properties table are automatically taken into account by the transient clusters at runtime.
  - Autoscale: select or clear this check box to define the number of workers to be used by your transient cluster (see the sketch after this item).
    - If you select this check box, autoscaling is enabled. Then define the minimum number of workers in Min workers and the maximum number of workers in Max workers. Your transient cluster is scaled up and down within this range based on its workload.
      According to the Databricks documentation, autoscaling works best with Databricks Runtime 3.0 or later.
    - If you clear this check box, autoscaling is deactivated. Then define the fixed number of workers the transient cluster is expected to have. This number does not include the Spark driver node.
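    For reference, these two modes correspond to the autoscale and num_workers fields of the Databricks Clusters API 2.0. A minimal Python sketch with example worker counts:

        # With Autoscale selected: the cluster scales between the two
        # bounds (example values).
        autoscaling_spec = {"autoscale": {"min_workers": 2, "max_workers": 8}}

        # With Autoscale cleared: a fixed number of workers; the Spark
        # driver node is not counted here.
        fixed_spec = {"num_workers": 4}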
  - Node type and Driver node type: select the node types for the workers and the Spark driver node. These types determine the capacity of your nodes and their pricing by Databricks.
    For details about these node types and the Databricks Units they use, see Supported Instance Types from the Databricks documentation.
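    For reference, these two lists correspond to the node_type_id and driver_node_type_id fields of the Databricks Clusters API 2.0. A minimal Python sketch with example Azure node types:

        # Example Azure node types; choose the types that match your
        # workload and budget.
        node_type_spec = {
            "node_type_id": "Standard_DS3_v2",         # worker nodes (example)
            "driver_node_type_id": "Standard_DS4_v2",  # driver node (example)
        }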
  - Elastic disk: select this check box to enable your transient cluster to automatically scale up its disk space when its Spark workers are running low on disk space.
    For more details about this elastic disk feature, search for the section about autoscaling local storage in your Databricks documentation.
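    For reference, this check box corresponds to the enable_elastic_disk flag of the Databricks Clusters API 2.0. A minimal Python sketch:

        # Enable autoscaling local storage for the transient cluster.
        elastic_disk_spec = {"enable_elastic_disk": True}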
  - SSH public key: if SSH access has been set up for your cluster, enter the public key of the generated SSH key pair. This public key is automatically added to each node of your transient cluster. If no SSH access has been set up, ignore this field.
    For further information about SSH access to your cluster, see SSH access to clusters from the Databricks documentation.
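    For reference, on Databricks platforms that support SSH access, the key is passed through the ssh_public_keys field of the Clusters API 2.0. A minimal Python sketch with a truncated example key:

        ssh_spec = {
            # Public half of your SSH key pair (truncated example value).
            "ssh_public_keys": ["ssh-rsa AAAAB3... user@example.com"]
        }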
  - Configure cluster log: select this check box to define where to store your Spark logs for the long term. This storage system can be either DBFS or S3.
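    For reference, this location corresponds to the cluster_log_conf field of the Databricks Clusters API 2.0, which accepts either a DBFS or an S3 destination. A minimal Python sketch with example paths:

        # DBFS destination (example path):
        dbfs_log_spec = {
            "cluster_log_conf": {"dbfs": {"destination": "dbfs:/cluster-logs"}}
        }

        # S3 destination (example bucket and region):
        s3_log_spec = {
            "cluster_log_conf": {
                "s3": {"destination": "s3://my-bucket/cluster-logs",
                       "region": "us-west-2"}
            }
        }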
- Do not restart the cluster when submitting: select this check box to prevent the Studio from restarting the cluster when it submits your Jobs. However, if you make changes in your Jobs, clear this check box so that the Studio restarts your cluster to take these changes into account.