Dynamic support for Hadoop distributions in Talend Studio (deprecated)

Availability note: Deprecated
Deprecation: Dynamic and built-in distributions are deprecated from Talend 8.0 onwards. Spark Universal is the standard for running big data Jobs on big data platforms. For more information, see Running a Job with Spark Universal.

To move away from a built-in or dynamic distribution (for example Cloudera, Hortonworks, Amazon EMR, Azure Synapse, Databricks, Microsoft HDInsight), Talend strongly recommends that you enable the Spark Universal distribution for your big data distribution and Jobs.

If Talend Studio does not support the Big Data platform you want to use, you can add the distribution yourself to make it available in Talend Studio.

This dynamic support feature lets you use a Cloudera or Hortonworks version that had not yet been released when your Talend Studio version was released: you add the version yourself in a few clicks.

Adding the latest Big Data Platform dynamically (Dynamic Distributions) (deprecated)

Availability note: Deprecated

If Talend Studio does not support the Big Data platform you want to use, follow the procedure below to add the distribution yourself and make it available in Talend Studio.

In the current Talend Studio version, you can use this procedure to add the Cloudera and the Hortonworks distributions only. This procedure uses Cloudera to demonstrate how to add a dynamic distribution to Talend Studio.

The dynamic distributions added this way are generally minor versions of a Talend-certified major release of your distribution. Talend relies on the distribution vendors' compatibility statements to ensure the compatibility of Talend Studio with these minor versions and, by this measure, provides official support for the use cases that can be produced on these minor versions as well as on the Talend-certified versions. For further information about the Talend-certified distribution versions and Talend general support policy about the certified and the compatible versions, see Supported Big Data platform distribution versions for Talend Jobs.
  • On the version list of the distributions, some versions are labeled Builtin. These versions were added by Talend via the Dynamic distribution mechanism and delivered with Talend Studio when it was released. They are certified by Talend, thus officially supported and ready to use.
Note: For the Cloudera distribution, Talend recommends that you use the CDP 7.x built-in distributions rather than the CDP dynamic distribution. With the CDP dynamic distribution, Talend supports versions up to 7.1.8; CDP 7.1.9 Private Cloud Base and 7.2 Public Cloud are not supported. If you want to use CDP 7.1.9 Private Cloud Base, CDP 7.2 Public Cloud, or a later version, Talend recommends that you use the Spark Universal feature. For more information, see Running a Job with Spark Universal.
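
The version limits in the note above can be expressed as a small check. The following Python sketch is illustrative only and is not part of Talend Studio: the function and constant names are hypothetical, and it assumes plain numeric version strings such as 7.1.8 or 7.2 (no service-pack suffixes).

```python
from typing import Tuple

# Highest CDP version covered by the dynamic distribution, per the note above.
MAX_DYNAMIC_CDP: Tuple[int, int, int] = (7, 1, 8)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Turn '7.1.8' or '7.2' into a three-part tuple, padding with zeros."""
    parts = [int(p) for p in text.strip().split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected version format: {text!r}")
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def dynamic_cdp_supported(version: str) -> bool:
    """True if the note places this CDP 7.x version within dynamic support."""
    parsed = parse_version(version)
    if parsed[0] != 7:
        # The note only discusses CDP 7.x; other majors are out of scope here.
        raise ValueError("this sketch only covers CDP 7.x versions")
    return parsed <= MAX_DYNAMIC_CDP
```

For any version where this check returns False, the note points to Spark Universal instead of a dynamic distribution.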

Procedure

  1. In the Integration perspective, click File > Edit Project properties to open the Project Settings dialog box.
  2. Click General > Dynamic distribution settings to open its view.
  3. From the Distribution drop-down list, select Cloudera.
  4. Set up your local Nexus repository to store the dynamic distribution jar files to be downloaded.
    While not mandatory, this step allows other users or other Talend Studio instances to download the same jar files much faster.
    1. Set up a proxy on your local Nexus repository and link this proxy to the dedicated Talend proxy: https://talend-update.talend.com/nexus/content/groups/dynamicdistribution/.
      The credentials to be used to connect to this Talend proxy are:
      • Username: studio-dl-client
      • Password: studio-dl-client

      When you create your local proxy, you need to define the credentials specific to this local proxy. For an example of how to create a Nexus proxy, see Proxy settings in the Nexus documentation.

    2. Click General > Artifact Proxy Setting to open its view, and select the Override default setup check box to activate the Repository field.
    3. In the Repository field, enter the URL of your local proxy and the credentials defined for this proxy.
    4. Click Check Connection to verify its connection status.
  5. Go back to the Dynamic distribution settings view and click the Dynamic distribution setup button to open the dynamic distribution configuration wizard.
  6. Select the Create new dynamic configuration radio button and click Refresh to display, on the Version drop-down list, the Cloudera versions that are available in the connected Cloudera repository.
  7. Select the Cloudera version for which you want to generate the configuration to be used by Talend Studio.
  8. Click Finish.

    Talend Studio starts to retrieve the configuration files for this distribution from the Cloudera repository. This retrieval may take a while.

    Once done, the Dynamic distribution setup wizard closes automatically and returns you to the Dynamic distribution settings view. The newly generated "dynamic" distribution for the version you selected is displayed on the Version list.

  9. Repeat these operations to add more versions if need be. Otherwise, click Apply and Close to close the Project settings dialog box.
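
Before entering a repository URL in the Artifact Proxy Setting view, it can help to confirm from outside Talend Studio that the repository answers with the expected credentials. The following Python sketch uses only the standard library. The URL and credentials are the ones given in step 4; the function names are hypothetical, and the sketch assumes the repository accepts HTTP Basic authentication. It does not replace the Check Connection button.

```python
import base64
import urllib.request

# URL and credentials of the dedicated Talend proxy, as given in step 4.
TALEND_PROXY_URL = (
    "https://talend-update.talend.com/nexus/content/groups/dynamicdistribution/"
)
USERNAME = "studio-dl-client"
PASSWORD = "studio-dl-client"


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def check_connection(url: str, username: str, password: str,
                     timeout: float = 10.0) -> bool:
    """Return True if the repository answers with a 2xx status, else False."""
    request = build_request(url, username, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError and HTTPError are both subclasses of OSError.
        return False


# Example call (performs a network request):
# check_connection(TALEND_PROXY_URL, USERNAME, PASSWORD)
```

To test your local proxy instead, pass its URL and the credentials you defined for it.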

Results

You can then use this new version the same way as you use the built-in distributions provided along with Talend Studio. You can:
  • Set up the connection to this dynamic distribution in the Repository and reuse this connection in Talend Jobs.

  • Directly use this dynamic distribution in your Jobs. If you build your Job to generate executable files in a zip and need to run the executable files on Windows, use the .ps1 script instead of the .bat script.

Although you can usually export a Job with its dependencies such as a connection defined in the Repository, the connection to a dynamic distribution cannot be exported the same way. If you need to export such a connection, see Export or import the configuration of a dynamic Big Data platform distribution.
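
The Windows recommendation above can be built into a launcher. The following Python sketch picks the script by platform. It is an illustration under assumptions: the <JobName>_run.ps1 and <JobName>_run.sh file names are assumed here rather than taken from this page, the function name is hypothetical, and running a .ps1 file requires a PowerShell execution policy that allows scripts.

```python
import sys
from pathlib import Path
from typing import List


def launcher_command(job_dir: Path, job_name: str,
                     platform: str = sys.platform) -> List[str]:
    """Return the command line that starts a built Job on this platform."""
    if platform.startswith("win"):
        # On Windows, use the .ps1 script rather than the .bat script.
        script = job_dir / f"{job_name}_run.ps1"  # assumed file name
        return ["powershell", "-File", str(script)]
    script = job_dir / f"{job_name}_run.sh"  # assumed file name
    return ["sh", str(script)]


# Example (starts the Job; requires the unzipped build to exist):
# import subprocess
# subprocess.run(launcher_command(Path("MyJob"), "MyJob"), check=True)
```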

Edit the configuration of a dynamic distribution (deprecated)

Availability note: Deprecated

Once a dynamic Big Data platform distribution has been added to Talend Studio, you can easily edit its configuration.

This is particularly useful when you need to use a customized distribution for which some custom JAR files are required or when you need to debug a dynamic configuration.

The procedure below applies only to a Big Data platform distribution that was added as described in Adding the latest Big Data Platform dynamically (Dynamic Distributions).

Before you begin

Make sure you know your distribution well enough to understand the changes you are making to its configuration.

Procedure

  1. In the Integration perspective, click File > Edit project properties to open the Project settings dialog box.
  2. Expand the General node and click Dynamic distribution settings to open its view.
  3. Click the Dynamic distribution setup button to open the dynamic distribution configuration wizard.
  4. Select the Edit an existing dynamic configuration radio button and from the Version drop-down list, select the configuration to be edited.
  5. Click Next to open the list of the configuration module groups of the selected distribution.
  6. In the Distribution name field, enter a new name for the distribution you are customizing, in order to distinguish it from the one generated by Talend Studio.
  7. Select the module you need to edit and click the [...] button next to it to open Module Groups Wizard.
  8. In Module Groups Wizard, use the Add and Delete buttons to add or remove JAR files and change the configuration of your distribution.
  9. Once done, click Finish to validate your changes and close Module Groups Wizard.
  10. Click Finish again to close the Dynamic distribution setup wizard.

    This custom distribution appears on the Version drop-down list in the Dynamic distribution settings view in the Project settings dialog box.

  11. Click Apply and then OK to validate the new configuration and close the Project settings dialog box.

Results

Your custom distribution is available in Talend Studio.

Export or import the configuration of a dynamic Big Data platform distribution (deprecated)

Availability note: Deprecated

Although the configuration of a dynamic Big Data platform distribution cannot be exported or imported along with the Jobs that use it, it can be exported or imported via the Dynamic distribution setup view in the Project settings dialog box.

The procedure below applies only to a Big Data platform distribution that was added as described in the procedure for adding the latest Big Data platform dynamically.

Procedure

  1. In the Integration perspective, click File > Edit project properties to open the Project settings dialog box.
  2. Expand the General node and click Dynamic distribution settings to open its view.
  3. Click the Dynamic distribution setup button to open the dynamic distribution configuration wizard.
    In the wizard, select the option that matches what you want to do:

    • Edit an existing dynamic configuration: use this radio button to export a dynamic distribution. Once you select this radio button, the Version list becomes activated. Select the distribution you want to export from this list and click Next to open the detail view of this distribution. In this view, click the Export configuration button to export the configuration as a JSON file.

    • Import dynamic configuration: use this radio button to import a dynamic distribution. Once you select this radio button, click the [...] button that becomes activated. Then browse to the JSON file that contains the configuration information of the dynamic distribution you want to import.

      This JSON file is usually one exported from another Talend Studio instance. If you want to use a manually created JSON file, ensure that its JSON schema is the same as the schema of an exported JSON file.

  4. Click Finish to close the Dynamic distribution setup wizard.
  5. If you are importing a dynamic distribution, the imported distribution appears on the Version drop-down list in the Dynamic distribution settings view in the Project settings dialog box. Click Apply and then OK to validate the new configuration and close the Project settings dialog box.
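
If you write the JSON file by hand, one way to check that it follows the same schema as an exported file is to compare the structure of the two files (keys and value types) while ignoring the values themselves. The following Python sketch does this with the standard library only. It makes no assumption about the actual keys Talend Studio writes; the function names are hypothetical, and passing this check does not guarantee that the import succeeds.

```python
import json
from typing import Any, List


def structure_diff(reference: Any, candidate: Any, path: str = "$") -> List[str]:
    """List the places where candidate's keys or types differ from reference."""
    if type(reference) is not type(candidate):
        return [f"{path}: expected {type(reference).__name__}, "
                f"found {type(candidate).__name__}"]
    problems: List[str] = []
    if isinstance(reference, dict):
        for key in sorted(reference.keys() - candidate.keys()):
            problems.append(f"{path}.{key}: missing")
        for key in sorted(candidate.keys() - reference.keys()):
            problems.append(f"{path}.{key}: unexpected")
        for key in sorted(reference.keys() & candidate.keys()):
            problems.extend(
                structure_diff(reference[key], candidate[key], f"{path}.{key}"))
    elif isinstance(reference, list) and reference:
        # Compare every candidate element with the first reference element.
        for index, item in enumerate(candidate):
            problems.extend(
                structure_diff(reference[0], item, f"{path}[{index}]"))
    return problems


def compare_files(exported_path: str, manual_path: str) -> List[str]:
    """Compare a hand-written JSON file against one exported by the wizard."""
    with open(exported_path, encoding="utf-8") as exported_file:
        exported = json.load(exported_file)
    with open(manual_path, encoding="utf-8") as manual_file:
        manual = json.load(manual_file)
    return structure_diff(exported, manual)
```

An empty list means that no structural difference was found. Each entry otherwise names the path of a missing key, an unexpected key, or a value of a different type.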
