
tDataprepRun Standard properties for a cloud deployment (Beta)

Availability note: Beta

These properties are used to configure the cloud version of tDataprepRun running in the Standard Job framework.

The Standard tDataprepRun component belongs to the Talend Data Preparation family.

To use the cloud deployment version of the tDataprepRun component, select Cloud from the Deployment drop-down list of the component's basic settings.

The component in this framework is available in all subscription-based Talend products.

Basic settings

Deployment

From the drop-down list, select your current deployment type, Cloud in this case, and click Apply.

With this setting, your data does not transit through Talend Cloud; the preparation is run locally.

Property Type

Either Built-in or Repository.

  • Built-in: You create and store the schema locally for this component only.
  • Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Data Preparation server

From the drop-down list, select the data center that corresponds to your region:

  • EU
  • US
  • US-West
  • AP
  • AU
  • Custom (to enter the endpoint manually)

Login

Type the email address that you use to log in to the Talend Cloud Data Preparation application.

Password

Click the [...] button and type, between double quotes, the password you use for the Talend Cloud Data Preparation application.

Preparation identifier

Use the [...] button to select a preparation from the list, or type the ID of the preparation you want to use. You can retrieve this ID from the URL of the preparation when it is open in Talend Cloud Data Preparation.

Preparation version

Preparation versions are referenced by their number. For example, to run version #2 of a preparation, enter 2. To use the current version of the preparation, enter HEAD.

You can also click the [...] button to select a version from the list.
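The two accepted forms of the Preparation version value (a version number, or HEAD for the current version) can be illustrated with a small helper. This is a hypothetical sketch, not part of the tDataprepRun component: the class and method names are invented for illustration, and the assumptions that version numbers start at 1 and that HEAD is matched case-sensitively are not confirmed by this page.

```java
/**
 * Hypothetical helper (not part of tDataprepRun) that builds and checks
 * values for the Preparation version field.
 */
public class PreparationVersionExample {

    /** Value that selects the current version of a preparation. */
    static final String CURRENT = "HEAD";

    /** Returns the expected value for a numbered version, for example 2 -> "2". */
    static String forVersion(int versionNumber) {
        // Assumption: version numbers start at 1.
        if (versionNumber < 1) {
            throw new IllegalArgumentException("Version number must be positive: " + versionNumber);
        }
        return Integer.toString(versionNumber);
    }

    /** Returns true if the value is HEAD or a positive version number without leading zeros. */
    static boolean isValid(String value) {
        if (value == null) {
            return false;
        }
        // Assumption: HEAD is matched exactly (case-sensitive).
        return CURRENT.equals(value) || value.matches("[1-9][0-9]*");
    }

    public static void main(String[] args) {
        System.out.println(forVersion(2));  // prints 2
        System.out.println(CURRENT);        // prints HEAD
        System.out.println(isValid("v2"));  // prints false
    }
}
```

A check like this only catches obviously malformed values; whether a given version number actually exists for the preparation can only be confirmed by selecting it with the [...] button.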

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion.

    If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the Repository Content window.

Click Sync columns to retrieve the schema from the previous component connected in the Job.

Guess Schema

Click this button to retrieve the schema from the preparation defined in the Preparation identifier field.

References

Read-only table listing the references used by the component to support join datasets.

Trigger dependencies

Click this button if your preparation includes a lookup function. This allows the component to retrieve the data from the join dataset.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Use Dictionary

Select this check box if your preparation includes steps that modify the semantic types of the columns or functions that use semantic types. When selected, you need to configure the following parameters:

  • Personal access token: Use the [...] button to enter, between double quotes, a personal access token. You can retrieve this token from Talend Management Console.
  • Tenant ID: ID of the tenant where the semantic types are stored, usually the same tenant as your Talend Cloud Data Preparation instance. Use the [...] button to select it from a list.
  • Dictionary version: Click the [...] button to select the latest version from the list.
  • Temporary folder: Enter the path of the folder where the dictionary information is downloaded and stored during the run. By default, the path of the temporary folder is target/demoTmp.
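If you want to make sure the Temporary folder is usable before a run, a pre-run check could look like the following. This is a hypothetical sketch, not something the component requires or performs: the class and method names are invented, and it assumes that a relative path such as the default target/demoTmp is resolved against the current working directory of the Job.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-run check for the Use Dictionary temporary folder. */
public class DictionaryTempFolderExample {

    /** Default value of the Temporary folder parameter. */
    static final String DEFAULT_TEMP_FOLDER = "target/demoTmp";

    /**
     * Resolves the configured folder (or the default when none is given)
     * against the current working directory, creates it if it is missing,
     * and fails if it cannot be written to.
     */
    static Path prepareTempFolder(String configured) {
        String value = (configured == null || configured.isBlank()) ? DEFAULT_TEMP_FOLDER : configured;
        // Assumption: relative paths are resolved against the working directory.
        Path folder = Paths.get(value).toAbsolutePath().normalize();
        try {
            Files.createDirectories(folder); // does nothing if the folder already exists
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot create temporary folder " + folder, e);
        }
        if (!Files.isWritable(folder)) {
            throw new IllegalStateException("Temporary folder is not writable: " + folder);
        }
        return folder;
    }

    public static void main(String[] args) {
        Path folder = prepareTempFolder(null);
        System.out.println("Dictionary files would be stored in " + folder);
    }
}
```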

Usage

Usage rule

This component is an intermediary step. It requires both an input flow and an output flow.

Limitations

  • If the dataset is updated after the tDataprepRun component has been configured, the schema needs to be fetched again.
  • The cloud deployment mode will not work when using a Talend JobServer, a Talend Runtime or a Talend Remote Engine.
