Running a preparation on Google Cloud Dataflow
You can choose to set Google Cloud Dataflow as the Big Data export runtime for your preparations.
Warning: This is a beta feature. No support is available for it.
To use this runtime instead of the default one, you must configure both the Streams Runner and the Spark Job Server.
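As a hedged sketch of what this configuration may involve: Google client libraries conventionally locate credentials through the standard GOOGLE_APPLICATION_CREDENTIALS environment variable. Whether the Spark Job Server picks up credentials this way depends on your installation; the path below is a placeholder.

```shell
# Assumption: the Spark Job Server resolves Google credentials via the
# standard GOOGLE_APPLICATION_CREDENTIALS environment variable.
# Replace the placeholder path with the location of your service account
# .json file on the Spark Job Server machine.
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
```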
Before you begin
- You have a Google Cloud enterprise account and have created a Google Cloud project.
- You have installed Talend Data Preparation.
- You have installed Streams Runner and Spark Job Server on Linux machines.
- You have created a service account on Google Cloud and downloaded the .json file containing its credentials. This file must be stored on the machine where the Spark Job Server is installed. The service account must have the rights to run Jobs on Google Cloud Dataflow and to access the Google Cloud Storage buckets involved in your Jobs, such as your input and output buckets and the bucket set for tempLocation.
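The service account prerequisite above can be sketched with the gcloud CLI. The project ID, account name, roles, and key path below are illustrative assumptions; your organization may require narrower roles than the broad Dataflow and Storage roles shown here.

```shell
# Hypothetical project and account names; replace with your own values.
PROJECT_ID="my-project"
SA_NAME="dataprep-dataflow"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Create the service account.
gcloud iam service-accounts create "${SA_NAME}" \
    --project "${PROJECT_ID}" \
    --display-name "Data Preparation Dataflow runner"

# Grant the right to run Jobs on Google Cloud Dataflow and to access
# the Cloud Storage buckets involved in those Jobs.
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member "serviceAccount:${SA_EMAIL}" \
    --role "roles/dataflow.admin"
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member "serviceAccount:${SA_EMAIL}" \
    --role "roles/storage.objectAdmin"

# Download the .json credentials file, then copy it to the machine
# where the Spark Job Server is installed.
gcloud iam service-accounts keys create /path/to/credentials.json \
    --iam-account "${SA_EMAIL}"
```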
Procedure
Results
When exporting a preparation, the Google Cloud Dataflow runtime is used instead of the default Big Data runtime, depending on the data input and output. For more information on which runtime is used for a given input and output, see Export options and runtimes matrix.