Execution and Big Data Proxy Execution
- Talend Cloud Pipeline Designer: Live preview, access datasets, execute Pipelines
- Talend Cloud Data Inventory: Create connections / datasets, samples
- Talend Cloud Data Preparation: Access datasets
The Remote Engine Gen2 is distributed as a Docker image, so the deployment options include deploying onto a virtual machine running Docker or (preferably) deploying directly to your container orchestration service of choice. Either way, the process of setting up a Remote Engine Gen2 can (and should) be fully automated by your own DevOps team.
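As a sketch of what that automation might manage, a minimal Docker Compose definition could pin the engine image and inject its pairing credentials from the environment. The image name, tag, and variable names below are placeholders, not Talend's actual distribution; in practice you would use the compose file and pairing key that Talend Cloud generates for your engine.

```yaml
# Hypothetical sketch -- image and variable names are placeholders,
# not the artifacts Talend actually ships.
version: "3.8"
services:
  remote-engine-gen2:
    image: example-registry/remote-engine-gen2:stable   # placeholder image
    restart: unless-stopped
    environment:
      # Pairing token copied from Talend Management Console (placeholder name)
      - ENGINE_PAIRING_TOKEN=${ENGINE_PAIRING_TOKEN}
    volumes:
      - engine-data:/data                               # persist engine state
volumes:
  engine-data:
```

Keeping a definition like this in version control is what makes the "fully automated" setup practical: rebuilding or relocating the engine becomes a redeploy rather than a manual install.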
- Spark local – pipeline execution on a single machine; no external compute dependencies, but no horizontal scaling either. This option corresponds to the IPP Server on the Reference Architecture diagrams.
- Deploy on an edge node – that is, a machine with access to a big data cluster such as Databricks or Amazon EMR. The actual compute runs on the cluster; the Remote Engine Gen2 acts as a runner that instantiates the process. The machine from which this runner executes is commonly referred to as an edge node because it has the network placement, security permissions, and so on required to access the big data cluster. This option corresponds to the IPP Edge Node on the Reference Architecture diagrams.
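Because the edge-node pattern hinges entirely on network reachability, deployment automation can verify that a candidate host really can reach the cluster before installing the engine there. A minimal preflight sketch, assuming hypothetical hostnames and ports (substitute whichever endpoints your cluster actually exposes, e.g. Livy or HDFS):

```shell
#!/bin/sh
# Preflight for a prospective IPP Edge Node host: confirm the big data
# cluster's endpoints are reachable before installing the engine.
# Hostnames and ports below are placeholders, not Talend defaults.
results=""
for target in "cluster-master.example.internal:8998" \
              "cluster-master.example.internal:8020"; do
  host=${target%:*}
  port=${target#*:}
  # /dev/tcp is a bash feature; timeout bounds slow connection attempts
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    results="${results}${target} reachable
"
  else
    results="${results}${target} unreachable
"
  fi
done
printf '%s' "$results"
```

A check like this fails fast when the host lacks the edge node's defining property (cluster access), which is cheaper to discover before the engine is paired than after.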
Assuming enough Remote Engine tokens are available, you could choose to deploy following one or both patterns, or even multiple instances of each. For example, if two different teams required specific placement of their Remote Engine Gen2 instances in order to reach their sources and targets, each team could have its own IPP Server and/or IPP Edge Node.