tModelEncoder properties for Apache Spark Streaming
These properties are used to configure tModelEncoder running in the Spark Streaming Job framework.
The Spark Streaming tModelEncoder component belongs to the Machine Learning family.
This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
Basic settings
Schema and Edit Schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.
An output column must be named differently from any of the input columns, because the successive transformations from the input side to the output side take place in the same DataFrame (the Spark term for a schema-based data collection), so the output columns are added to that same DataFrame alongside the input columns; the sketch after this entry illustrates this behavior.
Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
View schema: select this option to view the schema only.
Change to built-in property: select this option to change the schema to Built-in for local changes.
Update repository connection: select this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs. |
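The single-DataFrame behavior described above can be observed directly in Spark MLlib, on which tModelEncoder relies. The following Scala sketch is illustrative only: the session setup, column names, and the choice of Tokenizer are assumptions, not code generated by a Talend Job.

import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.Tokenizer

// Minimal local session; a real Job would instead use the cluster
// defined in the Spark Configuration tab.
val spark = SparkSession.builder().appName("EncoderColumnsSketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical input schema with two columns.
val input = Seq((0, "Spark Streaming with Talend")).toDF("id", "sentence")

// The transformer appends its output column ("words") to the same DataFrame
// that holds the input columns, which is why an output column must not
// reuse an input column's name.
val words = new Tokenizer().setInputCol("sentence").setOutputCol("words").transform(input)
words.printSchema() // id, sentence, and words all live in one DataFrame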
Transformation table |
Complete this table using columns from the input and output schemas and the feature-processing algorithms to be applied to these columns. The algorithms available in the Transformation column vary depending on the type of the input schema columns to be processed; the sketch after this entry illustrates this type dependency. For further information about the algorithms available for each type of input data, see ML feature-processing algorithms in Talend. |
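As a rough illustration of how the available algorithms depend on column types, here is a minimal Spark MLlib sketch in Scala. The column names and the pairing of Tokenizer with HashingTF are illustrative assumptions; they mirror the kind of type-driven chaining the Transformation table expresses, but they are not Talend-generated code.

import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

val spark = SparkSession.builder().appName("TransformationTableSketch").master("local[*]").getOrCreate()
import spark.implicits._

val input = Seq((0, "real time big data")).toDF("id", "sentence")

// A String column can feed a text algorithm such as Tokenizer...
val words = new Tokenizer().setInputCol("sentence").setOutputCol("words").transform(input)

// ...and the token array it produces can in turn feed a vectorizing
// algorithm such as HashingTF, which expects that array type as input.
val features = new HashingTF().setInputCol("words").setOutputCol("features").setNumFeatures(1024).transform(words)
features.select("words", "features").show(truncate = false)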
Usage
Usage rule |
This component is used as an intermediate step. This component, along with the Spark Streaming component Palette it belongs to, appears only when you are creating a Spark Streaming Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Spark Connection |
In the Spark
Configuration tab in the Run
view, define the connection to a given Spark cluster for the whole Job. In
addition, since the Job expects its dependent jar files for execution, you must
specify the directory in the file system to which these jar files are
transferred so that Spark can access these files:
This connection is effective on a per-Job basis. |
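For context only, the settings in this tab correspond to an ordinary Spark Streaming connection set up in code. The following Scala sketch assumes a YARN cluster; the master URL, staging directory, jar path, and batch interval are placeholder assumptions, not what the Studio actually generates.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// All values below are placeholders chosen for illustration.
val conf = new SparkConf()
  .setAppName("PerJobSparkConnection")
  .setMaster("yarn") // the cluster selected for the whole Job
  .set("spark.yarn.stagingDir", "hdfs:///tmp/spark_staging") // directory the dependent jar files are transferred to
  .setJars(Seq("/local/path/job-dependencies.jar")) // jar files the Job depends on

// One streaming context per application: the connection is effective
// on a per-Job basis.
val ssc = new StreamingContext(conf, Seconds(5))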