tSqlRow properties for Apache Spark Streaming
These properties are used to configure tSqlRow running in the Spark Streaming Job framework.
The Spark Streaming tSqlRow component belongs to the Processing family.
This component is available in Talend Real Time Big Data Platform and Talend Data Fabric.
Basic settings
Schema and Edit schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.
Click Edit schema to make changes to the schema.
Note: If you make changes, the schema automatically becomes built-in.
|
SQL context |
Select the query languages you want tSqlRow to use.
|
Query |
Enter your query, paying particular attention to sequencing the fields so that they match the schema definition. The tSqlRow component uses the label of its input link to name the registered table that stores the datasets from that input link. For example, if an input link is labeled row1, row1 automatically becomes the name of the table on which you can run queries. |
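The naming and ordering rules above can be sketched as follows. This is only an illustration, not the component's implementation: it uses Python's built-in sqlite3 module as a stand-in for the Spark SQL context, and the link label row1, the columns id, name and amount, and the output schema are all hypothetical.

```python
import sqlite3

# Stand-in for the dataset arriving over an input link labeled "row1".
# (sqlite3 replaces Spark SQL here purely for illustration.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE row1 (id INTEGER, name TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO row1 VALUES (?, ?, ?)",
    [(1, "alice", 10.5), (2, "bob", 3.0)],
)

# Hypothetical output schema of the component: name first, then amount.
output_schema = ["name", "amount"]

# The table name is the link label, and the SELECT list
# follows the same field order as the output schema.
query = "SELECT name, amount FROM row1 WHERE amount > 5"
cursor = conn.execute(query)

selected_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(selected_columns)
print(rows)
```

If the SELECT list were written as amount, name instead, the values would reach the output fields in the wrong order, which is the situation the Query field description warns about.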
Advanced settings
Register UDF jars |
Add the Spark SQL or Hive SQL UDF (user-defined function) jars you want tSqlRow to use. If you do not want to call your UDF using its FQCN (Fully-Qualified Class Name), you must define a function alias for this UDF in the Temporary UDF functions table and use this alias. The alias approach is recommended, because an alias is usually more practical than the FQCN when calling a UDF from the query. Once you add a row to this table, click it to display the [...] button, and then click this button to open the jar import wizard. Through this wizard, import the UDF jar files you want to use. |
Temporary UDF functions |
Complete this table to give each imported UDF class a temporary function name to be used in the query in tSqlRow. If you have selected SQL Spark Context from the SQL context list, the UDF output type column is displayed. In this column, you need to select the data type of the output of the Spark SQL UDF to be used. |
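The idea of calling a UDF through a short temporary name instead of its full class name can be sketched as follows. This is an analogy only, assuming Python's built-in sqlite3 module in place of Spark SQL or Hive: the function to_upper, the alias my_upper and the table row1 are hypothetical, and real tSqlRow UDFs are Java classes packaged in the jars registered above.

```python
import sqlite3


def to_upper(value):
    """Hypothetical UDF logic: upper-case a string, passing NULL through."""
    return None if value is None else value.upper()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE row1 (name TEXT)")
conn.executemany("INSERT INTO row1 VALUES (?)", [("alice",), (None,)])

# Register the function under a short alias, analogous to giving an
# imported UDF class a temporary function name.
conn.create_function("my_upper", 1, to_upper)

# The query calls the UDF by its alias only.
result = conn.execute(
    "SELECT my_upper(name) FROM row1 ORDER BY rowid"
).fetchall()
print(result)
```

The query text never mentions where the function is defined, which is what makes an alias easier to read and maintain than a fully qualified class name.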
Usage
Usage rule |
This component is used as an intermediate step. This component, along with the Spark Streaming component Palette it belongs to, appears only when you are creating a Spark Streaming Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Spark Connection |
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files.
This connection is effective on a per-Job basis. |