tWriteJSONField properties for Apache Spark Streaming
These properties are used to configure tWriteJSONField running in the Spark Streaming Job framework.
The Spark Streaming tWriteJSONField component belongs to the Processing family.
This component is available in Talend Real-Time Big Data Platform and Talend Data Fabric.
Basic settings
Output type |
Select the type of the data to be output to the target file. If you select byte, the data is output as byte arrays. |
Editor |
Opens the interface to create the JSON data structure. For more information, see Configuring a JSON Tree. |
Schema and Edit Schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, do not use the reserved word "line" when naming the fields. Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
Built-In: You create and store the schema locally for this component only. |
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. |
Sync columns |
Click to synchronize the output file schema with the input file schema. The Sync function is available only once the Row connection is linked to the output component. |
Group by |
Define the aggregation set, that is, the columns you want to use to regroup the data.
Warning: Make sure that the data to be grouped is in sequential order.
Note: If the Group by field is not empty, Spark does not guarantee the order of rows within the group. |
Remove root node |
Select this check box to remove the root node from the JSON field generated. |
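The Group by and Remove root node settings both change the shape of the generated JSON. The following Java sketch is a conceptual model only, not the code Talend generates: it assumes that grouping works on consecutive rows (which is why the data must be in sequential order) and that removing the root node means emitting the content of the root element without its wrapper. The class JsonFieldSketch, the field names "key" and "values", and the root name "root" are all hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class JsonFieldSketch {
    // Groups consecutive (key, value) rows that share the same key.
    // A new group starts whenever the key changes, so unsorted input
    // produces several groups for the same key.
    public static List<Map.Entry<String, List<String>>> groupConsecutive(List<String[]> rows) {
        List<Map.Entry<String, List<String>>> groups = new ArrayList<>();
        for (String[] row : rows) {
            String key = row[0];
            if (groups.isEmpty() || !groups.get(groups.size() - 1).getKey().equals(key)) {
                groups.add(new SimpleEntry<String, List<String>>(key, new ArrayList<>()));
            }
            groups.get(groups.size() - 1).getValue().add(row[1]);
        }
        return groups;
    }

    // Renders one group as JSON, wrapped in a root node unless removeRoot is true.
    // Keys and values are assumed to need no JSON escaping.
    public static String toJson(Map.Entry<String, List<String>> group, String rootName, boolean removeRoot) {
        StringBuilder body = new StringBuilder();
        body.append("{\"key\":\"").append(group.getKey()).append("\",\"values\":[");
        List<String> values = group.getValue();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) body.append(',');
            body.append('"').append(values.get(i)).append('"');
        }
        body.append("]}");
        return removeRoot ? body.toString() : "{\"" + rootName + "\":" + body + "}";
    }

    public static void main(String[] args) {
        List<String[]> sorted = List.of(
                new String[] {"a", "1"}, new String[] {"a", "2"}, new String[] {"b", "3"});
        List<String[]> unsorted = List.of(
                new String[] {"a", "1"}, new String[] {"b", "3"}, new String[] {"a", "2"});

        // Sorted input: one JSON document per key, each wrapped in the root node.
        for (Map.Entry<String, List<String>> g : groupConsecutive(sorted)) {
            System.out.println(toJson(g, "root", false));
        }
        // Unsorted input: key "a" is split across two groups; root node removed.
        for (Map.Entry<String, List<String>> g : groupConsecutive(unsorted)) {
            System.out.println(toJson(g, "root", true));
        }
    }
}
```

With sorted input the sketch prints two documents of the form {"root":{...}}, one per key. With the same rows out of order it prints three documents without the root wrapper, because key "a" appears in two separate runs.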
Usage
Usage rule |
This component is used as an intermediate step. This component, along with the Spark Streaming component Palette it belongs to, appears only when you are creating a Spark Streaming Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Spark Connection |
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files:
This connection is effective on a per-Job basis. |