tWindow properties for Apache Spark Streaming
These properties are used to configure tWindow running in the Spark Streaming Job framework.
The Spark Streaming tWindow component belongs to the Processing family.
The streaming version of this component is available in Talend Real-Time Big Data Platform and in Talend Data Fabric.
Basic settings
Window duration |
Enter, without quotation marks, the window duration in milliseconds, that is, the length of the window to be applied. For example, if the batch size defined in the Spark configuration tab is 2 seconds, a window duration of 6 seconds (enter 6000) means that 3 batches are handled each time the window is applied. |
Define the slide duration |
Select the Define the slide duration check box and, in the field that is displayed, enter, without quotation marks, the slide duration in milliseconds, that is, the interval at which the window is applied. For example, if the batch size defined in the Spark configuration tab is 2 seconds, a slide duration of 4 seconds (enter 4000) means that the window is applied every 4 seconds; with a window duration of 6 seconds, each application of the window then overlaps the previous one by one batch. If you leave this check box clear, the slide duration defaults to the batch size defined in the Spark configuration tab. Both the window duration and the slide duration must be multiples of that batch size. The code sketch after these settings illustrates this behavior. |
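To make the window and slide semantics concrete, here is a minimal sketch using the Spark Streaming Java API directly, outside Talend; the code a Talend Job generates differs, and the socket source, host name, and port used here are hypothetical stand-ins for the components preceding tWindow.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class WindowSketch {
    public static void main(String[] args) throws InterruptedException {
        // Batch size of 2 seconds, matching the example batch size
        // defined in the Spark configuration tab.
        SparkConf conf = new SparkConf().setAppName("WindowSketch").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

        // Hypothetical input source standing in for the upstream components.
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);

        // Window duration 6 s, slide duration 4 s: every 4 seconds the last
        // 3 batches (6 s / 2 s) are handled, so consecutive windows overlap
        // by one batch. Both durations are multiples of the 2-second batch size.
        JavaDStream<String> windowed = lines.window(Durations.seconds(6), Durations.seconds(4));

        windowed.print();
        jssc.start();
        jssc.awaitTermination();
    }
}

With the slide duration omitted, lines.window(Durations.seconds(6)) slides by the batch size, which matches the default behavior when the Define the slide duration check box is left clear.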
Usage
Usage rule |
This component is used as an intermediate step. It does not change the data schema; it controls the pace at which the micro-batches are processed, by means of the window it defines. This component, along with the Spark Streaming component Palette it belongs to, appears only when you are creating a Spark Streaming Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs. |
Spark Connection |
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access them. This connection is effective on a per-Job basis. |
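There is no code view behind the Spark Configuration tab, but a rough, hypothetical equivalent of what it defines is a SparkConf such as the one below; the application name, master URL, and jar staging path are placeholders, and the exact properties set depend on the cluster mode you choose.

import org.apache.spark.SparkConf;

public class SparkConnectionSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                // Hypothetical Job name.
                .setAppName("MySparkStreamingJob")
                // Hypothetical standalone-mode master URL.
                .setMaster("spark://master-host:7077")
                // Hypothetical location to which the dependent jar files are
                // transferred so that Spark can access them.
                .set("spark.jars", "hdfs://namenode:8020/user/talend/lib/routines.jar");
        // conf would then be passed to a JavaStreamingContext,
        // as in the earlier windowing sketch.
    }
}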