HDFS properties
Properties to configure to connect to a Hadoop Distributed File System
(HDFS) and to read data from or write data to it.
HDFS connection
Select HDFS in the list and configure the connection.
Configuration
Select your engine from the list and set the main and advanced settings.
Property | Configuration |
---|---|
User name | Enter the user name used to connect to HDFS. |
After configuring the connection, give it a display name (mandatory) and a description (optional).
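
As a rough illustration of what the user name is for, the sketch below builds a WebHDFS REST URL in which the caller is identified by the `user.name` query parameter. This page does not say which protocol the connector uses, so WebHDFS is only an assumption chosen because it makes the role of the user name visible. The host, port, path, and user name are hypothetical values.

```python
from urllib.parse import quote, urlencode


def webhdfs_list_url(host: str, port: int, path: str, user_name: str) -> str:
    """Build a WebHDFS LISTSTATUS URL that identifies the caller by user name.

    Assumption: the cluster exposes WebHDFS and uses simple (non-Kerberos)
    authentication, where the user is passed as the `user.name` parameter.
    """
    query = urlencode({"user.name": user_name, "op": "LISTSTATUS"})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical values, for illustration only.
url = webhdfs_list_url("namenode.example.com", 9870, "/user/talend/input", "talend")
print(url)
```

The file system evaluates permissions against that user, so the user name set in the connection needs read access for source datasets and write access for destination datasets.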
HDFS dataset
Property | Configuration |
---|---|
Dataset name | Enter a display name for the dataset. This name is used as the unique identifier of the dataset in all Talend Cloud apps. |
Connection | Select your connection in the list. If you are creating a dataset based on an existing connection, this field is read-only. |
Path | Enter the path to the data to be retrieved in the file system. |
Auto detect | Click this button to automatically detect the format of the data to be retrieved. |
Format | Alternatively, select the format of the file to be retrieved in the list, then enter or select the information related to that file format. |
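
The page does not state which form the Path field expects. As a minimal sketch, assuming it accepts either an absolute path or a full `hdfs://` URI, the hypothetical helper below splits such a value into its parts and rejects relative paths. The function name and the sample locations are illustrative and are not part of the product.

```python
from urllib.parse import urlsplit


def split_hdfs_location(location: str):
    """Split an HDFS location into (host, port, path).

    Accepts a bare absolute path or an hdfs:// URI; host and port are None
    when the value is a bare path.
    """
    parts = urlsplit(location)
    if parts.scheme and parts.scheme != "hdfs":
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    if not parts.path.startswith("/"):
        raise ValueError("the path must be absolute")
    return parts.hostname, parts.port, parts.path


# Hypothetical locations, for illustration only.
print(split_hdfs_location("hdfs://namenode.example.com:8020/user/talend/input/customers.csv"))
print(split_hdfs_location("/user/talend/input/customers.csv"))
```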
Additional parameters might be displayed depending on whether the connector is used as a
source or destination dataset:
- For HDFS source datasets:
  - Force parallelism—ignore escape char and text enclosure parameters: Enable this option if you want to ignore the escape characters and the characters used to enclose the text in your file.
- For HDFS destination datasets:
  - Overwrite: Enable this option if the file already exists and you want to overwrite its content.
  - Merge output: Enable this option if the file already exists and you want to merge the existing and updated file content.
  - Map input column names to output: This option only applies to files with CSV, JSON, or Excel format. It ensures that the input and output field names are identical.
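
The effect of ignoring text enclosure characters on a source file can be sketched with Python's standard `csv` module. This is not the connector's implementation; it only shows, on a made-up sample, why a file can be cut at any line break once enclosure and escape characters are ignored, and what that costs when a quoted field contains a line break.

```python
import csv
import io

# Made-up sample: the second record has a line break inside a quoted field.
raw = 'id,comment\n1,"line one\nline two"\n2,plain\n'

# Honoring the text enclosure: the quoted line break stays inside one field,
# so a record can span several physical lines.
with_enclosure = list(csv.reader(io.StringIO(raw), quotechar='"'))

# Ignoring enclosure and escape characters: every physical line is a record,
# so each line can be parsed independently of the others.
without_enclosure = [line.split(",") for line in raw.splitlines()]

print(len(with_enclosure), with_enclosure[1])
print(len(without_enclosure), without_enclosure[1:3])
```

With the enclosure honored, the sample yields three records. With it ignored, the sample yields four rows and the quoted field is split in two, so the option is safest on files whose fields contain no enclosed line breaks or delimiters.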