HDFS properties

Properties to configure to connect to a given Hadoop Distributed File System (HDFS).

HDFS connection

Select HDFS in the list and configure the connection.

Configuration

Select your engine from the list and set the main and advanced settings.

Connection settings
Property | Configuration
User name | Enter the user name used to connect to HDFS.

After configuring the connection, give it a display name (mandatory) and a description (optional).

HDFS dataset

Dataset configuration
Property | Configuration
Dataset name | Enter a display name for the dataset. This name will be used as a unique identifier of the dataset in all Talend Cloud apps.
Connection | Select your connection in the list. If you are creating a dataset based on an existing connection, this field is read-only.
HDFS data settings
Property | Configuration
Path | Enter the path pointing to the data to be retrieved in the file system.
Format configuration
Property | Configuration
Auto detect | Click this button to automatically detect the format of the data to be retrieved.
Format | Alternatively, select in the list the format of the file to be retrieved and enter or select the information related to this file format:
  • CSV:
    • Record delimiter: Select the type of record separator used in the file to be retrieved. If you select Other, you will be able to enter a custom record delimiter in the Custom record delimiter field.
    • Field delimiter: Select the type of field separator used in the file to be retrieved. If you select Other, you will be able to enter a custom field delimiter in the Custom field delimiter field.
    • Text enclosure character: Enter the character used to enclose the fields.
    • Escape character: Enter the character used to escape special characters in the records to be retrieved.
    • Encoding: Select the type of encoding used in the file to be retrieved. If you select Other, you will be able to enter a custom encoding type in the Custom encoding field.
    • Set header: Enable this option if the file to be retrieved contains header lines and enter or select the number of lines to be skipped in the schema.
  • Excel:
    • Excel format: Select the format/version corresponding to the file to be retrieved.
    • Sheet: Enter the name of the specific Excel sheet you want to be retrieved.
    • Set header/footer: Enable these options if the file to be retrieved contains header and/or footer lines, and enter or select the number of lines to be skipped in the schema.
  • Avro: No specific parameters required for this format.
  • Parquet: No specific parameters required for this format.
  • JSON: No specific parameters required for this format.
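To make the CSV options above more concrete, here is a minimal Python sketch of how such settings typically affect parsing. It is not the connector's implementation: the function name parse_csv and its parameters are hypothetical and only mirror the labels in the list (field delimiter, text enclosure character, escape character, encoding, header lines). It relies on Python's standard csv module and assumes the default line-break record delimiter.

```python
import csv
import io


def parse_csv(raw, field_delimiter=";", enclosure='"', escape="\\",
              encoding="utf-8", header_lines=1):
    """Hypothetical helper: decode raw bytes and parse them as CSV."""
    # Encoding: turn the raw bytes into text.
    text = raw.decode(encoding)
    # newline="" lets the csv module handle line endings itself,
    # including line breaks that appear inside enclosed fields.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=field_delimiter,  # Field delimiter
        quotechar=enclosure,        # Text enclosure character
        escapechar=escape,          # Escape character
    )
    rows = list(reader)
    # Set header: the first `header_lines` rows are not data records.
    return rows[:header_lines], rows[header_lines:]


# Made-up sample: one header line, a field containing the delimiter,
# and a non-ASCII character to exercise the encoding setting.
sample = 'id;name\n1;"Doe; Jane"\n2;caf\u00e9\n'.encode("utf-8")
header, records = parse_csv(sample)
print(header)
print(records)
```

In this sketch, the enclosed field keeps its embedded delimiter as part of the value, which is the reason a text enclosure character is declared in the first place.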
Additional parameters might be displayed depending on whether the connector is used as a source or destination dataset:
  • For HDFS source datasets:
    • Force parallelism—ignore escape char and text enclosure parameters: Enable this option if you want to ignore the escape characters and the characters used to enclose the text in your file.
  • For HDFS destination datasets:
    • Overwrite: Enable this option if the file already exists and you want to overwrite its content.
    • Merge output: Enable this option if the file already exists and you want to merge the existing and updated file content.
    • Map input column names to output: This option only applies to files with CSV, JSON, or Excel format. It ensures that the input and output field names are identical.
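The effect of the Force parallelism option on source datasets can be illustrated with a short Python sketch. This is an assumption about why ignoring the text enclosure helps parallel reads, not a description of the connector's internals: when enclosures are honored, a record may span several lines, so the input cannot safely be cut at every line break; when they are ignored, every line is an independent record. The sample data is made up.

```python
import csv
import io

# Made-up sample: the second record contains a line break inside quotes.
text = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# Enclosure honored: the quoted line break stays inside one field.
honored = list(csv.reader(io.StringIO(text, newline="")))

# Enclosure ignored: each physical line is split on its own, so lines
# can be processed independently (and therefore in parallel).
ignored = [line.split(",") for line in text.splitlines()]

print(len(honored))  # 3 records
print(len(ignored))  # 4 records
```

The trade-off shown here is that ignoring enclosures changes the result when fields contain delimiters or line breaks, so the option is safest on files where that never happens.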
