FILE: Flat file ingest preparation and FDL
To prepare for a CSV, TSV, PSV, or text file dataload, stage through Add Data Wizard and select Add New Data: to New source
The following example uses an FDL document, though Source and Sample files can also be used to specify data format for flat files.
The following example uses default FILE_LOCALFILE_CONNECTION though FILE sources can also be sourced from ADLS, FTP, HDFS, KAFKA, OPENCONNECTOR, S3, SFTP, WASB, and WASBS. For more information about configuring a source connection for FILE sources, see Source connection wizard
FDL: File definition language overview
FDL is an innovation that employs data definition syntax to ensure structural compliance and validation for flat delimited and text files upon ingest to HDFS or the File System used and table registration. FDL documents are custom-built for particular client data sources; they are text documents loaded in tandem with source files that prescribe record format including header, trailer, and field information. File properties include data type, source/entity/field names, communication protocols, file format, byte count, delimiters, terminators, regular expressions, line and byte counts. See Qlik Catalog Source Properties for non-default properties.
FDL provides several mechanisms by which to define entity and record structure. Users can either enter source property metadata manually in the absence of a Source File, Sample File, or an FDL document, from which header/trailer, column names, data type, etc. propagate forward to define the entity, record, and field structures. Source property definitions are enforced upon ingest/profiling. Good, Bad, and Ugly Record Counts indicate record compliance or non-compliance against definition criteria.
Note that flat (text and delimited) files do not require an FDL document, one can also specify a Sample or Source file that is used to configure the metadata environment for text delimited sources.
####################################
######### FILE INFO ################
####################################
src.sys.type=FILE
src.sys.name=gov.faa.AircraftRegistrationDatabase
src.entity.name=aircraft
src.file.glob=src/test/resources/datasets/gov/faa/AircraftRegistrationDatabase/aircraft/MASTER.utf8.bom.header.txt
src.comm.protocol=file
src.connection.file=SrcConnection.txt
####################################
######### HEADER INFO #############
####################################
unicode.byte.order.mark=UTF_8_BOM
unicode.byte.order.mark.incidence=ALWAYS
# Number of bytes making up the header. Default value is 0.
#header.byte.count=43
#Number of lines in header. Default value is 0.
header.line.count=1
# Line terminator for header. Default value is NULL.
header.line.terminator=
# Regular expression facilitating extraction of record count value. Default value is NULL.
header.record.count.regex=
# Regular expression to be used for header validation. Default is NULL.
header.validation.regex=N-NUMBER,SERIAL NUMBER,MFR MDL CODE,ENG MFR MDL,YEAR MFR,TYPE REGISTRANT,NAME,STREET,STREET2,CITY,STATE,\
ZIP CODE,REGION,COUNTY,COUNTRY,LAST ACTION DATE,CERT ISSUE DATE,CERTIFICATION,TYPE AIRCRAFT,TYPE ENGINE,STATUS CODE,\
MODE S CODE,FRACT OWNER,AIR WORTH DATE,OTHER NAMES\\(1\\),OTHER NAMES\\(2\\),OTHER NAMES\\(3\\),OTHER NAMES\\(4\\),OTHER NAMES\\(5\\),\
EXPIRATION DATE,UNIQUE ID,KIT MFR, KIT MODEL,MODE S CODE HEX,\\r\\n
####################################
######### RECORD INFO #############
####################################
record.layout=VARIABLE_CHAR_LENGTH_TERMINATED
record.min.byte.count=1
record.max.byte.count=8192
record.record.terminator=\\r\\n
record.characterset=UTF-8
record.validation.regex=
record.field.delimiter=,
record.last.field.has.delimiter=TRUE
####################################
############ FIELD INFO #############
####################################
#Enter information about each field in following format - one field description per line
# field.n=FIELD_NAME | INDEX | BUSINESS_DESCRIPTION | DATA_TYPE | LENGTH | SCALE | PRECISION | NOT_NULL | IS_ENCRYPTED_AT_SRC | DO_ENCRYPT | IS_KEY | DO_NOT_PROCESS | TRIM | VALIDATION_REGEX | OPEN_QUOTE | CLOSE_QUOTE
# where n=index i.e.1,2,3…..
# FIELD_NAME: Name of the column; column can be part of a group in which case the multiple parts of name are separated by dot ‘.’ - Required attribute and a value must be provided.
# INDEX: Position or index of field in record - Required attribute and a value must be provided.
# BUSINESS_DESCRIPTION: Business description
# DATA_TYPE: Data type of field. Valid values are: INTEGER, DECIMAL, DATE, SYMBOL, STRING, CLOB, BOOLEAN
# LENGTH: Length of field. Required attribute if the value of RECORD.FIELD_DELIMITER IS NULL.
# PRECISION: Precision
# SCALE: Scale
# NOT_NULL: If the field is required or not. Valid values are TRUE, FALSE. Default value is FALSE.
# IS_ENCRYPTED_AT_SRC: If the field data is encrypted at source or not. Valid values are TRUE, FALSE. Default value is FALSE.
# DO_ENCRYPT: Is the field data to be encrypted by the system or not. Valid values are TRUE, FALSE. Default value is FALSE.
# IS_KEY: Is the field part of the record key. Valid values are TRUE, FALSE. Default value is FALSE.
# DO_NOT_PROCESS: Ignore the field during processing. Valid values are TRUE, FALSE. Default value is FALSE.
# NO_TRIM: No not trim leading and trailing whitespace from the field. Valid values are TRUE, FALSE. Default value is FALSE.
# VALIDATION_REGEX: Regular expression to be used for field value validation.Default value is empty String
# NULL_PROXY
field.code =CODE | 0 | Assigned engine code | INTEGER | 5 | 5 | 0 | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | [0-9]{5}
field.mfr =MFR | 1 | Manufacturer | STRING | 10 | | | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | \\S.*
field.model =MODEL | 2 | Manufacturer model | STRING | 13 | | | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | \\S.*
field.type =TYPE | 3 | Engine type code | INTEGER | 2 | 2 | 0 |FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | \\d{1,2}
field.horsepower=HORSEPOWER | 4 | Horsepower for pistons | INTEGER | 5 | 5 | 0 |FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | [0-9]{5} | 00000
field.thrust =THRUST | 5 | Thrust in pounds for turbines | DOUBLE | 6 | 6 | 0 |FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | [0-9]{6} | 0{6}