Business date extraction method dropdown

The business date can be extracted from a source header, trailer, manifest file, file name, or path name to override the date generated at runtime. The extracted load date is applied to the directory structure in the file system and is used to create partitions in HCatalog.

The following methods are supported. Users select the appropriate extraction method in the dropdowns of the property panel.

Date Extraction Methods

COBOL

PATH NAME REGEX
FILE NAME REGEX
MANIFEST REGEX
MANIFEST ENTIRELY
COBOL HEADER FIELD
COBOL TRAILER FIELD

FDL

PATH NAME REGEX
FILE NAME REGEX
HEADER REGEX
TRAILER REGEX
MANIFEST REGEX
HEADER ENTIRELY
TRAILER ENTIRELY
MANIFEST ENTIRELY
HEADER_DELIMITED_COLUMN_INDEX
TRAILER_DELIMITED_COLUMN_INDEX
MANIFEST_DELIMITED_COLUMN_INDEX

JDBC

MANIFEST REGEX
MANIFEST ENTIRELY

JSON

FILE NAME REGEX
MANIFEST REGEX
MANIFEST ENTIRELY

XML

FILE NAME REGEX
MANIFEST REGEX
MANIFEST ENTIRELY

Configuration examples: Business DateTime extraction properties

Key

Value

Meaning

Extraction Method

TRAILER_REGEX

The date is extracted from the trailer: the regex given as the Extraction Argument matches the trailer, and its capturing group extracts the date.

Extraction Argument

.*(\d{4}\.\d{2}\.\d{2}\.\d{2}\.\d{2}\.\d{2}).*

The regex pattern breaks down as follows:

  • .* : Matches any character (except newline) zero or more times, as many times as possible
  • (...) : Capturing group; the text it matches is the extracted date
  • \d{4} : Matches a digit [0-9] exactly 4 times
  • \d{2} : Matches a digit [0-9] exactly 2 times
  • \. : Matches a literal period

Date Pattern

yyyy.MM.dd.HH.mm.ss

Tells the application how to parse the extracted date. This pattern instructs the application to read the value as year, month, day, hour, minutes, and seconds, separated by periods.

Note: If dataset.date.time.pattern does not specify a time, 00:00:00 is used. The date pattern must be written with Java SimpleDateFormat pattern characters, which are case-sensitive. Months are uppercase ('MM') and minutes are lowercase ('mm'). The hours component is also case-sensitive: uppercase 'HH' designates a 24-hour clock with the range [00-23]. If the source does not use a 24-hour clock, use lowercase 'hh' together with 'aa' in dataset.date.time.pattern so that the pattern matches the DateTime provided.
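To make the three properties above concrete, here is a minimal Python sketch of how a trailer regex extraction could behave. It is an illustration only, not the application's implementation: the trailer line and the function name are invented for the example, and the strptime format string is a hand-written stand-in for the SimpleDateFormat pattern yyyy.MM.dd.HH.mm.ss.

```python
import re
from datetime import datetime

# Extraction Argument from the table above: the whole trailer must match,
# and the single capturing group holds the date text.
EXTRACTION_ARGUMENT = r".*(\d{4}\.\d{2}\.\d{2}\.\d{2}\.\d{2}\.\d{2}).*"

# Assumption: Python strptime equivalent of yyyy.MM.dd.HH.mm.ss.
STRPTIME_PATTERN = "%Y.%m.%d.%H.%M.%S"


def extract_business_datetime(trailer: str) -> datetime:
    """Return the date captured from a trailer line, or raise ValueError."""
    match = re.fullmatch(EXTRACTION_ARGUMENT, trailer)
    if match is None:
        raise ValueError("trailer does not match the extraction argument")
    return datetime.strptime(match.group(1), STRPTIME_PATTERN)


# Hypothetical trailer line, invented for this example.
trailer_line = "TRAILER|rows=1200|2023.04.17.21.05.09|END"
business_datetime = extract_business_datetime(trailer_line)
print(business_datetime.isoformat())
```

The sketch uses a full match rather than a search, mirroring the requirement that the regex describe the entire string, which is why the argument starts and ends with .* around the capturing group.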

Figure: Trailer Regex DateTime property example (the dataset date extraction method is set to trailer regex)

The following properties can be set to define method, argument, date pattern, and manifest file location.

Business load date extraction key/value properties

Key

Value

Date Extraction Method:

dataset.date.time.extraction.method

Dropdown Values:

NONE
PATH_NAME_REGEX
FILE_NAME_REGEX
HEADER_REGEX
TRAILER_REGEX
MANIFEST_REGEX
HEADER_ENTIRELY
TRAILER_ENTIRELY
MANIFEST_ENTIRELY
HEADER_DELIMITED_COLUMN_INDEX
TRAILER_DELIMITED_COLUMN_INDEX
MANIFEST_DELIMITED_COLUMN_INDEX
COBOL_HEADER_FIELD
COBOL_TRAILER_FIELD

Definitions:
PATH_NAME: the source file path
FILE_NAME: the source file name
REGEX: a sequence of characters defining a search pattern
MANIFEST: a file that accompanies the data and holds metadata about it
ENTIRELY: the header, trailer, or manifest holds the date and nothing else
FIELD: the name of the header or trailer field that contains the date, supplied as the date extraction argument value
COLUMN_INDEX: the position of the column that contains the date in a delimited header, trailer, or manifest

Date Extraction Argument:

dataset.date.time.extraction.argument

The value of this property is either the field name (when the extraction method is COBOL_HEADER_FIELD or COBOL_TRAILER_FIELD) or a standard regular expression (when the extraction method is one of the REGEX methods).
A regular expression must match the entire string and must contain exactly one capturing group.

DELIMITED_COLUMN_INDEX: Users can enter any delimiter and index number for extraction in either the Header, Trailer, or Manifest.

For example, to extract the value at index 2 (indexes start at 0) from "A|;B|;C|;D|;", specify "2 |;". Note the space between the index 2 and the delimiter (in this case a pipe followed by a semicolon).
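The delimited column index argument can be sketched in a few lines of Python. This is a reading of the example above, not the application's code: the function name is hypothetical, and splitting the argument on its first space is an assumption about how the index and delimiter are separated.

```python
def extract_delimited_column(argument: str, line: str) -> str:
    """Apply an '<index> <delimiter>' argument to one delimited line."""
    # Split on the first space only, so the delimiter can be any text after it.
    index_text, delimiter = argument.split(" ", 1)
    columns = line.split(delimiter)
    # Indexes start at 0, as in the example above.
    return columns[int(index_text)]


value = extract_delimited_column("2 |;", "A|;B|;C|;D|;")
print(value)
```

Under these assumptions the argument "2 |;" selects C, the third value in the line.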

Date Pattern

dataset.date.time.pattern

Describes the format of the extracted date so that it can be interpreted accurately. Patterns follow the Java SimpleDateFormat pattern specification.

Examples:

MM/dd/yy
MM/dd/yy/HH/mm/ss
yyyy.MM.dd
yyyy.MM.dd.HH.mm.ss
yyyyMMdd
yyyyMMddHHmmss
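To see what text each example pattern would accept, the following Python sketch translates a small subset of SimpleDateFormat letters into strptime directives and parses one sample value per pattern. The mapping table, the function names, and the sample values are all assumptions made for illustration; Python's strptime is only an approximation of SimpleDateFormat and does not reproduce every detail of its parsing.

```python
import re
from datetime import datetime

# Assumed mapping for the pattern letters used on this page.
TOKEN_TO_DIRECTIVE = {
    "yyyy": "%Y", "yy": "%y", "MM": "%m", "dd": "%d",
    "HH": "%H", "hh": "%I", "mm": "%M", "ss": "%S", "aa": "%p",
}
# Longest tokens first so that "yyyy" is not read as two "yy" tokens.
TOKEN_RE = re.compile(
    "|".join(sorted(TOKEN_TO_DIRECTIVE, key=len, reverse=True))
)


def to_strptime(pattern: str) -> str:
    """Translate a SimpleDateFormat-style pattern into a strptime format."""
    return TOKEN_RE.sub(lambda m: TOKEN_TO_DIRECTIVE[m.group(0)], pattern)


def parse_with_pattern(pattern: str, text: str) -> datetime:
    """Parse text using a SimpleDateFormat-style pattern."""
    return datetime.strptime(text, to_strptime(pattern))


# Hypothetical sample values, one for each example pattern above.
samples = {
    "MM/dd/yy": "04/17/23",
    "MM/dd/yy/HH/mm/ss": "04/17/23/21/05/09",
    "yyyy.MM.dd": "2023.04.17",
    "yyyy.MM.dd.HH.mm.ss": "2023.04.17.21.05.09",
    "yyyyMMdd": "20230417",
    "yyyyMMddHHmmss": "20230417210509",
}
for pattern, text in samples.items():
    print(pattern, "->", parse_with_pattern(pattern, text).isoformat())
```

Patterns without a time component parse to midnight in this sketch, which lines up with the 00:00:00 default described for dataset.date.time.pattern.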

Manifest File Location

dataset.manifest.file.glob

Location (glob) of the manifest file, used when the Date Extraction Method is MANIFEST_REGEX or MANIFEST_ENTIRELY.
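As a way of tying the four properties together, the sketch below checks a set of key/value pairs against the rules stated on this page. It is a hypothetical helper, not part of the product: the function name, the NONE default, the sample manifest path, and the use of Python's regex engine to count capturing groups are all assumptions.

```python
import re

# Methods named in the Dropdown Values list above.
EXTRACTION_METHODS = {
    "NONE", "PATH_NAME_REGEX", "FILE_NAME_REGEX", "HEADER_REGEX",
    "TRAILER_REGEX", "MANIFEST_REGEX", "HEADER_ENTIRELY", "TRAILER_ENTIRELY",
    "MANIFEST_ENTIRELY", "HEADER_DELIMITED_COLUMN_INDEX",
    "TRAILER_DELIMITED_COLUMN_INDEX", "MANIFEST_DELIMITED_COLUMN_INDEX",
    "COBOL_HEADER_FIELD", "COBOL_TRAILER_FIELD",
}
# Methods for which this page says the manifest file glob applies.
MANIFEST_GLOB_METHODS = {"MANIFEST_REGEX", "MANIFEST_ENTIRELY"}


def check_properties(props: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Assumption: a missing method is treated as NONE.
    method = props.get("dataset.date.time.extraction.method", "NONE")
    if method not in EXTRACTION_METHODS:
        problems.append(f"unknown extraction method: {method}")
    if method.endswith("_REGEX"):
        argument = props.get("dataset.date.time.extraction.argument", "")
        try:
            if re.compile(argument).groups != 1:
                problems.append("regex argument needs exactly one capturing group")
        except re.error as exc:
            problems.append(f"invalid regex argument: {exc}")
    if method in MANIFEST_GLOB_METHODS and not props.get("dataset.manifest.file.glob"):
        problems.append("dataset.manifest.file.glob is not set")
    return problems


example = {
    "dataset.date.time.extraction.method": "MANIFEST_REGEX",
    "dataset.date.time.extraction.argument": r".*(\d{8}).*",
    "dataset.date.time.pattern": "yyyyMMdd",
    "dataset.manifest.file.glob": "/data/incoming/*.manifest",  # hypothetical path
}
print(check_properties(example))
```

The capturing-group check relies on Python's regex syntax, which can differ from the syntax the application accepts, so it should be read as a rough pre-check rather than a guarantee.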
