Qlik Catalog record count reconciliation
For many enterprises, reconciliation of record counts is a key feature for accurate ingest of business data. Qlik Catalog provides a mechanism for users to define a method/location from which to extract expected record count for reconciliation with observed record count upon load. If expected and observed record counts are not equal the load will fail, alerting users of the mismatch and generating a log entry indicating the discrepancy.
Qlik Catalog supports record count reconciliation for entities resulting from ingest of multiple files. The files for ingestion may have an expected record count in the header or trailer or a separate manifest file. In this scenario, the file pattern for ingestion might be a wildcard (with a specified search pattern (e.g., foo/*/bar, /receiptdate*/). In cases where a wildcard is specified (in src.file.glob) and an ExpectedRecordCount is found in the header or trailer of each file, the application will sum the individual file ExpectedRecordCounts to create a totalExpectedRecordCount that is used for the partition or work order. If a separate manifest file is provided then the ExpectedRecordCount will be used in the manifest as the WorkOrder.expectedRecordCount, regardless of the number of files specified by the src.file.glob.
The following methods are supported by format. Users select the appropriate extraction method via dropdowns in the property panel.
COBOL |
PATH NAME REGEX FILE NAME REGEX MANIFEST REGEX MANIFEST ENTIRELY COBOL HEADER FIELD COBOL TRAILER FIELD |
FDL |
PATH NAME REGEX FILE NAME REGEX HEADER REGEX TRAILER REGEX MANIFEST REGEX HEADER ENTIRELY TRAILER ENTIRELY MANIFEST ENTIRELY HEADER_DELIMITED_COLUMN_INDEX TRAILER_DELIMITED_COLUMN_INDEX MANIFEST_DELIMITED_COLUMN_INDEX |
JDBC |
Not currently supported |
XML |
XML ELEMENT |
Key |
Value |
Meaning |
Extraction Method |
TRAILER_REGEX |
Trailer holds Record Count, regex will match and extract the pattern |
Extraction Argument |
(\d\d\d\d)\s\s? |
Searches for regex pattern:
|
Record Count Adjustment |
0 |
No adjustment for Header/Trailer |
The following properties can be set to define method, argument, record count adjustment (e.g., adjusting for header and trailer), and manifest file location.
Key | Value |
---|---|
Record Count Extraction Method dataset.expected.record.count.extraction.method |
Dropdown Values:
Definitions:
|
Record Count Extraction Argument dataset.expected.record.count.extraction.argument |
Value for this property will be either the 'Field Name' (when extraction method value is COBOL_HEADER_FIELD COBOL_TRAILER_FIELD or standard regular expression arguments (when extraction method is regex). Regular expressions require a pattern which matches the entire string and which specify exactly one capturing group. When using XML_ELEMENT method to extract record count, this argument is the XPATH expression to navigate to the element (or attribute) locationcontaining the record count value. Example value: NotificationFileTrailer/RecordCount DELIMITED_COLUMN_INDEX: Users can enter any delimiter and index number for extraction in either the Header, Trailer, or Manifest. For example, if the desired argument is the second index (starting from 0) in the following:" A|;B|;C|;D|;" one would specify "2 |;". Note the space between 2 and the delimiter (in this case a Pipe and semi-colon). |
Record Count Adjustment Number dataset.expected.record.count.adjustment
|
Record offset number adjusts record count 0 or no entry = record count in the dataset exactly specifies the number of data records. 1 = record count in the dataset includes 1 for a Header (or Trailer) record. 2 = record count in the dataset includes 1 for a Header record plus 1 for a Trailer record. Other = FDL or XML may require an adjustment between cat and expected record count |
Manifest File Location dataset.manifest.file.glob |
Location of Manifest File Glob when Record Count Extraction Method is MANIFEST_REGEX or MANIFEST_ENTIRELY |