
Records: Good, bad, ugly, filtered

Each record is assigned one of the statuses described below.

Good

Good records pass all quality tests: every field in the record complies with the expected values defined by the validation rules. Good records have the expected number of fields, proper datatypes, matching delimiters and terminators, and conforming character sets.
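
The sketch below is a minimal, hypothetical Python illustration of some of those checks; the layout, field names, and the is_structurally_good helper are assumptions for this example, not the product's implementation.

```python
import csv
import io

# Hypothetical layout: id, name, amount, date in a comma-delimited record.
EXPECTED_FIELDS = 4
FIELD_PARSERS = [int, str, float, str]   # assumed datatype per field

def is_structurally_good(raw_bytes: bytes, delimiter: str = ",") -> bool:
    """Return True when the record passes these basic quality checks."""
    try:
        line = raw_bytes.decode("utf-8")      # conforming character set
    except UnicodeDecodeError:
        return False

    fields = next(csv.reader(io.StringIO(line), delimiter=delimiter))
    if len(fields) != EXPECTED_FIELDS:        # expected number of fields
        return False

    for parse, value in zip(FIELD_PARSERS, fields):
        try:
            parse(value)                      # proper datatype per field
        except ValueError:
            return False
    return True

print(is_structurally_good(b"101,Widget,9.99,2024-01-31"))  # True  -> Good
print(is_structurally_good(b"102,Widget,abc,2024-01-31"))   # False -> fails a check
```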

Bad

Bad records have a corrupt record structure, which often surfaces as too few or too many fields. This is typically caused by record or field delimiters embedded in a string, or by control characters or other hidden characters that can't be processed. Bad records do not conform to the specified record layout: they may have the incorrect number of columns (fields), delimiters, headers, or trailers.

The most common reasons for Bad records are:

  • Incorrect layout information (wrong fixed length bytes or delimiter)

  • Embedded delimiters within fields, such as a comma in the middle of a description or address field (illustrated in the sketch after this list)

  • Non-ASCII or control characters causing record breaks
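
The hypothetical Python sketch below shows why an embedded delimiter corrupts the record structure: a delimiter-blind split of a three-field record yields four fields, while a quote-aware parser recovers the intended layout.

```python
import csv
import io

EXPECTED_FIELDS = 3

# Hypothetical record: the address field contains the record's own delimiter.
line = '204,"12 Main St, Springfield",ACTIVE'

naive_fields = line.split(",")                       # delimiter-blind split
parsed_fields = next(csv.reader(io.StringIO(line)))  # quote-aware parse

print(len(naive_fields), naive_fields)    # 4 fields -> structure looks corrupt (Bad)
print(len(parsed_fields), parsed_fields)  # 3 fields -> intended layout recovered
```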

Ugly

These records may match the record format, but some of the field data is problematic. The rules that classify a record as Ugly are configurable, but they typically flag data that doesn't match the field's datatype. Examples include invalid data formats, data containing invalid or unprintable characters, and data that doesn't match a user-specified pattern or regular expression.

The most common reasons for Ugly records are:

  • Data type inconsistencies, for example non-numeric data in a field defined as numeric (see the sketch after this list)

  • Control characters within a field (causing issues within the Hadoop code stack)
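
A minimal Python sketch of such field-level checks follows; the field rules, patterns, and the ugly_reasons helper are hypothetical, not the product's configuration.

```python
import re

# Hypothetical rules: an amount field must be numeric, a code field must match
# a pattern, and a control character anywhere in a field makes the record Ugly.
AMOUNT_PATTERN = re.compile(r"^-?\d+(\.\d+)?$")
CODE_PATTERN = re.compile(r"^[A-Z]{3}-\d{4}$")
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def ugly_reasons(amount: str, code: str) -> list:
    """Return the list of field-level problems found in one record."""
    reasons = []
    if not AMOUNT_PATTERN.match(amount):
        reasons.append(f"non-numeric data in numeric field: {amount!r}")
    if not CODE_PATTERN.match(code):
        reasons.append(f"code does not match expected pattern: {code!r}")
    if CONTROL_CHARS.search(amount) or CONTROL_CHARS.search(code):
        reasons.append("control character embedded in a field")
    return reasons

print(ugly_reasons("19.95", "ABC-1234"))      # []   -> field data is clean
print(ugly_reasons("N/A", "ABC-12\x0734"))    # three reasons -> record is Ugly
```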

Filtered

Filtered records represent the difference between the Expected Record Count and the total of Good, Bad, and Ugly records. These records become significant when reconciling record counts or when identifying records to be filtered via RecordFilterString mechanisms.
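
As a hypothetical illustration of that reconciliation, using made-up counts for a single load:

```python
# Hypothetical counts from one load. The Filtered count is whatever part of the
# Expected Record Count is not classified as Good, Bad, or Ugly; for example,
# records excluded by a RecordFilterString expression.
expected_record_count = 1_000_000
good, bad, ugly = 987_450, 1_200, 4_350

filtered = expected_record_count - (good + bad + ugly)
print(filtered)  # 7000 records were filtered rather than classified
```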
