Fields

An entity is composed of records and fields that hold data; a record is composed of fields that are populated with data. Field metadata is critical to the ingestion, validation, and profiling of data. Each field is described by specific metadata that can be viewed, edited, or both.

Field Information: General Information

Select View Details (pencil icon) to display column attributes, metadata, and statistical information about the loaded data.

Select View Sample Data (hive icon) to display sample records for that field. To display sample data for multiple fields, select the fields and then select the top-level Sample Data.

Display and modify field properties

Select editable fields or select options from the dropdowns.

Field Information: General Information properties

Name

Created when data is ingested

Business Name

user-defined

Business Description

user-defined

Technical Description

user-defined. Freeform field to describe technical characteristics of the data.

Internal Data Type

The data type as stored in Receiving Directory. Supported Data Types:

  • INTEGER
  • DOUBLE*
  • STRING
  • BOOLEAN
  • DECIMAL*

* Note that the application converts DOUBLE and DECIMAL internal fields to scientific notation when the value is very large or very small (more than seven decimal places).

Last Updated at

Auto-generated. Provides the date and time the metadata was last updated (ISO standard).

Index

column sequence number, column position of field in table (ex. 1, 2, 3, 4…)

Field Information: Properties

Source Information (key/value) properties can be added from the second modal tab, Properties.

Key/Value pairs, also known as attribute-value pairs, are specific to object level.

Select the plus button (Add Property) to open a drop-down with optional field properties.

Field Information: Lineage

Parent Lineage shows the root source of the field data. Child Lineage shows the source of the field data and identifies any other Qlik Data Catalyst objects using this field. Select the caret icon to display lineage information.

Field Information: Assigning tags to a field

Tags assist in locating and organizing data. To assign a tag, open the Tags tab in the Field Information box, type the tag in the open field labeled “Add a tag,” and press the Tab or Enter key.

Field Information: Comments

Field Information Comments allows authorized users to view and edit details and properties of the selected field. The authorized user can create a Comment Topic, and then type in Comment Details in the boxes indicated. Additional comments can be entered by selecting + Add Comments, which will create another comment field. Save each comment. A Success message will appear above the box tabs. Comments are subject to collaborative review and can be saved as Draft or Approved.

Field Information: Data Distribution

Field level profiling statistics and data distributions are calculated for each field and recalculated against each successive data load.

Profile values

Profiling metrics for field data provide the following top-level information:

Cardinality

The number of unique values for that field. Cardinality can be examined by Percent, Count, and Value.

Survey Count

The number of records in the field.

Survey Type

An index describing distribution method. The following Survey Types are represented:

Census: Every value in the field is counted for an exact distribution.

Sample: There were too many unique values in the field to be efficiently counted; a sample of values was used to estimate the cardinality and distribution.

Log10Survey: A counting method used for distributions with high cardinality; it counts the number of values in the specified range.
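As an illustration of the Census survey type and the Percent, Count, and Value columns described above, the following is a minimal Python sketch. It is not Qlik Data Catalyst code; the function name `census_distribution` and the sample values are hypothetical and chosen only for this example.

```python
from collections import Counter

def census_distribution(values):
    """Count every value exactly (a census) and return (cardinality, rows).

    Each row is (percent, count, value), ordered by descending count.
    Hypothetical helper for illustration only.
    """
    counts = Counter(values)
    total = sum(counts.values())
    rows = [
        (100.0 * count / total, count, value)
        for value, count in counts.most_common()
    ]
    # Cardinality is the number of unique values in the field.
    return len(counts), rows

cardinality, rows = census_distribution(["NY", "CA", "NY", "TX", "NY", "CA"])
for percent, count, value in rows:
    print(f"{percent:6.2f}%  {count:3d}  {value}")
```

Here the cardinality is 3 (NY, CA, TX), and the rows are listed from the most frequent value to the least frequent.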

Reading (FIELD) data distribution

Cardinality

Estimated cardinality is denoted by the "approximately equal" symbol. For INTEGER Log10 surveys and STRING samples, exact cardinality cannot be computed, so an estimated cardinality is computed and displayed with that symbol.

Sample survey type

Intervals

Reading intervals: square bracket [] and parenthesis () notation. Half-open interval notation (ex. '[10.0, 100.0)') indicates an interval from '10.0' to '100.0' that is inclusive of '10.0' but exclusive of '100.0'. In other words, [10.0, 100.0) is the set of all real numbers between 10.0 and 100.0, including 10.0 but not 100.0. Numbers within that interval may come very close to 100.0 (for example, 99.9999999), but 100.0 itself is not included; it falls in the next represented interval (ex. '[100.0, 10000.0)').

Note that intervals are listed in descending order of occurring frequency rather than value.
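The bracket and parenthesis rules can be checked with a short Python sketch. The `Interval` class and `parse_interval` function are hypothetical names introduced for this example and are not part of Qlik Data Catalyst; the sketch only assumes that interval labels follow the '[low, high)' text form shown above.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    low: float
    high: float
    low_closed: bool   # True for '[', False for '('
    high_closed: bool  # True for ']', False for ')'

    def contains(self, x: float) -> bool:
        above = x >= self.low if self.low_closed else x > self.low
        below = x <= self.high if self.high_closed else x < self.high
        return above and below

# Matches labels such as "[10.0, 100.0)" or "(-1E+18, 1E+18)".
_PATTERN = re.compile(
    r"^\s*([\[(])\s*([^,\s]+)\s*,\s*([^,\s\])]+)\s*([\])])\s*$"
)

def parse_interval(text: str) -> Interval:
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not an interval: {text!r}")
    left, low, high, right = match.groups()
    return Interval(float(low), float(high), left == "[", right == "]")

first = parse_interval("[10.0, 100.0)")
following = parse_interval("[100.0, 10000.0)")
print(first.contains(10.0))        # lower bound is inclusive
print(first.contains(99.9999999))  # just below the upper bound
print(first.contains(100.0))       # upper bound is exclusive
print(following.contains(100.0))   # 100.0 belongs to the next interval
```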

Scientific notation

Data distribution intervals are notated scientifically to help represent very large and small numbers in a way that is easy to read and understand.

Qlik Data Catalyst covers the following ranges:

  • INTEGER (-1E+18, 1E+18): 18 digits, negative to positive
  • DOUBLE or DECIMAL (-1.0E+38, 1.0E+38): 38 digits, negative to positive
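For readers less familiar with E-notation, this small Python sketch converts between plain numbers and the scientific form used for the range bounds. The helper `to_scientific` is a hypothetical name for this example, and the number of digits the application itself displays is not specified here.

```python
def to_scientific(x: float, digits: int = 1) -> str:
    """Format x in E-notation with the given number of decimal digits."""
    return f"{x:.{digits}E}"

# 1E+18 means 1 followed by 18 zeros.
print(float("1E+18") == 10**18)

# Converting plain numbers to scientific notation.
print(to_scientific(10000.0))
print(to_scientific(0.00025))
print(to_scientific(-1e38))
```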

Census vs. sample for string fields: Qlik Data Catalyst samples data to effectively build a histogram of unique data value distribution. Columns with cardinality < 4001 conduct a census that includes every unique observation; columns with higher cardinality conduct a sample.
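The cardinality threshold can be expressed as a simple decision rule. The Python sketch below is an assumption-laden illustration: the function name and the "CENSUS" and "SAMPLE" labels are hypothetical, and how the application actually draws its sample is not described here.

```python
# Largest cardinality that still gets a census ("cardinality < 4001").
CENSUS_MAX_CARDINALITY = 4000

def string_survey_type(values) -> str:
    """Return "CENSUS" or "SAMPLE" for an iterable of string values.

    Stops tracking distinct values as soon as the threshold is exceeded.
    Hypothetical helper for illustration only.
    """
    distinct = set()
    for value in values:
        distinct.add(value)
        if len(distinct) > CENSUS_MAX_CARDINALITY:
            return "SAMPLE"
    return "CENSUS"

print(string_survey_type(str(i) for i in range(4000)))  # 4000 unique values
print(string_survey_type(str(i) for i in range(4001)))  # 4001 unique values
```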

Census survey string field

Sample survey data distribution for a string field

For Integer, Double, and Decimal fields, Qlik Data Catalyst profiling conducts a LOG10_SURVEY, which effectively builds a histogram distribution of log10(numeric_observation). LOG10_SURVEY results are presented with Survey Count, a Survey Type description, and survey profile statistics: Percent, Count, and Value (between) [low value, high value].
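A histogram over log10 of the observations can be sketched in Python as follows. This is not the application's implementation: it assumes one bucket per power of ten and positive values only, whereas the product may merge ranges and handle zero or negative values differently. The names `decade` and `log10_survey` are hypothetical.

```python
import math
from collections import Counter

def decade(x: float) -> int:
    """Return k such that 10**k <= x < 10**(k + 1), for x > 0."""
    k = math.floor(math.log10(x))
    # Guard against floating-point rounding at bucket boundaries.
    if 10.0 ** k > x:
        k -= 1
    elif 10.0 ** (k + 1) <= x:
        k += 1
    return k

def log10_survey(values):
    """Return (percent, count, interval) rows, most frequent first.

    Assumption for this sketch: non-positive values are skipped.
    """
    positives = [v for v in values if v > 0]
    counts = Counter(decade(v) for v in positives)
    total = len(positives)
    return [
        (100.0 * n / total, n, f"[{10.0 ** k:.1E}, {10.0 ** (k + 1):.1E})")
        for k, n in counts.most_common()
    ]

survey_rows = log10_survey([5, 12, 99.9, 100, 250, 7, 42])
for percent, count, interval in survey_rows:
    print(f"{percent:6.2f}%  {count:3d}  {interval}")
```

As in the product's display, the rows are ordered by descending frequency rather than by value, and each interval is half-open.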

Note: Qlik Data Catalyst does not display Estimated Cardinality for fields of data type Double (such as Decimal). Data type Double fields are continuous rather than discrete, so cardinality is not applicable or meaningful for profiling Double data sets.

Log10 survey data distribution for integer field

Log10 survey - data distribution for double field
