Sample data and data load filter

Sample data and other filtered views are available from the entity and field grids in Discover. Qlik Data Catalyst creates a sample directory when entities are created, for inclusion in Publish or Prepare flows. The sample data probability can be set globally in core_env.properties (default.record.sampling.probability) or in source-level or entity-level properties (record.sampling.probability); the default value is .01 at all levels. An entity-level value overrides the source-level value, which in turn overrides the global value set in core_env.properties.

To include all records of the original dataset in the sample, specify a value of 1.0.

To include half of the records, specify a value of .5.

To produce no sample records, specify a value of 0.
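The override order and the effect of the probability values can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, and treating the probability as an independent keep/drop decision per record is an assumption about how sampling behaves, not a description of the product's internals.

```python
import random
from typing import Optional

# Default of default.record.sampling.probability in core_env.properties.
GLOBAL_DEFAULT = 0.01


def effective_probability(
    entity_value: Optional[float] = None,
    source_value: Optional[float] = None,
    global_value: float = GLOBAL_DEFAULT,
) -> float:
    """Hypothetical helper: entity overrides source, source overrides global."""
    for value in (entity_value, source_value):
        # Compare against None so that an explicit 0 is still honored.
        if value is not None:
            return value
    return global_value


def sample_records(records, probability, seed=None):
    """Assumed model: keep each record independently with the given probability."""
    rng = random.Random(seed)
    # rng.random() is in [0.0, 1.0), so 1.0 keeps everything and 0 keeps nothing.
    return [r for r in records if rng.random() < probability]
```

For example, `effective_probability(entity_value=1.0, source_value=0.5)` resolves to the entity-level value, and sampling with that value keeps every record.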

Filter options:

  • Display Length (100 | 1000 | 2000 | 3000)
  • Data Load (All | Sample | Latest Load | Specific Loads: {Search on Available Loads})
  • Other Data (Ugly Records | Bad Records | Profile Statistics | Custom Metrics)

In Discover, select Sample Data from the More dropdown menu to open the Sample Data and Data Load Filter.

Open the Sample Data and Data Load Filter

The filter opens to the sample data screen by default. Users can then specify a Display Length for the Data Load or Other Data selections.

Sample Data Load Filters

Data load filter: record filter details

Sample Records

If the entity has sample records in its latest load, they are displayed when this filter is applied. If there are no sample records, the message "No data available in table" displays. Only sample records from the latest load can be viewed; no previous history is available for selection. The columns of the table are the same as the entity fields. Note that this option is disabled for ADDRESSED entities.

The following defaults apply to sample data:

  • If sample data is available for the latest load, that sample data is shown by default.
  • If no sample data is available and the entity is a snapshot entity, the latest partition is accessed for sampling and that sample data is shown by default.
  • If no sample data is available and the entity is an incremental entity, data from all loads is accessed for sampling and that sample data is shown by default.
  • For REGISTERED entities, only the Sample Data option is active; all other options are disabled.
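The default rules above can be read as a small decision function. The following Python sketch only restates those rules; the function name, the argument names, and the returned descriptions are hypothetical and do not correspond to any product API.

```python
def default_sample_view(entity_type: str, latest_load_has_sample: bool) -> str:
    """Hypothetical restatement of which data backs the default sample view."""
    if latest_load_has_sample:
        # Sample data from the latest load always wins when it exists.
        return "sample data from the latest load"
    if entity_type == "snapshot":
        return "sample drawn from the latest partition"
    if entity_type == "incremental":
        return "sample drawn from all loads"
    # The rules above only cover snapshot and incremental entities.
    raise ValueError(f"no default rule listed for entity type: {entity_type!r}")
```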

Ugly Records

(field data is problematic)

If the entity has ugly records in its latest load, they are displayed when this filter is applied. If there are no ugly records, the message "No data available in table" displays. Only ugly records from the latest load can be viewed; no previous history is available for selection. The columns of the table are the same as the entity fields. Note that this option is disabled for REGISTERED and ADDRESSED entities.

Bad Records

(record structure conflict)

If the entity has bad records in its latest load, they are displayed when this filter is applied. If there are no bad records, the message "No data available in table" displays. Only bad records from the latest load can be viewed; no previous history is available for selection. The table has a single column, record, which contains the bad record as a string. Note that this option is disabled for REGISTERED and ADDRESSED entities.

Profile Statistics

(data profile metric)

The profile statistics from the latest load are shown when this filter is applied; no previous history of statistics is available for selection. The table has six columns: entity_nid, field_nid, field_name, profile_metric, field_value, and metric_value. Note that entity_nid, field_nid, and field_name refer to the NIDs (NID = Numeric Identifier) and names of the external entity and field, respectively, not to the internal entity and fields in Discover. This option is disabled for ADDRESSED entities.

Note that profile statistics are field-level metrics unless the statistic name begins with RECORD (for example, RECORD_BAD_COUNT).

Profile Statistics

Field-level metrics are identifiable by the field_nid column and are prefixed with VALIDATOR_ or PROFILER_.

Field-Level Metrics
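The naming rule for profile statistics can be illustrated with a short Python sketch that labels a profile_metric value as record-level or field-level. The function name is hypothetical, and matching the RECORD prefix case-insensitively is an assumption; the metric names ending in _EXAMPLE are placeholders, not real statistic names.

```python
def metric_level(profile_metric: str) -> str:
    """Hypothetical helper: 'record' if the name begins with RECORD, else 'field'."""
    # Assumption: the prefix check ignores case.
    if profile_metric.upper().startswith("RECORD"):
        return "record"
    # Everything else is field-level, for example VALIDATOR_ or PROFILER_ metrics.
    return "field"


# RECORD_BAD_COUNT comes from the text above; the other names are placeholders.
for name in ("RECORD_BAD_COUNT", "VALIDATOR_EXAMPLE", "PROFILER_EXAMPLE"):
    print(name, "is a", metric_level(name) + "-level metric")
```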

Select field sample data: select the hive icon to display sample data for a particular field.

Select Field Sample Data

Field Sample Data