Discovering and managing metadata

The Discover interface provides access to data and metadata tools enabling Search, Browse, Edit, and Organization.

Navigating Discover

Users can filter on attributes to hide/display grid-specific columns. Alternatively, edit USER PREFERENCES by selecting Profile in the upper right drop-down of the main navigation bar; this opens USER PROFILE, USER ACCESS, and USER PREFERENCES, where column preferences can be set.

Filtering on Columns

Navigating Objects

The Navigation Bar displays available object grids in descending hierarchical order: Datasets | Source Hierarchy | Sources | Entities | Fields

Select the desired object grid and navigate to or hover over object rows of interest to view the actions available for that object:

View {Entities | Fields}: Breaks the object out into lower (child) hierarchy level components (Sources>Entities>Fields)

View QVDs and Fields

Figure 2 – View Entities and Fields

View Details: Select the pencil icon to open the Source Information panel, with options to view General Information or add/define editable property values

More Actions:

Load Logs: Opens log details of dataloads with associated metadata

Create Dataset: Enables creation of Dataset with selected object

Add to Dataset: Adds object to Dataset (select appropriate Dataset from dropdown)

Editing Objects

Sources, Entities, and Fields are defined by metadata properties (key/value pairs) that are specified in the General Information and Properties modals, accessed by selecting the pencil icon.

General Information

The first Source Information properties tab displays details about the metadata environment. These properties are created upon entity creation.

Editable fields within the modal include Business Name, Business Description, Tags, and Source Hierarchy (via dropdown).

Data Discovery and Organization

Qlik Data Catalyst organizes data as hierarchies where objects are represented logically via entities [tables], attributes [columns/fields], and relationship properties [key/value pairs].

Source Hierarchies are top level folder systems designed for drill in/drill out navigation of Data Sources.

Sources are collections of entities that share source and structural attributes that assist with loading and validation.

Entities are comprised of attributes [columns/fields] that answer specific relational queries.

New data is readily incorporated into Source folders upon validation, profiling, and inheritance of attributes.
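The hierarchy described above can be pictured as nested containers. The following is only an illustrative sketch, not the product's internal model: the class names, the `properties` dictionaries, and the `field_paths` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    name: str
    properties: Dict[str, str] = field(default_factory=dict)  # key/value pairs


@dataclass
class Entity:  # roughly a table
    name: str
    fields: List[Field] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Source:  # a collection of entities
    name: str
    entities: List[Entity] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class SourceHierarchy:  # top-level folder; may contain other hierarchies
    name: str
    sources: List[Source] = field(default_factory=list)
    children: List["SourceHierarchy"] = field(default_factory=list)


def field_paths(h: SourceHierarchy, prefix: str = "") -> List[str]:
    """Drill in from a hierarchy and list every field as a breadcrumb path."""
    here = f"{prefix}{h.name}"
    paths = []
    for s in h.sources:
        for e in s.entities:
            for f in e.fields:
                paths.append(f"{here} > {s.name} > {e.name} > {f.name}")
    for child in h.children:
        paths.extend(field_paths(child, prefix=f"{here} > "))
    return paths


# Hypothetical sample objects, for illustration only.
sales = Source("sales", [Entity("orders", [Field("order_id"), Field("amount")])])
root = SourceHierarchy("finance", sources=[sales])
print(field_paths(root))
```

Each printed path mirrors the drill-in order Source Hierarchy > Source > Entity > Field.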

Parent Object Information

To view source information for an entity or entity metrics for a field, select the object (via tick or checkbox) and note the appearance of a caret next to the breadcrumb trail, following Field or Entity, at the top of the screen. Select the caret to display or collapse parent information for the selected entity or field.

Parent Information for QVD Entities and Fields

Global Search

To search on names across a cluster, in all grids and areas of the application, use the Global Search field in the middle of the top navigation bar.

Preliminary search results display; select View all results to open a screen with more information and drill-in options.

Local Search

Local Search returns results found in the displayed grid.

Datasets

Datasets are logical collections of sources or entities. Datasets are consistent in structure: objects instantiate at the entity level when the workflow transits to Publish or Explore (for example, if a Source that has three QVDs is added to the cart, then three objects appear in the dataset).
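The entity-level expansion can be sketched as follows. This is a minimal illustration under stated assumptions: `dataset_objects` and the `source_entities` lookup are hypothetical names, and skipping an entity that is already present is an assumption rather than documented behavior.

```python
from typing import Dict, List


def dataset_objects(selection: List[str],
                    source_entities: Dict[str, List[str]]) -> List[str]:
    """Expand a mixed selection of sources and entities to entity-level objects."""
    objects: List[str] = []
    for name in selection:
        # A selected Source contributes each of its entities;
        # any other name is treated as an individual entity.
        for entity in source_entities.get(name, [name]):
            if entity not in objects:  # assumption: no duplicate objects
                objects.append(entity)
    return objects


# Hypothetical source with three QVD entities.
catalog = {"crm": ["accounts.qvd", "contacts.qvd", "leads.qvd"]}
print(len(dataset_objects(["crm"], catalog)))  # 3
```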

Options for Checkout of Datasets include:

  • Create a dataset
  • Add to Cart
  • Publish to Qlik Sense

Public/Private Datasets

Users can create Datasets as their own personal working sets, or check the box next to Public to make the Dataset available to users with appropriate access permissions.

Note that role-based access to Datasets is granted and edited through Security settings. Groups are given access to Datasets through Entity-level permissions. Therefore, users will be able to create datasets with the objects they have access to. Administrators edit Group access to Sources (selecting applicable individual entities) within the Security module.

Adding Entities/Sources to Datasets

  1. Select objects then select Add to Dataset.
  2. Select the appropriate Dataset from drop-down. Save updates by selecting Add to Dataset.

Adding Entities Sources to Datasets

Create Datasets with Selected Objects

  1. Select objects (Sources or Entities) then Create Dataset.
  2. Enter Dataset Name and Description.
  3. Check the box to make the Dataset Public to those with access permissions.
  4. Save the new Dataset.

Figure 16 – Create Datasets with Selected Objects

Add Datasets to Cart

Select Datasets, then Add to Cart.

Note that the number of datasets that have been added displays in My Cart status.

Expand Datasets and/or other Entities and Sources to select specific objects for the desired operation, then select an action from the Take Action dropdown.

For the example below, the Explore canvas opens with selected objects extracted to the canvas.

Custom Hive Views can then be created through Explore with the new Dataset.

Cart

Users select Entities, Sources, or Datasets to add to My Cart for use in the following Actions:

  • Create Dataset: Creation of a new dataset
  • Add to Dataset: Inclusion in an already existing dataset
  • Publish to Qlik Sense: Publish to Qlik Sense action and Publish to Qlik Sense Logs are available from this location. This option creates a new application for each published entity in Insight Advisor, the entry point for exploring your data and creating visualizations.
  • Publish to Qlik Sense Advanced: Exports QVDs to the user's choice of location in Qlik Sense Enterprise: Insight Advisor (default), Data Load Editor, or Data Manager. Developers building Qlik Sense applications can publish directly to Data Load Editor or Data Manager by setting a default target page in User Preferences (Profile>User Profile and Preferences>Publish to Qlik Sense Starting View); the default can be overridden after selecting Publish to Qlik Sense Advanced from the My Cart Action dropdown.
  • Developers building Qlik Sense applications can create a new application with a published entity (as before) or pick an existing Qlik Sense app from the list of the user's existing apps (the list is fetched when this option is selected) and add the published entities to it.

My Cart: Action Options

Add to Cart: Entities and Sources

Select individual Entities or Sources and then Add to Cart.

The Entities populate in My Cart in the top navigation bar. All Dataset objects will display on the canvas. Source entities can be selected individually and staged directly to My Cart or bucketed into a Dataset, then Added to My Cart. Select Take Action from the My Cart dropdown to take specific actions on those sources.

Source Hierarchy

Source Hierarchy is the highest parent object level. Sources, Entities, and Fields are nested within Source Hierarchies.

Source Hierarchies can also be nested within other Source Hierarchies.

Source Information: General Information

Sources are collections of functional Entities similar to schemas in RDBMS.

Select the View icon to display the child Entities of a Source.

Select Edit (pencil icon) to display/modify Source properties. Sources are described by metadata properties defined through key/value pairs.

Click into editable fields or select options from the dropdown(s).

View and Edit Source Properties

Source Information – General Information properties
Property: Property information

  • Name: User-defined or populated upon import
  • Business Name: Editable
  • Business Description: Editable
  • Last Updated at: Not editable
  • Tags: Editable
  • Source Hierarchy: Qlik Sense Connector
  • Communication Protocol: OPENCONNECTOR
  • Source Type: PODIUM_INTERNAL
  • Base Directory: Not editable [post-ingest]

Source Information: Properties

Source Information (key/value) properties can be added from the second tab in the Source Information pop-up box.

Properties available in this dropdown have been (externally) defined as Source Properties.

More Drop-Down Box for Sources

Select the More dropdown to access Load Logs, Dataset functionality, and the Properties panel.

Alternatively, select the Source row(s) and select the quicklink buttons for Load Logs, More (Add to Dataset or Create Dataset), and Add to Cart.

Entities

Entities are database tables defined by metadata (key/value) properties.

Select View to display the child Fields of an Entity.

QVD Entity Information: General Information

Select Edit (pencil icon) to display/modify QVD Entity properties. Entities are described by metadata properties defined through key/value pairs.

Entity Information – General Information Properties

  • Name: User-defined or populated upon import
  • Business Name: Editable
  • Business Description: Editable
  • Short Name: Editable
  • Entity Type: INTERNAL
  • Stored Format Type: TEXT_TAB_DELIMITED
  • Last Updated at: Not editable
  • Tags: Editable (use tags to assist in locating and organizing data)

Entity Information: Properties

Key/Value pairs, also known as attribute-value pairs, can be added at any object level.

Select the More dropdown to access Sample Data (if available), Lineage, Load Logs (if available), and Dataset functionality (Create Dataset and Add to Dataset).

Alternatively, tick the checkbox for the Entity row(s) and select the quicklink buttons for Load Logs or More (Create Dataset, Add to Dataset).

Select More options for one or more Entities

Fields

An entity is composed of records and fields that hold data; a record is composed of fields that populate with data. Field metadata is critical to ingest, validation and profiling of the data. Each field is described by specific metadata that can be viewed and/or edited.

Field Information: General Information

Select View Details (pencil icon) to display column attributes, metadata, and statistical information about the loaded data.

View Sample Data displays sample records for that Field. To display sample data for multiple fields, select the fields and then select the top-level Sample Data option.

Field Information – General Information Properties

  • Name: Created when the data is uploaded
  • Business Name: User-defined
  • Business Description: User-defined
  • Technical Description: User-defined. Freeform field to describe technical characteristics of the data.
  • Internal Data Type: The data type as stored in the Receiving Directory. Supported data types: INTEGER, DOUBLE, STRING, BOOLEAN
  • Last Updated at: Auto-generated. Provides the date and time the metadata was last updated (ISO standard)
  • Index: Column sequence number; the column position of the field in the table (ex. 1, 2, 3, 4…)

Field Information: Properties

Field Information (key/value) properties can be added from the second modal tab, Properties. Values available in this dropdown have been defined as Field Properties.

Key/Value pairs, also known as attribute-value pairs, are specific to object level.

Field Information: Lineage

Parent Lineage shows the root source of the field data.

Child Lineage shows the source of the field data and identifies any other Qlik Data Catalyst objects using this field. Select the caret icon to display lineage information.

**Users are encouraged to leverage the Lineage Graphs available in the Catalog module.**

Field Information: Assigning Tags to a Field

Tags assist in locating and organizing data. Tags can be assigned in the Field Information box under the Tags tab by typing in the open field labeled "Add a tag" and pressing the Tab or Enter key.

Field Information: Comments

Field Information Comments allows authorized users to view and edit details and properties of the selected field. The authorized user can create a Comment Topic, and then type in Comment Details in the boxes indicated. Additional comments can be entered by selecting + Add Comments, which will create another comment field. Save each comment. A Success message will appear above the box tabs. Comments are subject to collaborative review and can be saved as Draft or Approved.

Field Information: Data Distribution

Field level profiling statistics and data distributions are calculated for each field and recalculated against each successive data load.

Census vs. Sample for String Fields: Qlik Data Catalyst samples data to effectively build a histogram of unique data value distribution. For columns with cardinality < 4001, a census is conducted that includes every unique observation; for columns with higher cardinality, a "sample" is conducted.
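The cardinality rule can be expressed as a small check. This is a sketch of the stated threshold only; the function name `survey_mode` is hypothetical, and how the product actually counts distinct values or draws the sample is not described here.

```python
from typing import Iterable

# From the text: columns with cardinality < 4001 get a census.
CENSUS_CARDINALITY_LIMIT = 4001


def survey_mode(values: Iterable[str]) -> str:
    """Return 'census' or 'sample' based on the number of distinct values."""
    cardinality = len(set(values))
    return "census" if cardinality < CENSUS_CARDINALITY_LIMIT else "sample"


print(survey_mode(["red", "green", "red"]))          # census
print(survey_mode([str(i) for i in range(5000)]))    # sample
```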

Census Survey String Field

Sample Survey Data Distribution for a String Field

Sample Data and Data Load Filter

Sample data and other filtered views can be observed from Entity and Field grids in Discover. Qlik Data Catalyst creates a Sample Directory upon creation of Entities for inclusion in Publish or Prepare flows. Sample Data probability can be set globally in core_env.properties (default.record.sampling.probability) or in Source or Entity level properties (record.sampling.probability), with a default value of 0.01 at all levels. The Entity level value overrides the Source level value, which in turn overrides the global value set in core_env.properties.

To set the sample probability of the entire original dataset to include all records specify a value of 1.0.

To include half of the records, specify a value of .5.

To set the sample set to none (no sample records) specify a value of 0.
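The override order (Entity over Source over global) can be sketched as a lookup. The property keys come from the text above; the function name, the dictionary-based inputs, and the range check are assumptions made for illustration, not the product's implementation.

```python
from typing import Mapping

GLOBAL_KEY = "default.record.sampling.probability"  # core_env.properties
LEVEL_KEY = "record.sampling.probability"            # Source or Entity property
FALLBACK = 0.01                                      # documented default


def resolve_sampling_probability(core_env: Mapping[str, str],
                                 source_props: Mapping[str, str],
                                 entity_props: Mapping[str, str]) -> float:
    """Return the effective probability: Entity, then Source, then global."""
    candidates = (
        (entity_props, LEVEL_KEY),
        (source_props, LEVEL_KEY),
        (core_env, GLOBAL_KEY),
    )
    for props, key in candidates:
        raw = props.get(key)
        if raw is not None:
            p = float(raw)
            # Assumption: values outside [0, 1] are rejected.
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{key} must be between 0 and 1, got {raw!r}")
            return p
    return FALLBACK


# Entity-level value wins over Source and global settings.
print(resolve_sampling_probability(
    {GLOBAL_KEY: "0.5"}, {LEVEL_KEY: "1.0"}, {LEVEL_KEY: "0"}))  # 0.0
```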

Filter options:

  • The filter provides Display Length options (100 | 1000 | 2000 | 3000)
  • Data Load (All | Sample | Latest Load | Specific Loads: {Search on Available Loads})
  • Other Data (Ugly Records | Bad Records | Profile Statistics)

In Discover, select Sample Data from the More dropdown to open the Sample Data and Data Load Filter. The filter opens to the Sample Data screen by default; the user can specify Display Length for Data Load or Other Data selections.

Sample Data Load Filters

Data Load Filter: Record Filter Details

Sample Records

If the entity has sample records in its latest load, those will show up if this filter has been applied. If there are no sample records the message "No data available in table" displays. Only sample records from the latest load can be viewed and no previous history is available for selection. The columns of the table are the same as the entity fields. Note that this option is disabled for ADDRESSED entities.

The following default filters will be in place around sample data:

  • If sample data is available for the latest load, sample data will be shown by default.
  • If no sample data is available and the entity is a snapshot entity, the latest partition is accessed for sampling and that sample data is shown by default.
  • If no sample data is available and the entity is an incremental entity, data from all loads is accessed and that sample data is shown by default.
  • For registered entities, only the "sample data" option is active and all other options are disabled.
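The first three default rules above amount to a short decision chain, sketched below. The function name, the `entity_kind` strings, and the returned labels are hypothetical; the rule for registered entities concerns which options are enabled and is not modeled here.

```python
def default_sample_source(has_latest_sample: bool, entity_kind: str) -> str:
    """Pick where the default sample view reads from.

    entity_kind is assumed to be 'snapshot' or 'incremental'.
    """
    if has_latest_sample:
        return "latest_load_sample"   # sample data from the latest load
    if entity_kind == "snapshot":
        return "latest_partition"     # sample drawn from the latest partition
    if entity_kind == "incremental":
        return "all_loads"            # sample drawn from data in all loads
    raise ValueError(f"unknown entity kind: {entity_kind!r}")


print(default_sample_source(False, "snapshot"))  # latest_partition
```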

Ugly Records

(field data is problematic)

If the entity has Ugly records in its latest load, those will show up if this filter has been applied. If there are no ugly records the message "No data available in table" displays. Only ugly records from the latest load can be viewed and no previous history is available for selection. The columns of the table are the same as the entity fields. Note that this option is disabled for REGISTERED and ADDRESSED entities.

Bad Records

(record structure conflict)

If the entity has Bad records in its latest load, those will show up if this filter has been applied. If there are no bad records, the message "No data available in table" displays. Only bad records from the latest load can be viewed and no previous history is available for selection. The table has a single column record which contains the bad record as a string. Note that this option is disabled for REGISTERED and ADDRESSED entities.

Profile Statistics

(data profile metric)

The profile statistics from the latest load are shown if this filter has been applied; no previous history of statistics is available for selection. The table has 6 columns: entity_nid, field_nid, field_name, profile_metric, field_value, metric_value. Note that the entity_nid, field_nid, and field_name refer to the nids (NID = Numeric Identifier) and names of the external entity and field respectively, and not the internal entity/fields in Discover. This option is disabled for ADDRESSED entities.

Profile Statistics: Note that Profile Statistics are field-level metrics (ex. INTERNAL_DATA_TYPE) unless the statistic name begins with Record.

Profile Statistics

Field-level metrics are identifiable by the Field_nid column and are prefixed with Validator_ or Profiler_.
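A row of the Profile Statistics table and the field-level versus record-level split can be sketched as below. The six column names come from the text; the column types, the case-insensitive prefix check, the `split_metrics` helper, and the sample metric names are assumptions for illustration only.

```python
from typing import List, NamedTuple, Tuple


class ProfileStat(NamedTuple):
    entity_nid: int
    field_nid: int
    field_name: str
    profile_metric: str
    field_value: str
    metric_value: str


def split_metrics(stats: List[ProfileStat]) -> Tuple[List[ProfileStat], List[ProfileStat]]:
    """Separate field-level metrics from those whose name begins with Record."""
    field_level, record_level = [], []
    for s in stats:
        # Assumption: the prefix comparison ignores case.
        if s.profile_metric.upper().startswith("RECORD"):
            record_level.append(s)
        else:
            field_level.append(s)
    return field_level, record_level


# Hypothetical rows; the metric names are made up for the example.
rows = [
    ProfileStat(10, 101, "amount", "Profiler_EXAMPLE_METRIC", "", "42"),
    ProfileStat(10, 0, "", "Record_EXAMPLE_COUNT", "", "1000"),
]
field_level, record_level = split_metrics(rows)
print(len(field_level), len(record_level))  # 1 1
```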

Field-Level Metrics
