Profiling data

Profile statistics provide column analyses that measure incidence, ranges, and values that occur within datasets. These metrics describe relationships between field values such as:

  • Count of distinct values (cardinality)
  • Sample values, most common values, and value frequency
  • Redundancies useful in identifying default or potential duplicate values
  • Counts of null, string, and numeric values
  • Information about value ranges including min, max, average, sum, and standard deviation
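As a rough illustration of the metrics listed above, the following sketch computes a few of them for a single field using only the Python standard library. The function name `profile_field`, the use of `None` to stand for null, and the choice of population standard deviation are assumptions made for this example; they are not taken from the product.

```python
import statistics


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def profile_field(values):
    """Return a small dictionary of profile metrics for one field (hypothetical helper)."""
    non_null = [v for v in values if v is not None]
    numbers = [v for v in non_null if is_number(v)]
    strings = [v for v in non_null if isinstance(v, str)]

    profile = {
        "distinct": len(set(non_null)),        # cardinality
        "nulls": len(values) - len(non_null),  # count of null values
        "numeric": len(numbers),               # count of numeric values
        "text": len(strings),                  # count of string values
    }
    if numbers:
        profile.update(
            min=min(numbers),
            max=max(numbers),
            sum=sum(numbers),
            average=statistics.fmean(numbers),
            # Assumption: population standard deviation; the product may use another definition.
            stdev=statistics.pstdev(numbers),
        )
    return profile


result = profile_field([10, 20, 30, "n/a", None])
print(result)
```

Range metrics are only computed when the field has at least one numeric value, which mirrors the idea that these statistics are meaningful for numeric data.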

Data administrators access rich technical information about their datasets from profiling. This information assists in the organization and assignment of resources and access. App developers use profile statistics and data sampling to gain ideas and direction for creating apps and planning visualizations. Field profiling can help data analysts and business users to gain insights faster. They can view and visualize valuable field profile metrics without needing to create an app first.

Catalog provides two different views of field profile data: List view and Tile view. List view is a tabular summary of configurable profile statistics, and Tile view is a card-based, visual representation of fields laid out as a grid. Select the Tile icon for tile view or the List icon for list view to switch between profile views.

Tile view: Fields are profiled by metrics that are meaningful for the type of data contained in that field (for example: text versus numeric values)

List view: Select profile statistics of interest from the Columns dropdown

Profile Tile view

Profile tile view is a visual field profile designed to display the most informative content for each type of field. The default card type is determined by whether the field contains more numeric values or more text values. For example, for a field with both text and numeric values, the Most Common Values card type displays by default if there are more text values, and the Binned Frequency numeric distribution card type displays if there are more numeric values. A dropdown toggle lets you switch to the Most Common Values Frequency card type for any field that has non-unique values by selecting A, or switch back to the numeric distribution card by selecting #1. All card types include the number of null values if the field has any.
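The default-card behavior can be read as a small decision rule. The sketch below is one possible reading of it, not the product's implementation. It also folds in the rule, stated in the card descriptions on this page, that the Sample values card is shown when all values are unique and text-only. The function name `default_card`, the treatment of ties (equal numbers of text and numeric values), and the handling of an empty field are assumptions.

```python
def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def default_card(values):
    """Pick a default tile card for one field (hypothetical helper)."""
    non_null = [v for v in values if v is not None]
    numeric = sum(1 for v in non_null if is_number(v))
    text = len(non_null) - numeric
    all_unique = len(set(non_null)) == len(non_null)

    if numeric == 0 and all_unique:
        # Text-only field where every value is distinct.
        return "Sample Values"
    if numeric > text:
        return "Binned Frequency"
    # Assumption: ties and text-majority fields fall back to this card.
    return "Most Common Values Frequency"


print(default_card(["red", "green", "blue"]))
print(default_card([1, 2, 3, "unknown"]))
print(default_card(["yes", "yes", "no", 1]))
```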

Tile view cards: Binned Frequency numeric distribution, Sample Values, and Most Common Values Frequency

Sample values card

The Sample values card is shown when all values are unique and text-only. It lists up to the first three values.

Sample values profile card

Sample values profile criteria: Field values are profiled with this card when cardinality is high (all values are distinct). When every value is text-based and unique, a few sample values provide the best initial view into the field's data.

Each Sample values profile card provides: 

  • Field name
  • Cardinality
  • Up to three sample values (fields may have fewer than three values)

Most common values frequency card

The Most common values frequency card shows the two most common values with their frequencies, and groups all remaining values together as Other. If the field has only three values, all three display with the frequency of each value. This profile card can be applied to text, numeric, or mixed data values.

Most common values frequency profile card with text values
Most common values frequency profile card with numeric values

Most common values frequency criteria: Fields that have few values or a skewed distribution of values are profiled against the most common values frequency card. This profiling is only applied when there are multiple instances of the same values. Users can gain quick insight into the distribution of field values. If the field data includes both text and numeric values, and there are more text than numeric values, then the Most common values frequency card is shown. The Binned frequency toggle is provided when there are more than three numeric values in the field.

Each Most common values frequency profile card provides: 

  • Field name
  • Cardinality
  • Most common values and their frequency
  • Other combined frequency of remaining values
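The card contents described above can be approximated with a frequency count. This is a minimal sketch under stated assumptions: frequencies are reported as raw counts, "only three values" is read as three distinct values, null values (represented here by `None`) are excluded, and the name `most_common_card` is invented for the example.

```python
from collections import Counter


def most_common_card(values, top_n=2):
    """Return (value, count) pairs for a Most common values frequency card (hypothetical helper)."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)

    # With three or fewer distinct values, show each value on its own.
    if len(counts) <= top_n + 1:
        return counts.most_common()

    top = counts.most_common(top_n)
    other = len(non_null) - sum(count for _, count in top)
    return top + [("Other", other)]


print(most_common_card(["a", "a", "a", "b", "b", "c", "d"]))
# [('a', 3), ('b', 2), ('Other', 2)]
```

Grouping the remaining values into a single Other entry keeps the card compact even when a field has many distinct values.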

Binned frequency card

The Binned frequency card shows distribution and profiling information that is relevant for numeric fields, including the minimum, average, and maximum data values. If the field data includes both text and numeric values, and there are more numeric than text values, then the Binned frequency card is shown. The Most Common Values Frequency card type is available for all fields that have non-unique values.

Binned frequency profile card
Each Binned frequency profile card provides: 

  • Field name
  • Cardinality
  • Histogram showing the numeric data distribution
  • Minimum value
  • Average value (the sum of the numbers divided by the total number of values in the dataset)
  • Maximum value
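To make the card contents concrete, the following sketch computes the minimum, average, and maximum of a list of numbers together with an equal-width histogram. The number of bins and the equal-width binning strategy are assumptions; the page does not say how the product chooses its bins. The name `binned_frequency` is also invented for this example.

```python
def binned_frequency(numbers, bins=5):
    """Summarize numeric values as min, average, max, and bin counts (hypothetical helper)."""
    if not numbers:
        raise ValueError("at least one numeric value is required")

    lowest, highest = min(numbers), max(numbers)
    average = sum(numbers) / len(numbers)  # sum divided by the number of values
    width = (highest - lowest) / bins

    counts = [0] * bins
    for x in numbers:
        if width == 0:
            index = 0  # all values are identical
        else:
            # Clamp so that the maximum value falls in the last bin.
            index = min(int((x - lowest) / width), bins - 1)
        counts[index] += 1

    return {"min": lowest, "average": average, "max": highest, "histogram": counts}


print(binned_frequency([0, 1, 2, 3, 4, 5, 6, 7, 8, 10], bins=5))
# {'min': 0, 'average': 4.6, 'max': 10, 'histogram': [2, 2, 2, 2, 2]}
```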

Profile List view

Profile list view provides a table with profile statistic options. Under Columns, users check the statistics that are most meaningful for the dataset being profiled. The first nine statistics are pre-selected by default.

  1. From the hub Home tab, navigate to Your data; or from Explore, open the Data tab.

  2. Select Open dataset, then select Profile data. The Profile page for the dataset opens.

  3. Select the Columns button and place a checkmark next to the profile statistics of interest. A statistic must be checked in order for it to be profiled for the field (column) and to appear in the table.

The following list details available profile statistics.

    Profile statistics
    Statistic: Description
    Name: Field name (example: CategoryID)
    Data type: Because Qlik Sense registers data from many different systems, an external-to-internal uniform data type mapping is imposed on field (column) data for informational purposes. Supported data type values include:

    • Date: A date containing month, day, and year in ISO 8601 format of YYYY-MM-DD
    • Time: A time value containing hour, minutes, and seconds in ISO 8601 format of hh:mm:ss.sss±hh:mm
    • Datetime: A datetime value containing year, month, day, hour, minute, second, and fractions in the format YYYY-MM-DDThh:mm:ss.sss
    • Timestamp: A timestamp value containing year, month, day, hour, minute, second, fractions, and timezone in the format YYYY-MM-DDThh:mm:ss.sssZ
    • String: Character data representing text
    • Double: A numeric data type with double-precision 64-bit IEEE 754 floating point
    • Decimal: An exact numeric data type defined by its precision (total number of digits) and scale (number of digits to the right of the decimal point)
    • Integer: Positive or negative whole numbers
    • Boolean: A Boolean value (TRUE/FALSE)
    • Binary: Categorical data that can take exactly two possible values, such as "1" and "2"
    • Custom: A type that is outside of the mapped system-known types

    Distinct values: Cardinality; the number of distinct values present for this field
    Sample Values: Sample values (three sample values display)
    Sum: Sum of all values in this field ("0" displays for string fields)
    Min: Minimum observed value for this field (numeric fields)
    Max: Maximum observed value for this field (numeric fields)
    Average: Average observed value for this field
    System Tags: File tags applied to identify the code set (for example, $ascii, $text)
    Standard Deviation: Standard deviation for numeric fields
    Positives: Number of positive values
    Negatives: Number of negative values
    Zero values: Number of "0" values
    Empty strings: Number of empty strings
    Min length: Lowest observed character length
    Average length: Average observed character length
    Max length: Highest observed character length
    First sorted value: The first (lowest) value of sort weight (string fields)
    Last sorted value: The last (highest) value of sort weight (string fields)
    Numeric values: Number of numeric values
    Text values: Number of text values
    Most frequent values: The three most common values in the field
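Several of the list view statistics are simple counts and length measures. The sketch below shows how a few of them could be derived for one field. It assumes that length and sorted-value statistics apply to text values only, and that Python's default string ordering is an acceptable stand-in for sort weight; the product's actual sort order may differ. The name `list_view_stats` is invented for this example.

```python
def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def list_view_stats(values):
    """Compute sign counts and string-length statistics for one field (hypothetical helper)."""
    numbers = [v for v in values if is_number(v)]
    strings = [v for v in values if isinstance(v, str)]

    stats = {
        "positives": sum(1 for n in numbers if n > 0),
        "negatives": sum(1 for n in numbers if n < 0),
        "zero_values": sum(1 for n in numbers if n == 0),
        "empty_strings": sum(1 for s in strings if s == ""),
    }
    if strings:
        lengths = [len(s) for s in strings]
        stats.update(
            min_length=min(lengths),
            average_length=sum(lengths) / len(lengths),
            max_length=max(lengths),
            # Assumption: plain code-point ordering approximates sort weight.
            first_sorted_value=min(strings),
            last_sorted_value=max(strings),
        )
    return stats


print(list_view_stats([5, -2, 0, "", "pear", "apple"]))
```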

Sampling data

A sample of data is a subset of a population dataset. Sampling helps data administrators confirm that the data conforms to expected patterns and formats. App creators can get a sense of the fields and field data within the context of other records and the dataset. These views provide a first look at the data; developers can begin to explore the data for analysis and potential correlations.

Select Data sample to view a sample of the first 20 data values for each field

  • Select the dropdown arrow button, then select Sample to view a sample (n=20) of data values for each field.
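Taking the first 20 values of each field is straightforward to sketch outside the product. The example below assumes a dataset represented as a dictionary that maps field names to iterables of values; that representation and the name `sample_fields` are assumptions for illustration only.

```python
from itertools import islice


def sample_fields(dataset, n=20):
    """Return the first n values of every field (hypothetical helper)."""
    # islice stops early without error when a field has fewer than n values.
    return {name: list(islice(values, n)) for name, values in dataset.items()}


dataset = {
    "CategoryID": range(1, 101),
    "CategoryName": ["Beverages", "Condiments", "Produce"],
}
sample = sample_fields(dataset)
print(len(sample["CategoryID"]), len(sample["CategoryName"]))
# 20 3
```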

Permissions

Permissions are required to profile and sample data. The action of profiling data maps to the broader permission Profile data source. For more information, see Managing permissions in shared spaces or Managing permissions in managed spaces.

  • Profile data > Profile data source

Example