
Profiling data

Profiling gives data administrators rich technical information about their datasets. This information assists in organizing and assigning resources and access. App developers use profile statistics and data sampling to gain ideas and direction for creating apps and planning visualizations. Field profiling can help data analysts and business users gain insights faster: they can view and visualize valuable field profile metrics at a glance without needing to create an app first.

Profile statistics provide column analyses that measure incidence, ranges, and values that occur within datasets. These metrics describe relationships between field values such as:

  • Count of distinct values (cardinality)
  • Sample values, most common values, and value frequency
  • Redundancies useful in identifying default or potential duplicate values
  • Counts of null, string, and numeric values
  • Information about value ranges including min, max, average, sum, and standard deviation
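The statistics listed above can be approximated outside the product with a few lines of Python. This is only a minimal sketch for illustration, not how Catalog computes its profiles: the function name `profile_field`, the treatment of `None` as null, and the use of population standard deviation are all assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def profile_field(values):
    """Compute basic profile statistics for one field (hypothetical helper)."""
    # Assumption: None represents a null value.
    non_null = [v for v in values if v is not None]
    # Assumption: ints and floats are numeric; booleans are not.
    numeric = [v for v in non_null
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    strings = [v for v in non_null if isinstance(v, str)]
    counts = Counter(non_null)

    stats = {
        "distinct": len(counts),                 # cardinality
        "most_common": counts.most_common(3),    # values with their frequency
        "nulls": len(values) - len(non_null),
        "strings": len(strings),
        "numerics": len(numeric),
    }
    if numeric:
        stats.update(
            min=min(numeric),
            max=max(numeric),
            sum=sum(numeric),
            average=mean(numeric),
            # Assumption: population standard deviation; the product may differ.
            stddev=pstdev(numeric),
        )
    return stats


print(profile_field([1, 2, 2, None, "a"]))
```

Range statistics are only added when the field has at least one numeric value, which mirrors the idea that metrics should be meaningful for the type of data in the field.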

Catalog provides three views of field profile data: Tile view, List view, and Data view.

Tile view is a card-based, visual representation of fields laid out as a grid.

List view is a tabular summary of configurable profile statistics.

Data view lists field column names and up to the first twenty records of the dataset.

Select the Tile icon for tile view, the List icon for list view, or the Data icon for data view to switch between profile views.

Profile Tile view

Profile tile view is a visual field profile designed to display the most informative content for each type of field. The default card type is determined by whether the field contains more numeric or more text values. For example, for a field with both text and numeric values, the Most Common Values card displays by default if there are more text values, and the Binned Frequency numeric distribution card displays if there are more numeric values. A dropdown toggle lets you switch cards: select Most Common Values to show that card for any field that has non-unique values, or select Binned Frequency to switch back to the numeric distribution card. All card types include the number of null values if the field has any.
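The default card rules can be summarized as a small decision function. The sketch below is one reading of the rules on this page, not the product's actual logic; the function names are hypothetical, and the handling of cases the page does not specify (equal text and numeric counts, mixed fields where every value is unique, empty fields) is an assumption.

```python
def is_numeric(value):
    """Assumption: ints and floats are numeric; booleans are not."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def default_card(values):
    """Pick a default tile card type for one field (hypothetical helper)."""
    non_null = [v for v in values if v is not None]
    numeric_count = sum(1 for v in non_null if is_numeric(v))
    text_count = len(non_null) - numeric_count
    all_unique = len(set(non_null)) == len(non_null)

    # More numeric than text values: numeric distribution card.
    if numeric_count > text_count:
        return "Binned Frequency"
    # All values unique and text-only: sample values card.
    if all_unique and numeric_count == 0:
        return "Sample Values"
    # Assumption: every remaining case falls back to most common values.
    return "Most Common Values"


print(default_card(["a", "b", "c"]))   # Sample Values
print(default_card(["a", "a", "b"]))   # Most Common Values
print(default_card([1, 2, "x"]))       # Binned Frequency
```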

Tile view: Fields are profiled by metrics that are meaningful for the type of data contained in that field (for example: text versus numeric values)


Sample values card

The Sample values card is shown when all values are unique and text-only. It lists up to the first three values.

Tile view card: Sample Values

Sample values profile criteria: Field values are profiled with this card when cardinality is high (all values are distinct). When every value is text-based and unique, a few sample values provide the best initial view into this type of field's data.

Each Sample values profile card provides: 

  • Field name
  • Cardinality (distinct values)
  • Up to three sample values (fields may have fewer than three values)

Most common values frequency card

The Most common values frequency card shows the two most common values with their frequencies, and combines all remaining values into a single Other frequency. If the field has only three values, all three display with their individual frequencies. This card can be applied to text, numeric, or mixed data values.

Tile view card: Most Common Values Frequency

Most common values frequency criteria: Fields that have few values or a skewed distribution of values are profiled with the most common values frequency card. This profiling is only applied when there are multiple instances of the same values. Users can gain quick insight into the distribution of field values. If the field data includes both text and numeric values, and there are more text than numeric values, then the Most common values frequency card is shown. The Binned frequency toggle is provided when there are more than three numeric values in the field.

Each Most common values frequency profile card provides: 

  • Field name
  • Cardinality (distinct values)
  • Most common values and their frequency
  • Other combined frequency of remaining values
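The "top two plus Other" layout described for this card can be sketched with `collections.Counter`. This is an illustration only: the function name `most_common_card` is hypothetical, nulls are excluded from the counts as an assumption, and fields with three or fewer distinct values are shown in full as one reading of the rule above.

```python
from collections import Counter


def most_common_card(values):
    """Return (value, frequency) rows for the card (hypothetical helper)."""
    # Assumption: nulls are reported separately, so they are excluded here.
    counts = Counter(v for v in values if v is not None)

    # Three or fewer distinct values: show each one with its own frequency.
    if len(counts) <= 3:
        return counts.most_common()

    # Otherwise show the two most common values and fold the rest into Other.
    top_two = counts.most_common(2)
    other = sum(counts.values()) - sum(freq for _, freq in top_two)
    return top_two + [("Other", other)]


print(most_common_card(["a", "a", "a", "b", "b", "c", "d"]))
# [('a', 3), ('b', 2), ('Other', 2)]
```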

Binned frequency card

The Binned frequency card shows distribution and profiling information that is relevant for numeric fields, including minimum, average, and maximum data values. If the field data includes both text and numeric values, and there are more numeric than text values, then the Binned frequency card is shown. The Most Common Values Frequency card type remains available for all fields that have non-unique values.

Tile view card: Binned Frequency numeric distribution


Each Binned frequency profile card provides: 

  • Field name
  • Cardinality (distinct values)
  • Histogram showing the numeric data distribution
  • Minimum value
  • Average value (the sum of the numbers divided by the total number of values in the dataset)
  • Maximum value
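A binned numeric distribution like the one on this card can be approximated as follows. The page does not say how bins are chosen, so the equal-width bins, the default of five bins, and the function name `binned_frequency` are all assumptions made for this sketch. The average here divides by the number of numeric values in the field.

```python
def binned_frequency(values, bins=5):
    """Summarize the numeric values of one field (hypothetical helper)."""
    # Assumption: ints and floats are numeric; text and nulls are ignored.
    numbers = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if not numbers:
        return None

    low, high = min(numbers), max(numbers)
    # Equal-width bins; fall back to width 1 when all values are identical.
    width = (high - low) / bins or 1

    histogram = [0] * bins
    for n in numbers:
        # The maximum value would land one past the end, so clamp it.
        index = min(int((n - low) / width), bins - 1)
        histogram[index] += 1

    return {
        "min": low,
        "average": sum(numbers) / len(numbers),
        "max": high,
        "histogram": histogram,
    }


print(binned_frequency([0, 1, 2, 3, 4, 10, "n/a", None]))
```

With the sample input, the six numeric values fall into bins of width 2, giving the histogram `[2, 2, 1, 0, 1]`.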

Profile List view

Profile list view provides a table of selectable profile statistics. Select the metrics that are most meaningful for the dataset from the ColumnPicker dropdown, which is found by scrolling to the far-right edge of the table. The first nine statistics are selected by default.

List view: Select profile statistics of interest from the ColumnPicker dropdown found by scrolling right on the table


Profile Data view

The profile data view displays your dataset as a plain data table with field column names and up to the first twenty records.

Data view: Dataset column names and the first twenty records display


Permissions

Permissions are required to profile and sample data. The action of profiling data maps to the broader permission Profile data source. For more information, see Managing permissions in shared spaces or Managing permissions in managed spaces.

  • Profile data > Profile data source