Data mart view structure
This topic describes the internal header columns (identifiable by their hdr__ prefix) of the data mart views. Data pipelines often culminate in data marts, which denormalize data into star schemas. Facts present a convenient structure for consumption by analysts using dimensional modeling. Therefore, understanding the role of the header columns will allow you to generate more focused reports for analytics and consumption by downstream applications.
Fact view
View name format: <DATA_ASSET_SCHEMA>.[<PREFIX>]<FACT>
The following header columns are added to the view structure:
Column |
Type |
Description |
---|---|---|
hdr__deleted | Boolean |
Indicates whether or not the record was deleted from the fact table. |
hdr_{dimension-name}_key_id | int64 |
Reference to Type 1 and Type 2 dimensions. The fact view will contain a separate column for each dimension in the fact. Example: hdr__EMPLOYEES_key_id |
Type 1 dimension view
View name format: <DATA_ASSET_SCHEMA>.[<PREFIX>]<DIMENSION>
The following header columns are added to the view structure:
Column |
Type |
Description |
---|---|---|
hdr_{dimension-name}_key_id | int64 |
Sequence incremented per record. |
hdr__deleted | Boolean |
Indicates whether or not the record was deleted from the dimension table. |
Type 2 dimension view
View name format: <INTERNAL_SCHEMA>.[<PREFIX>]<DIMENSION>
The following header columns are added to the view structure:
Column |
Type |
Description |
---|---|---|
hdr_{dimension-name}_key_id |
int64 |
Sequence incremented per record. |
hdr__from_timestamp | timestamp |
The UTC start timestamp for this version of the record. For example, the column might indicate the date a customer started living at a certain address. This column allows you to:
For a Type 2 dimension with denormalized entities, this is the timestamp from the updated dataset in the Storage or Transform data asset. |
hdr__to_timestamp | timestamp |
The UTC end timestamp of the record version. The column is used to delimit the range of dates for a given record version. For example, the column could indicate the date a customer stopped living at a certain address. See also the description of the hdr_from_timestamp column above. |
hdr__operation | varchar(1) |
Full load operations:
Change processing operations (using Change Tables):
Actions resulting from compare and apply operations:
|
hdr__was_current_from_timestamp | timestamp |
Shows the UTC timestamp of when the last update was applied to the record. This column together with the hdr__was_current_to_timestamp column, allows you to analyze data by a specific date, knowing only what you knew then. For example, let's say that every day you calculate the total of the previous day's orders at 2:00 am. So on December 1st, the total sum of the previous day's orders is $1,000,000. However, on December 2nd, a record is inserted saying that on November 30th, an order totaling $500,000 was placed. So the order amount for November 30th was actually $1,500,000! However, if you generate a report for the total sum of orders on November 30th based on what was current on December 1st at 2:00 am, the result would still be $1,000,000. |
hdr__was_current_to_timestamp | timestamp |
This column is populated when a record is replaced with a newer record. It shows the UTC timestamp of when the record was processed, and therefore no longer the latest record. See also the description of the hdr__was_current_from_timestamp column above. |