JSON: Overview
Javascript Object Notation (JSON) is a lightweight, text-based data transfer format well-suited for the exchange of data.
Qlik Catalog supports the ingest of JSON formatted data [including flat files and kafka queues] and conversion of that data into queryable tables.
JSON works by encoding Javascript variables (objects) and passing them using special notation–hence the name JavaScript Object Notation. JavaScript notation enables objects to be defined using key:value syntax; the object in this context is a record. Note that while JSON supports ordered sequences of values called arrays where fields are determined by an index, arrays are not supported by Qlik Catalog at this time.
In Qlik Catalog, processing of JSON Files is driven by a Hierarchical Extraction and Normalization (.hen) file—the file is created manually by hand in the text editor of choice. The HEN file works similarly to a copybook for mainframe or XSD for XML; it is a specification file uploaded with the data. The HEN file describes and validates the data upon ingest into Qlik Catalog.
JSON .hen specification: Creating hierarchical extraction and normalization files
A .hen file defines Hierarchy Extraction and Normalization (== HEN) rules for extracting tabular record-oriented data from JSON data structures.
Conversion to a tabular format is required when one wants to perform data analysis using relational databases, spreadsheets, and BI visualization tools.
Some JSON datasets contain no hierarchical nesting of data structures. One usually sees this when a JSON export is performed from a relational database system. Since a relational table contains only scalar values (first normal form) there are no aggregate values such as nested arrays or nested structures. These cases are relatively easy to work with, but are quite rare. Currently, Qlik Catalog supports this type of JSON export; flattened (non-nested, non-array) JSON datasets.
HEN format uses JSON-path syntax to select data components from a JSON dataset. For more information, see the
A .hen file is a text file. It describes one or more JSON tables along with their associated fields.
Blank lines or lines containing only whitespace are ignored.
If the first character of a line is the( # ) pound sign then the line is considered a comment and is ignored.
The .hen file contains directives which tag and delineate different components. Directives must start on the first character on a line and must be immediately followed by the colon ( : ) character without any intervening whitespace. The following table defines directives and their meanings, in the order in which they should be specified in a .hen file.
*NOTE: JSON_ARRAY and ARRAY_OF_JSON_VALUES are not currently supported.
Directive | Meaning |
---|---|
datasetName: |
String name of the datasets. Should not contain whitespace. May contain underscore characters.
|
jsonRecordStructure: |
JSON_OBJECT
|
jsonDatasetOrganization: |
Should contain one of: SINGLETON_JSON_VALUE SEQUENCE_OF_JSON_VALUES
|
table: |
Defines the name of a table to be extracted as well as the JSON-path for JSON values to be used for the table. This directive may occur more than once in a single .hen file to define multiple tables for a single JSON dataset.
|
A table is followed by one or more field specifications.
A field specification is not preceded by a directive, it is simply a field name followed by a JSON-path.
The JSON-path is generally relative to the table-specific JSON-path using a leading USD-sign ( $ ), but in some cases may be absolute using a leading at ( @ ) sign.
Both of these cases follow JSON-path conventions.
Below are two examples of HEN Specification files and an extract of the JSON files from which data is extracted.
Example: Example of HEN file for dataset "tickets":
{
"datasetName": "tickets",
"jsonRecordStructure": "JSON_OBJECT",
"jsonDatasetOrganization": "SINGLETON_JSON_VALUE",
"tableDefinitions":
[
{
"tableName": "store",
"tableDriverSpecification": "$.store[*]",
"fieldDefinitions":
[
{"fieldName": "ticketId", "fieldSpecificationString": "$.store[*].id" },
{"fieldName": "created_at", "fieldSpecificationString": "$.store[*].created_at" },
{"fieldName": "raw_subject", "fieldSpecificationString": "$.store[*].raw_subject" },
{"fieldName": "tags", "fieldSpecificationString": "$.store[*].tags" },
{"fieldName": "tags2", "fieldSpecificationString": "$.store[*].tags[*]" },
{"fieldName":"firstCustomFieldValue","fieldSpecificationString":"$.store[*].customFields[0].value" },
{"fieldName": "ticketJson", "fieldSpecificationString": "$.store[*]" }
]
},
{
"tableName": "store_tag",
"tableDriverSpecification": "$.store[*].tags[*]",
"fieldDefinitions":
[
{"fieldName": "ticketId", "fieldSpecificationString": "$.store[*].id" },
{"fieldName": "tag", "fieldSpecificationString": "$.store[*].tags[*]" }
]
}
]
}
{"ticket":[
{"url":"https://spectrummobile-uat.zendesk.com/api/v2/tickets/1.json",
"id":1,
"external_id":null,
"via":{channel":"sample_ticket",
"source":{"from"{},"to":{},"rel":null}},
"created_at":"2018-01-24T17:36:44z",
"updated_at":"2018-03-02T16:20:59z",
"type":"incident"
"subject":"Sample ticket: Meet the ticket",
"raw_subject":"Sample ticket: Meet the ticket",
"description":"Hi Chris, \n\nEmails, chats, voicemails, and tweets are captures in Zendesk support as tickets. Start ticket
click Submit to send. To test how an email becomes a ticket, send a message to support@spectrummobile-uat.zendesk.com
your customers will see what you reply? Check out this video:\nhttps://demos.zendesk.com/hc/en-us/articles/202341799\"
"priority":"high",
"status":"closed",
"recipient":null,
"requester_id":360335701272,
"submitter_id":360333737972,
"assignee_id":null,
"organization_id";null,
"group_id":null,
"collaborator_ids":[],
"follower_ids":[],
"email_cc_ids":[],
"forum_topic_id":null,
"problem_id"null,
"has_incidents":false,
"is_public":true,
"due_at":null,
"tags":["auto_close_tickets","autoclose_voice_care_7days","cs_tier_1_unresolved","sample","support","zendesk"],
"custom_fields":[
{"id":360001008831,"value":null},
{"id":360000949292,"value":false},
{"id":360001008851,"value":null},
{"id":360001008871,"value":null},
{"id":360000949312,"value":null},
{"id":360000949332,"value":null},
{"id":360001008891,"value":null},
{"id":360000949352,"value":null},
{"id":360000949372,"value":null},
{"id":360000949392,"value":null},
{"id":360001008911,"value":null},
Example: Example of HEN file for dataset "groups":
{
"datasetName": "groups",
"jsonDatasetOrganization": "SINGLETON_JSON_VALUE",
"jsonRecordStructure": "JSON_OBJECT",
"tableDefinitions":
[
{
"tableName": "groups",
"tableDriverSpecification": "$.groups[*]",
"fieldDefinitions":
[
{"fieldName": "url", "fieldSpecificationString": "$.groups[*].url" },
{"fieldName": "id", "fieldSpecificationString": "$.groups[*].id" },
{"fieldName": "name", "fieldSpecificationString": "$.groups[*].name" },
{"fieldName": "deleted", "fieldSpecificationString": "$.groups[*].deleted" },
{"fieldName": "created_at", "fieldSpecificationString": "$.groups[*].created_at" },
{"fieldName": "updated_at", "fieldSpecificationString": "$.groups[*].updated_at" }
]
}
]
}