Explore enables the creation of custom Hive or Postgres distribution service views as internal User views and tables accessible in the discover module.
Access Explore screen through My Cart or from Datasets in Discover.
Access Explore from Datasets through gear dropdown
The following example details the Explore process through Add To Cart.
To select entities individually, mark the check box next to the entity or entities to be included, and select Add To Cart.
The entities populate in My Cart in the top navigation bar. Select Take Action from My Cart dropdown and choose Explore from the dropdown box. All Dataset objects will display on the canvas. Source entities can be selected individually and staged directly to My Cart or bucketed into a Dataset, then added to My Cart.
To add more entities to the canvas, select Add Source from the canvas. Add Source wizard stages the user through source and entity selection for inclusion on the Explorecanvas,
Modifying entities for Explore canvas
Remove fields from an entity
Users can create a custom subset view of an entity by deleting fields from a source entity. This is a straightforward operation and requires no SQL scripting. Select the (view edit) icon to open a field panel (Edit Source).
Select the subset of field to include, select Update.
The source entity now displays only the fields that were selected.
Before defining and adding a target entity, expand the controller icon to review Input, Script, and Output for the Explore flow.
INPUT displays field name and data type for the selected fields.
SCRIPT allows the user to define conditions for the view in the Script section with query language commands. Users proficient with query languages can write custom queries.
OUTPUT shows Field Name and Data Type for the fields as they will display. Output fields can be renamed.
INPUT section example
FIELD NAME and DATA TYPE display for the selected fields.
SCRIPT query language section example
Define conditions for the view in the script section with Hive Query Language (HQL) commands. The example below shows a Join on the two source entities. Users proficient with query languages can insert custom queries.
- OUTPUT FIELD and DATA TYPE display for the fields as they will display in the created view
- OUTPUT fields can be renamed.
Selecting Add Target to define Source Hierarchy, Source System Name(source name), Entity Info (Name), and Groups access.
Source System Name can be edited, the name can be removed and replaced. Define or select the Source Name first and Source Hierarchy dropdown will populate accordingly.
Select the pencil icon in the upper right of the explore screen to make changes to execution parameters any time before execution. A modal will display with execution options. (These parameters are not supported single node environments.)
View Name Target entity name (becomes the user table or view name)
Query Engine: Options are driven through core_env.property explore.available.engines=MAPREDUCE,SPARK,TEZ
The engine names as presented in the modal (HIVE ON MAPREDUCE, HIVE ON SPARK, HIVE ON TEZ) clarify that the engines query Hive distribution tables.
Hcat: Hcatalog is a table storage management tool that exposes the Hive metastore (the system catalog containing metadata about Hive -- to create tables, columns, and partitions). The Hcat table option provides a materialized table, a snapshot that will not change if the original explore entities change. The Hcat view option re-queries the entities with a select statement.
Once the target entity (user view) has been added to the canvas, Validate then Execute the query.
Query Results display on the Explore Canvas.
Checkout saves the query as a User view entity in discover. This view is an internal view only (not accessible from source module)