Explore enables the creation of custom Hive or Postgres distribution service views as internal User Views and Tables accessible in the Discover module.
Access the Explore screen through My Cart or from Datasets in Discover.
Access Explore from Datasets through the 'gear' dropdown.
The following example details the Explore process through Add To Cart.
To select entities individually, mark the check box next to the entity or entities, and click Add To Cart.
The entities populate in My Cart in the top navigation bar. Select Take Action from the My Cart dropdown and choose Explore from the dropdown box. All Dataset objects will display on the canvas. Source entities can be selected individually and staged directly to My Cart, or bucketed into a Dataset and then added to My Cart.
To add more entities to the canvas, select Add Source from the canvas.
Modifying entities for Explore canvas
From the entity's top tab, users can minimize the entity display (minus sign), edit fields (pencil), remove the entity from the screen (trashcan), view source (i), view sample data (inkwell), or view profile data (disk).
Remove fields from an entity
Users create a custom Hive subset view of an entity by deleting fields from a Source Entity. This is straightforward and requires no SQL scripting. Select the Edit (pencil) icon to open the Edit Source panel.
Select the subset of Fields to retain in the new View, then select Update.
The Source Entity now displays only the fields that were selected.
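Conceptually, removing fields corresponds to a view that selects only the retained columns. The following Python sketch builds such a statement as an illustration; the function `subset_view_sql`, the entity name `customers`, the view name, and the field names are all hypothetical, and the statement Explore actually generates may differ.

```python
def subset_view_sql(view_name, source_entity, retained_fields):
    """Build a CREATE VIEW statement that keeps only the retained fields."""
    if not retained_fields:
        raise ValueError("at least one field must be retained")
    # Backticks quote identifiers in Hive Query Language.
    columns = ", ".join(f"`{name}`" for name in retained_fields)
    return f"CREATE VIEW `{view_name}` AS SELECT {columns} FROM `{source_entity}`"


# Hypothetical entity and field names, for illustration only.
statement = subset_view_sql("customers_subset", "customers", ["id", "name"])
print(statement)
```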
Before defining and adding a target entity, expand the controller icon to review Input, Script, and Output for the Explore flow.
INPUT displays Field Name and Data Type for the selected fields.
SCRIPT allows the user to define conditions for the view in the Script section with Hive Query Language commands. Users proficient with Query Language can write custom queries.
OUTPUT shows Field Name and Data Type for the fields as they will display. Output fields can be renamed.
INPUT section example
FIELD NAME and DATA TYPE display for the selected fields.
SCRIPT Query Language section example
Define conditions for the view in the Script section with Hive Query Language commands. The example below shows a Join on the two source entities. Users proficient with Query Language can insert custom queries.
- OUTPUT FIELD and DATA TYPE display for the fields as they will appear in the created view.
- OUTPUT fields can be renamed.
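A join on two source entities with renamed output fields might look like the sketch below. It runs the query against an in-memory SQLite database as a stand-in for Hive, so that the example is self-contained; the tables `orders` and `customers`, their columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical source entities with a shared key.
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10, 25.0), (2, 11, 40.0);
    INSERT INTO customers VALUES (10, 'Ada'), (11, 'Grace');
""")

# The AS aliases play the role of renamed OUTPUT fields.
join_query = """
    SELECT o.order_id AS order_id,
           c.name     AS customer_name,
           o.amount   AS order_amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
"""
cursor = conn.execute(join_query)
output_fields = [column[0] for column in cursor.description]
rows = cursor.fetchall()
print(output_fields)
print(rows)
```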
Select Add Target to define Source Hierarchy, Source System Name, Entity Info (Name), and Groups Access.
Source System Name can be edited: the name can be removed and replaced. Define or select the Source System Name first, and the Source Hierarchy dropdown will populate accordingly.
Select the Pencil icon in the upper right of the Explore screen to make changes to execution parameters any time before execution. A modal will display with execution options. (These parameters are not supported in Single Server environments.)
View Name: Target entity name (becomes the user table or view name).
Query Engine: Options are driven through core_env.property 'explore.available.engines=MAPREDUCE,SPARK,TEZ'
The engine names as presented in the modal (HIVE ON MAPREDUCE, HIVE ON SPARK, HIVE ON TEZ) clarify that the engines query distribution tables.
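The relationship between the property value and the names shown in the modal can be sketched as follows. This is an assumption-based illustration, not the product's actual parsing code: the function `parse_engines` is hypothetical, and the "HIVE ON" prefix is simply applied to each configured engine.

```python
def parse_engines(property_line):
    """Split an 'explore.available.engines=...' line into engine names."""
    key, _, value = property_line.partition("=")
    if key.strip() != "explore.available.engines":
        raise ValueError(f"unexpected property key: {key!r}")
    # Ignore empty entries and surrounding whitespace.
    return [name.strip().upper() for name in value.split(",") if name.strip()]


engines = parse_engines("explore.available.engines=MAPREDUCE,SPARK,TEZ")
# Assumed mapping from configured engine to the label shown in the modal.
labels = [f"HIVE ON {engine}" for engine in engines]
print(labels)
```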
Hcat: HCatalog is a table storage management tool that exposes the Hive metastore (the system catalog containing metadata about Hive tables, columns, and partitions). The Hcat Table option provides a materialized table, a snapshot that will not change if the original Explore entities change. The Hcat View option re-queries the entities with the 'Select' statement.
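The difference between a materialized table and a view can be demonstrated with a small sketch. It uses an in-memory SQLite database as a stand-in for Hive, and the names `source_entity`, `target_table`, and `target_view` are hypothetical; only the general snapshot-versus-requery behavior is being illustrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_entity (id INTEGER, status TEXT);
    INSERT INTO source_entity VALUES (1, 'open');
    -- Table option: data is copied once, at creation time.
    CREATE TABLE target_table AS SELECT id, status FROM source_entity;
    -- View option: the SELECT is re-run on every query.
    CREATE VIEW target_view AS SELECT id, status FROM source_entity;
""")

# Change the source after both targets exist.
conn.execute("UPDATE source_entity SET status = 'closed' WHERE id = 1")

table_rows = conn.execute("SELECT status FROM target_table").fetchall()
view_rows = conn.execute("SELECT status FROM target_view").fetchall()
print(table_rows)  # snapshot keeps the original value
print(view_rows)   # view reflects the updated source
```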
Once the target entity (user view) has been added to the canvas, Validate then Execute the query.
Query Results display on the Explore Canvas.
Checkout saves the query as a User View entity in Discover. This view is an internal view only (not accessible from the Source module).