Explore

Explore enables the creation of custom Hive or Postgres distribution service views as internal User views and tables accessible in the Discover module.

Example of an Explore workflow: two source entities, Explore script controller, and created view.

Access the Explore screen through My Cart or from Datasets in Discover.

Access Explore from Datasets through the gear (available actions) dropdown.

Select Explore from the available options dropdown.

The following example details the Explore process through Add To Cart.

To select entities individually, mark the check box next to the entity or entities to be included, and select Add To Cart.

Select entities individually
Entities can also be added individually to the cart.

The entities populate in My Cart in the top navigation bar. Select Take Action from the My Cart dropdown and choose Explore from the dropdown box. All Dataset objects will display on the canvas. Source entities can be selected individually and staged directly to My Cart, or bucketed into a Dataset and then added to My Cart.

To add more entities to the canvas, select Add Source from the canvas. The Add Source wizard guides the user through source and entity selection for inclusion on the Explore canvas.

Add more entities to the explore canvas

Modifying entities for Explore canvas

Explore tile icons
Icon | Description
[minimize tab icon] | Minimize tab
[edit object icon] | Edit object
[delete icon] | Delete object
[source information icon] | Source name
[sample data icon] | Sample data
[profile data icon] | Profile data

Remove fields from an entity

Users can create a custom subset view of an entity by removing fields from a source entity. This is a straightforward operation and requires no SQL scripting. Select the view/edit icon to open the field panel (Edit Source).

Select the subset of fields to include, then select Update.

The source entity now displays only the fields that were selected.

Uncheck fields to exclude from the updated view
Unselect fields to exclude them from target entity view
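
Conceptually, removing fields in Edit Source amounts to a column projection over the source entity. The sketch below illustrates that idea in plain Python; the function name `subset_fields`, the sample records, and the field names are all hypothetical and only meant to show the effect, not how the product implements it.

```python
def subset_fields(rows, selected):
    """Keep only the selected field names in each record (a column projection)."""
    return [{name: row[name] for name in selected} for row in rows]

# Hypothetical source entity with three fields.
source_entity = [
    {"customer_id": 1, "name": "Ada", "internal_note": "vip"},
    {"customer_id": 2, "name": "Grace", "internal_note": "new"},
]

# Fields left checked in the field panel (assumed for illustration).
selected_fields = ["customer_id", "name"]

subset = subset_fields(source_entity, selected_fields)
for record in subset:
    print(record)
```

The source records are left untouched; only the resulting view carries the reduced field list.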

Before defining and adding a target entity, expand the controller icon to review Input, Script, and Output for the Explore flow.

INPUT displays field name and data type for the selected fields.

SCRIPT allows the user to define conditions for the view in the Script section with query language commands. Users proficient with query languages can write custom queries.

OUTPUT shows Field Name and Data Type for the fields as they will display. Output fields can be renamed.

INPUT section example

FIELD NAME and DATA TYPE display for the selected fields.

Controller input section displays field names and their data types.

SCRIPT query language section example

Define conditions for the view in the script section with Hive Query Language (HQL) commands. The example below shows a Join on the two source entities. Users proficient with query languages can insert custom queries.

Script section example
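
As a rough illustration of the kind of join the Script section might hold, the sketch below runs a join over two small in-memory tables using Python's built-in sqlite3 module. SQLite stands in for Hive here purely so the example is runnable; the entity names (`customers`, `orders`), their fields, and the query itself are hypothetical and not taken from the product.

```python
import sqlite3

# Hypothetical stand-ins for two source entities on the Explore canvas.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# A join of the kind a user might write in the Script section.
JOIN_QUERY = """
    SELECT c.customer_id, c.name, o.order_id, o.amount
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
"""

def run_join(connection):
    """Return the joined rows as a list of tuples."""
    return connection.execute(JOIN_QUERY).fetchall()

rows = run_join(conn)
for row in rows:
    print(row)
```

The selected columns of such a query correspond to what the OUTPUT section would list as field names and data types.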

OUTPUT

  • OUTPUT FIELD and DATA TYPE display for the fields as they will appear in the created view.
  • OUTPUT fields can be renamed.

Select Add Target to define Source Hierarchy, Source System Name (source name), Entity Info (Name), and Groups access.

Source System Name can be edited: the name can be removed and replaced. Define or select the Source Name first, and the Source Hierarchy dropdown will populate accordingly.

Explore parameters

Select the pencil icon in the upper right of the Explore screen to change execution parameters at any time before execution. A modal will display with execution options. (These parameters are not supported in single-node environments.)

Note: Tez is the default engine for HDP3 (Hive-on-MapReduce is no longer supported on HDP3); otherwise MapReduce is the default engine. CDH 6 does not support prepare-on-SPARK or prepare-on-TEZ.

View Name: Target entity name (becomes the user table or view name)

Query Engine: Options are driven through core_env.property explore.available.engines=MAPREDUCE,SPARK,TEZ

The engine names as presented in the modal (HIVE ON MAPREDUCE, HIVE ON SPARK, HIVE ON TEZ) clarify that the engines query Hive distribution tables.
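
To make the relationship between the property value and the modal labels concrete, here is a minimal sketch that parses a property line and maps each engine to the label shown in the modal. The helper `engines_from_property` is hypothetical, and the one-to-one mapping is an assumption based on the three names listed above, not a description of the product's internal logic.

```python
# Labels as they are described for the modal; the mapping itself is assumed.
MODAL_LABELS = {
    "MAPREDUCE": "HIVE ON MAPREDUCE",
    "SPARK": "HIVE ON SPARK",
    "TEZ": "HIVE ON TEZ",
}

def engines_from_property(line):
    """Parse an 'explore.available.engines=...' line into modal labels."""
    key, _, value = line.partition("=")
    if key.strip() != "explore.available.engines":
        raise ValueError(f"unexpected property key: {key!r}")
    names = [name.strip().upper() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in MODAL_LABELS]
    if unknown:
        raise ValueError(f"unknown engine(s): {unknown}")
    return [MODAL_LABELS[name] for name in names]

labels = engines_from_property("explore.available.engines=MAPREDUCE,SPARK,TEZ")
print(labels)
```

Under this reading, trimming the property to a single engine would leave only that engine's label available in the modal.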

Hcat: HCatalog is a table storage management tool that exposes the Hive metastore (the system catalog containing the Hive metadata used to create tables, columns, and partitions). The Hcat table option provides a materialized table: a snapshot that will not change if the original Explore entities change. The Hcat view option re-queries the entities with a select statement each time it is read.

Available parameters for explore queries
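
The difference between the table and view options can be sketched with Python's built-in sqlite3 module. SQLite is used here only as a runnable stand-in for Hive, and the names `source_entity`, `explore_table`, and `explore_view` are hypothetical; the point is the snapshot-versus-requery behavior, which is assumed to carry over in spirit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_entity (id INTEGER, status TEXT);
    INSERT INTO source_entity VALUES (1, 'open'), (2, 'open');

    -- "Table" option: results are materialized once, at creation time.
    CREATE TABLE explore_table AS SELECT id, status FROM source_entity;

    -- "View" option: the SELECT is stored and re-run on every read.
    CREATE VIEW explore_view AS SELECT id, status FROM source_entity;
""")

# Change the source after both targets exist.
conn.execute("UPDATE source_entity SET status = 'closed' WHERE id = 2")

table_rows = conn.execute(
    "SELECT id, status FROM explore_table ORDER BY id"
).fetchall()
view_rows = conn.execute(
    "SELECT id, status FROM explore_view ORDER BY id"
).fetchall()

print("table:", table_rows)  # still reflects the data at creation time
print("view: ", view_rows)   # reflects the updated source
```

A materialized table suits results that must stay fixed for later comparison, while a view suits results that should track the source entities.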

Once the target entity (user view) has been added to the canvas, select Validate and then Execute to run the query.

Query Results display on the Explore Canvas.

Upon checkout query results display

Checkout saves the query as a User view entity in Discover. This view is internal only (not accessible from the Source module).

Checkout saves the query
Query is saved as an internal user view upon checkout.