Module overview

The following functional modules provide a powerful and flexible range of data preparation, governance, and management functionality.

Catalog

Interactive Marketplace Dashboard

Catalog is an interactive Marketplace Dashboard that provides immediate insight into, and actions for, searchable and filterable entities and QVDs across the data ecosystem. It scores entities against Operational, Quality, and Popularity or Size KPIs. With one click, users can access meaningful information such as Lineage, Joinable Entities, Sample Data, Creation Date, Source, Connection, KPI Scoring Detail, and Field Profile Statistics. Shop-For-Data functionality provides search, browse, preview, and explore features targeting specific entity collections.

Source

Data ingest, validation, quality, and profiling

Source enables Administrator-level users to deploy and manipulate enterprise data assets across clustered nodes through the ingest framework of Qlik Data Catalyst. Data analysts build custom environments to validate, profile, and register metadata in HCatalog, ensuring that datasets are clean and usable as records are ingested. Source provides guided wizards to build metadata-driven environments, using JDBC for relational databases, File Definition Language (FDL) for flat files, and COBOL copybooks for mainframe data. An XML utility guides users through ingesting and validating XSDs for loading XML data, and JSON files are validated with a hierarchical extraction utility. The data sourcing capabilities of Qlik Data Catalyst can be scheduled for automatic execution in production environments.

Discover

Interactive query and metadata management

Discover enables analysts to view and explore all data sources and metadata to which they have been granted access. Users can search, browse, create, delete, and edit properties at the Source Hierarchy, Entity, and Field levels, enabling collaborative data curation and governance. Datasets are logical collections of data assets in Qlik Data Catalyst. Any user can create a dataset, which can then be transformed, published, or queried. Users can run most Hive/Impala commands from within Qlik Data Catalyst or create custom views using a familiar shopping cart environment that gives visibility and control over data sources.

Prepare

Data transformation

Prepare is a simple and intuitive GUI that enables users to turn database tables into filtered and cleansed datasets through the following operations: Transform, Filter, Join, Aggregate, Router, Sort, Union, and change data capture (CDC). The metadata and profiling statistics of Qlik Data Catalyst guide users throughout the process, including end-to-end validation of the dataflow. Users can design and graphically manipulate custom datasets to meet a wide variety of data preparation and analytics requirements. Users can also apply Pig functions to entities through the Custom Expression Builder of Qlik Data Catalyst.

Publish

Deliver datasets

Publish enables one-time or recurring replication of datasets across enterprise cluster environments. Administrators define Publish Targets and the file format, then execute the job immediately or schedule it on a one-time or recurring basis. Datasets can be published to various destination types, including Hadoop (and Hive), local file, HDFS, FTP, Amazon Web Services (AWS) Simple Storage Service (S3), any RDBMS, or any other protocol through Qlik Data Catalyst Open Connector scripting. Publish lets users define the file type, field and record delimiters, header information, partition merge options, data obfuscation techniques, and environment properties for Open Connector.
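
To illustrate what the field delimiter, record delimiter, and header options control, the following is a minimal generic sketch using Python's standard csv module. It is not the Qlik Data Catalyst publish or Open Connector API; the function name render_delimited and its parameters are hypothetical and chosen only for this example.

```python
import csv
import io


def render_delimited(header, rows, field_delim=",", record_delim="\n", include_header=True):
    """Serialize rows as delimited text (hypothetical helper, not a Qlik API).

    field_delim must be a single character (a csv module requirement);
    record_delim is written after every record, including the header.
    """
    buf = io.StringIO()
    # QUOTE_MINIMAL (the default) quotes only fields that contain the
    # delimiter, the quote character, or line-break characters.
    writer = csv.writer(buf, delimiter=field_delim, lineterminator=record_delim)
    if include_header:
        writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

For example, render_delimited(["id", "name"], [[1, "Ann"], [2, "Bo"]], field_delim="|") returns "id|name\n1|Ann\n2|Bo\n"; passing include_header=False drops the first line.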

Security

User and group administration

Security provides a console for administering role-based access permissions and for assigning both users and data to groups. Administrators can create users and assign group access levels and permissions on entities. Qlik Data Catalyst uses enterprise security technologies such as Active Directory identity and domain services and LDAP to query and dynamically synchronize active groups and personnel. It can also integrate with access control policies defined in Hadoop tools such as Ranger and Sentry, as well as HDFS and HDFS with Transparent Data Encryption. Support for impersonation and for enforcing HDFS access control methods enables precise control and integration with existing security policies.

Ops

Ops provides a quick view of data quality and usage metrics for the most recent ingest activity. Recent data loads are listed with their associated metadata, including Source, Entity, Job Status, Delivery Time, Start Time, End Time, Record Count, Good Records, Bad Records, Ugly Records, Filtered Records, and Last Update Time.
