Module overview
The following functional modules provide a powerful and flexible range of data preparation, governance, and management functionality.
Catalog
Interactive Marketplace Dashboard
Catalog is an interactive Marketplace Dashboard that provides immediate insight and actions for searchable and filterable entities and QVDs across the data ecosystem. It scores entities against Operational, Quality, and Popularity or Size KPIs. Users access meaningful information such as Lineage, Joinable Entities, Sample Data, Creation Date, Source, Connection, KPI Scoring Detail, and Field Profile Statistics with one click. Shop-For-Data functionality provides search, browse, preview, and explore features targeting specific entity collections.
Source
Data ingest, validation, quality, and profiling
Source enables Administrator-level users to deploy and manipulate enterprise data assets across clustered nodes with ease and efficiency through the powerful ingest framework of Qlik Data Catalyst. Data analysts build custom environments to validate, profile, and register metadata in HCatalog, ensuring clean, executable datasets upon ingest of data records. Source provides guided wizards to build metadata-driven environments leveraging JDBC technology for relational databases, File Definition Language (FDL) for flat files, and COBOL copybooks for mainframe data. An XML utility steps users through ingest and validation of XSDs for loading XML data, and JSON files are validated with a hierarchical extraction utility. The data sourcing capabilities of Qlik Data Catalyst can be scheduled for automatic execution in production environments.
Discover
Interactive query and metadata management
Discover enables analysts to view and explore all data sources and metadata to which they are granted access. Users can Search, Browse, Create, Delete, and Edit properties at the Source Hierarchy, Entity, and Field levels, enabling collaborative data curation and governance. Datasets are logical collections of data assets in Qlik Data Catalyst that any user can create to be transformed, published, or selected and retrieved through a query language. Users can run most Hive/Impala commands from within Qlik Data Catalyst or create custom views using a familiar shopping cart environment for visibility and control over data sources.
Prepare
Data transformation
Prepare is a simple and intuitive GUI that enables users to customize database tables into filtered and cleansed datasets through transform commands: Transform, Filter, Join, Aggregate, Router, Sort, Union, and change data capture (CDC). The metadata and profiling statistics of Qlik Data Catalyst guide users throughout the process, including end-to-end validation of the dataflow. Users can envision and graphically manipulate custom datasets to meet a wide variety of targeted data preparation and analytics requirements. Users can apply Pig functions to entities through the flexible Custom Expression Builder of Qlik Data Catalyst.
Publish
Deliver datasets
Publish enables one-time or recurring replication of datasets across enterprise cluster environments. Administrators define Publish Targets and file formats, then execute or schedule publish jobs on a one-time or recurring basis. Datasets can be published to various destination types including Hadoop (and Hive), local file, HDFS, FTP, Amazon Web Services (AWS) Simple Storage Service (S3), or any RDBMS supporting any protocol via Qlik Data Catalyst Open Connector scripting. Publish provides the ability to define the file type, field and record delimiters, header information, partition merge options, data obfuscation techniques, and environmental properties for Open Connector.
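The delimited-file options described above (field delimiter, record delimiter, header information) can be pictured with a small, generic sketch. This is not the Qlik Data Catalyst API or Open Connector scripting; it uses only the Python standard library, and the names `PublishFormat` and `render_delimited` are hypothetical, introduced purely for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PublishFormat:
    """Hypothetical stand-in for the delimited-file options of a publish job."""
    field_delimiter: str = ","
    record_delimiter: str = "\n"
    include_header: bool = True


def render_delimited(fields, rows, fmt):
    """Render rows as delimited text according to the given format options."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=fmt.field_delimiter,
        lineterminator=fmt.record_delimiter,
    )
    if fmt.include_header:
        # Header information: emit the field names as the first record.
        writer.writerow(fields)
    writer.writerows(rows)
    return buffer.getvalue()


# Pipe-delimited output with Windows-style record delimiters and a header row.
sample = render_delimited(
    ["id", "name"],
    [[1, "alpha"], [2, "beta"]],
    PublishFormat(field_delimiter="|", record_delimiter="\r\n"),
)
```

Keeping the format options in one small object mirrors the idea that a format is defined once and reused across one-time or recurring runs.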
Security
User and group administration
Security provides a console to administer role-based access permissions for users and to assign both users and data to groups. Administrators can create users and designate group access levels and permissions to entities. Qlik Data Catalyst leverages enterprise security technologies such as Active Directory identity services and domain management, and LDAP, to query and dynamically synchronize active groups and personnel. It can also integrate with access control policies defined in Hadoop tools such as Ranger, Sentry, HDFS, and HDFS with Transparent Data Encryption. Support for impersonation and enforcement of HDFS access control methods enables precise control and integration of existing security policies.
Ops
Ops provides a quick view of data quality and usage metrics for the most recent ingest activity. Recent data loads are displayed with standard associated metadata, including Source, Entity, Job Status, Delivery Time, Start Time, End Time, Record Count, Good Records, Bad Records, Ugly Records, Filtered Records, and Last Update Time.
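As a rough illustration of how the record counts listed above might be read together, the sketch below models one load as a plain record and derives a simple quality figure. The class name `LoadSummary`, the method `good_share`, the sample values, and the assumption that Record Count is the total against which Good Records are measured are all hypothetical; none of this reflects how Qlik Data Catalyst stores or computes these metrics.

```python
from dataclasses import dataclass


@dataclass
class LoadSummary:
    """Hypothetical record holding a subset of the Ops metadata fields."""
    source: str
    entity: str
    job_status: str
    record_count: int
    good_records: int
    bad_records: int
    ugly_records: int
    filtered_records: int

    def good_share(self):
        """Fraction of records marked good, assuming record_count is the total."""
        if self.record_count == 0:
            return 0.0
        return self.good_records / self.record_count


# Sample values are invented for illustration only.
summary = LoadSummary(
    source="sales_db",
    entity="orders",
    job_status="FINISHED",
    record_count=1000,
    good_records=950,
    bad_records=30,
    ugly_records=15,
    filtered_records=5,
)
share = summary.good_share()
```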