Module overview
The following functional modules provide a powerful and flexible range of data preparation, governance, and management functionality.
Catalog
Interactive marketplace dashboard
Catalog is an interactive marketplace dashboard for searching and filtering entities and QVDs across the data ecosystem. It scores entities against operational, quality, and popularity or size KPIs, providing immediate insight and actions. With one click, users access information such as lineage, joinable entities, sample data, creation date, source, connection, KPI scoring detail, and field profile statistics. The Shop for data functionality provides search, browse, preview, and explore features targeting specific entity collections.
Source
Data ingest, validation, quality, and profiling
Source enables administrator-level users to deploy and manipulate enterprise data assets across clustered nodes through the ingest framework of Qlik Catalog. Data analysts build custom environments to validate, profile, and register metadata in HCatalog, ensuring clean, executable datasets upon ingest of data records. Source provides guided wizards to build metadata-driven environments leveraging JDBC technology for relational databases, File Definition Language (FDL) for flat files, and COBOL copybooks for mainframe data. An XML utility guides users through ingest and validation of XSDs for loading XML data, and JSON files are validated with a hierarchical extraction utility. Data sourcing in Qlik Catalog can be scheduled for automatic execution in production environments through API scripting or integration.
Discover
Interactive query and metadata management
Discover enables analysts to view and explore all data sources and metadata to which they are granted access. Users can search, browse, create, delete, and edit properties at the source, entity, and field levels, enabling collaborative data curation and governance. Datasets are logical collections of data assets in Qlik Catalog that any user can create to be transformed, published, or selected and retrieved through a query language. Users can run most Hive or Impala commands from within Qlik Catalog or create custom views using a familiar shopping cart environment for visibility and control over data sources.
Prepare
Data transformation
Prepare is a simple and intuitive GUI that enables users to turn database tables into filtered and cleansed datasets through transform controllers: transform, filter, join, aggregate, router, sort, union, and change data capture (CDC). The metadata and profiling statistics of Qlik Catalog guide users throughout the process, including end-to-end validation of the dataflow. Users can envision and graphically manipulate custom datasets to meet a wide variety of targeted data preparation and analytics requirements. Users can build and apply Pig functions to entities through the custom expression builder.
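As a rough illustration of what chaining several of these controllers accomplishes, the following sketch applies filter, join, aggregate, and sort steps to in-memory rows. It is plain Python, not Qlik Catalog or Pig code; the entity names, fields, and helper functions are all hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical sample entities; names and fields are illustrative only.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 250.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
    {"order_id": 3, "customer_id": 10, "amount": 125.0},
    {"order_id": 4, "customer_id": 12, "amount": -5.0},  # invalid amount
]
customers = [
    {"customer_id": 10, "region": "EMEA"},
    {"customer_id": 11, "region": "APAC"},
    {"customer_id": 12, "region": "EMEA"},
]


def filter_rows(rows, predicate):
    """Filter step: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def join_rows(left, right, key):
    """Join step: inner join of two row sets on a shared key."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


def aggregate_sum(rows, group_key, value_key):
    """Aggregate step: sum value_key per distinct group_key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return [{group_key: k, value_key: v} for k, v in totals.items()]


def sort_rows(rows, key, descending=False):
    """Sort step: order rows by a single field."""
    return sorted(rows, key=lambda row: row[key], reverse=descending)


# Chain the steps in the order a dataflow might apply them.
valid = filter_rows(orders, lambda row: row["amount"] > 0)
joined = join_rows(valid, customers, "customer_id")
totals = aggregate_sum(joined, "region", "amount")
result = sort_rows(totals, "amount", descending=True)
```

Each helper takes rows in and returns rows out, which is what lets the steps be composed in any order, much as controllers are connected in a graphical dataflow.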
Publish
Deliver datasets
Publish enables one-time or recurring replication of datasets across enterprise cluster environments. Administrators define Publish targets and file formats, then execute or schedule publish jobs on a one-time or recurring basis. Datasets can be published to various destination types including Hadoop (and Hive), local file, HDFS, FTP, Amazon Web Services (AWS) Simple Storage Service (S3), or any RDBMS supporting any protocol via Qlik Catalog Open Connector scripting. Publish provides the ability to define file type, field and record delimiters, header information, partition merge options, data obfuscation techniques, and environmental properties for openconnector.
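To make the delimiter and header options concrete, here is a minimal sketch of writing rows to delimited text with a configurable field delimiter, record delimiter, and optional header. It uses only the Python standard library `csv` module and does not reflect how Qlik Catalog implements Publish; the function name `publish_delimited` and its parameters are assumptions made for this example.

```python
import csv
import io


def publish_delimited(rows, fieldnames, field_delimiter=",",
                      record_delimiter="\n", include_header=True):
    """Render rows as delimited text (hypothetical helper, not a Qlik API)."""
    # newline="" prevents StringIO from translating the record delimiter.
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        delimiter=field_delimiter,
        lineterminator=record_delimiter,
    )
    if include_header:
        writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative rows; a real publish job would read them from a dataset.
sample_rows = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
]
pipe_output = publish_delimited(sample_rows, ["id", "name"], field_delimiter="|")
no_header_output = publish_delimited(
    sample_rows, ["id", "name"], record_delimiter="\r\n", include_header=False
)
```

Keeping the delimiters and header flag as parameters means the same rows can be rendered for different targets without changing the data itself.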
Security
User and group administration
Security provides a console to administer role-based access permissions for users and to assign both users and data to groups. Administrators can create users and designate group access levels and permissions to entities. Qlik Catalog leverages enterprise security technologies such as Active Directory identity services and domain management and LDAP to query and dynamically synchronize active groups and personnel. Qlik Catalog can also integrate with access control policies defined in Hadoop tools such as Ranger, Sentry, HDFS, and HDFS with transparent data encryption. Support for impersonation and enforcement of HDFS access control methods enables precise control and integration of existing security policies.
Ops
Ops provides a quick view of data quality and usage metrics for the most recent ingest activity. Recent data loads are displayed with their standard associated metadata, including Source, Entity, Job Status, Delivery Time, Start Time, End Time, Record Count, Good Records, Bad Records, Ugly Records, Filtered Records, and Last Update Time.
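The record counts listed above lend themselves to a simple quality check. The sketch below derives a good-record ratio per load and flags loads that fall below a threshold. It assumes the good, bad, ugly, and filtered counts together account for every record in a load, which the text above does not state; the `LoadStats` class, the `flag_loads` function, and the 0.95 threshold are hypothetical and not part of Qlik Catalog.

```python
from dataclasses import dataclass


@dataclass
class LoadStats:
    """Hypothetical container for the per-load counts shown in Ops."""
    source: str
    entity: str
    good: int
    bad: int
    ugly: int
    filtered: int

    @property
    def total(self) -> int:
        # Assumption: the four categories partition the loaded records.
        return self.good + self.bad + self.ugly + self.filtered

    def good_ratio(self) -> float:
        """Share of records classified as good; 0.0 for an empty load."""
        return self.good / self.total if self.total else 0.0


def flag_loads(loads, min_good_ratio=0.95):
    """Return the loads whose good-record ratio is below the threshold."""
    return [load for load in loads if load.good_ratio() < min_good_ratio]


# Illustrative values only; not taken from any real load.
loads = [
    LoadStats("sales_db", "orders", good=980, bad=10, ugly=10, filtered=0),
    LoadStats("sales_db", "returns", good=800, bad=150, ugly=50, filtered=0),
    LoadStats("hr_files", "staff", good=0, bad=0, ugly=0, filtered=0),
]
flagged = flag_loads(loads)
```

Note that an empty load is treated as having a ratio of 0.0 and is therefore flagged; whether that is the desired behavior depends on how empty loads should be handled.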