User Guide
Information note
Important Disclaimer
Note that some of the features detailed in this document may not apply and/or be available for the particular edition/version you are using.
Metadata Overview
Talend Data Catalog provides a comprehensive and well-integrated set of Metadata Management (MM), Data Cataloging (DC) and Data Governance (DG) solutions supporting on-premise, cloud-based, or hybrid Enterprise Architectures ranging from the classic Data Warehouse to the latest Data Lakes and Data Vaults:
- Integrated Solutions: Metadata Management, Data Cataloging, and Data Governance
- Multi-Deployments: On Premise, Cloud, or hybrid (Cloud with on premise harvesting)
- Multi-Architectures from the Data Warehouse to the new Data Lake / Data Vault
- Multi-Vendors from any Data Integration and Business Intelligence tools
- Multi-Technologies from files and SQL-based RDBMS to the new NoSQL, JSON, Avro, Parquet, XML and Hadoop big data technologies, and REST API Data Services
- Multi-Storage File Systems: Data Cataloging by file crawling over Linux/Windows, Hadoop HDFS, Amazon S3, Azure Blob Storage, OpenStack Swift, Apache Kafka, etc.
- Multi-Configurations with Change Management and Incremental Metadata Harvesting, Comparison, Version and Configuration Management and automatic stitching
- Multiple integrated tools for Search, Data Flow and Semantic Lineage, Data Modeling, Data Mapping, Active Data Governance (generation of self-service DI and BI), Multi-Vendor BI Web Portal
- Data classification, both automatic and manual, and both metadata-detected and data-detected
- Fully customizable metamodel for custom modeling
Talend Data Catalog provides solutions for a full range of users:
- Most business end users who need a multi-vendor Business Intelligence (BI) web portal with quick access to the right report, and who need to understand its content through proper business definitions from the enterprise glossary.
- Advanced business users and compliance officers looking for information traceability (data lineage) and data privacy for General Data Protection Regulation (GDPR), Sarbanes-Oxley (SOX) regulations, and more.
- Data stewards, data modelers, and data quality experts working on enterprise data standardization, common vocabulary, data modeling, and business rules with powerful tools like the glossary, semantic mapper, and data modeler.
- Data analysts in need of self-service data integration, preparation, and business intelligence using the data mapping design tool with active data governance (forward engineering) to their actual DI/BI tools.
- IT engineers, data scientists, and data integration and business intelligence developers looking for complex multi-vendor and multi-architecture end-to-end lineage in great detail, down to the design-level information of each tool, with support for change impact analysis and fully detailed version and configuration management.
Features
Metadata Harvesting
- Data Stores
- Databases (Oracle, SQL Server, Teradata, IBM DB2, PostgreSQL, MySQL, AWS Redshift, Greenplum, Netezza, SAP HANA, etc.)
- Big Data (Hadoop Hive, HCatalog, Google BigQuery, etc.)
- NoSQL (Cassandra, HBase, MarkLogic, MongoDB, etc.)
- Flat Files (CSV, XLSX)
- Hierarchical Files (JSON, Avro, Parquet, XML, XML XSD, etc.)
- File Systems (Linux/Windows)
- Data Lake and Cloud (Hadoop HDFS, Amazon S3, etc.)
- Data Services (Open API, etc.)
- Metadata Stores
- Data Modeling Tools (Erwin, ER/Studio, PowerDesigner, etc.)
- Metadata Management (Atlas, Navigator, etc.)
- Semantic Web Ontology (OWL/RDF)
- Data Integration
- DI/ETL Scripts (Oracle PL/SQL, Teradata BTEQ/FastLoad/BulkLoad, Hadoop HiveQL, Sqoop, SAS code, etc.)
- DI/ETL Tools (Informatica PowerCenter, IBM DataStage, Oracle ODI, Microsoft SSIS, SAP Data Services, SAS DI, Talend, etc.)
- Business Intelligence (SAP BusinessObjects, IBM Cognos, Microsoft SSAS/SSRS, Azure PowerBI, Oracle OBIEE, MicroStrategy, Qlik, Tableau, ThoughtSpot, TIBCO Spotfire, etc.)
- Business Applications (SAP Business Suite, SAP Business Warehouse, Salesforce, etc.)
Metadata Management (MM)
- Configuration Manager (with automatic metadata stitching, and Enterprise Architecture diagramming)
- Metadata Search and Worksheets (metadata driven pre and post filters, semantic search language)
- Metadata Browser (hierarchical metadata browsers with custom metadata profiles per tool/technology)
- Metadata Reporting (search and browse both lead to the same reporting page, which is also directly available at WORKSHEETS > Manage, with tabular and bulk editing capabilities)
- Data Model Visualizer and modeler (fully editable ER Diagrams)
- Enterprise architecture diagramming and editing (fully editable architecture diagram)
- Data Flow Lineage and Impact Analyzer (lineage and impact analysis down to the feature level, with data vs. control flow, data vs. semantic flow, path highlighting, and dynamic scoping)
- Multi-Configuration Management (multi configurations for different enterprise architectures and groups)
- Multi-Version Management (efficient automatic incremental harvesting, with model history/SOX compliance)
- Metadata Comparator (comparison with previous versions for the impact of change)
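To illustrate what data flow lineage and impact analysis compute, here is a minimal sketch that treats lineage as a directed graph and walks it downstream (impact) or upstream (lineage). It is not how Talend Data Catalog implements these features; the object names and the `EDGES` list are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build downstream and upstream adjacency maps from (source, target) pairs."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return downstream, upstream


def trace(start, adjacency):
    """Breadth-first walk returning every object reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen and neighbor != start:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical feature-level (column-level) data flow between harvested objects
EDGES = [
    ("crm.customer.email", "staging.customer.email"),
    ("staging.customer.email", "dw.dim_customer.email"),
    ("dw.dim_customer.email", "bi.customer_report.email"),
    ("erp.orders.amount", "dw.fact_sales.amount"),
]

downstream, upstream = build_graph(EDGES)

# Impact analysis: what is affected if the staging column changes?
impact = trace("staging.customer.email", downstream)

# Lineage: where does the report field come from?
lineage = trace("bi.customer_report.email", upstream)

print(sorted(impact))
print(sorted(lineage))
```

The same traversal serves both directions; only the adjacency map changes, which is why lineage and impact are usually presented as two views of one graph.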
Data Governance (DG)
- Glossary (with customizable workflow automation)
- Semantic Mapper (search driven, auto map, and multi-levels from glossaries to data stores via design models)
- Semantic Lineage Analyzer (term usage, and automatic glossary definition on data pass through)
- Local Documentation (quick in place editing of business names and descriptions while browsing harvested data stores)
- Glossary term classification (quick in place semantic linking while browsing harvested data stores, DI jobs, and BI reports)
- Data Tagging (applying reusable Labels available in search)
- Comments and Reviews (collecting business user feedback and managing reviews)
Data Cataloging (DC)
- File System Crawling (file type auto-detection, partitioning auto-detection)
- Data Profiling (from data sampling to full data profiling with statistical results)
- Semantic Discovery (data classes, patterns/lists, machine learning)
- Relationship Discovery (data-detected matching data classes and metadata-detected inferred from usage in DI, BI, SQL, etc.)
- Social Curation (endorsement, warnings, certifications with impact on search)
Data Modeling
- Data Store Documenter (automatic reverse engineering of naming standards with supervised machine learning)
- Data Store Modeler (with full data model diagram editing)
- Data Store Designer (new data store specifications and design)
- Data classification based on both data-detected and metadata-detected classification
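The distinction between metadata-detected and data-detected classification can be sketched as follows. This is an illustrative example only, not the product's classification engine: the `DATA_CLASSES` table, the regular expressions, and the 0.8 threshold are all assumptions made for the sketch. A column is metadata-detected when its name matches a hint, and data-detected when enough sampled values match a pattern.

```python
import re

# Hypothetical data classes: a name hint (metadata-detected)
# and a value pattern (data-detected)
DATA_CLASSES = {
    "Email": {
        "name_hint": re.compile(r"e[-_]?mail", re.IGNORECASE),
        "value_pattern": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    },
    "US ZIP Code": {
        "name_hint": re.compile(r"zip|postal", re.IGNORECASE),
        "value_pattern": re.compile(r"\d{5}(-\d{4})?"),
    },
}


def classify_column(name, samples, threshold=0.8):
    """Return {data_class: [detection methods]} for one column.

    name      -- the column name (used for metadata detection)
    samples   -- sampled string values (used for data detection)
    threshold -- assumed minimum share of matching non-empty samples
    """
    results = {}
    non_empty = [s for s in samples if s]
    for data_class, rules in DATA_CLASSES.items():
        detected_by = []
        if rules["name_hint"].search(name):
            detected_by.append("metadata")
        if non_empty:
            matches = sum(
                1 for s in non_empty if rules["value_pattern"].fullmatch(s)
            )
            if matches / len(non_empty) >= threshold:
                detected_by.append("data")
        if detected_by:
            results[data_class] = detected_by
    return results


print(classify_column("contact", ["a@b.com", "c@d.org"]))
print(classify_column("zip_code", ["12345", "n/a", "unknown"]))
print(classify_column("EMAIL_ADDR", ["x@y.io"]))
```

In this sketch, a column named `contact` holding email addresses is caught only by its data, while `zip_code` holding mostly placeholder values is caught only by its name, which is why combining both detection methods gives better coverage than either alone.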
Data Mapping
- Data Mapper (from business user data mapping specifications to design for bulk and feature/SQL with joins/filters/lookups)
Metadata Applications
- BI Web Portal (Multi-vendor BI Web Portal with bi-directional integration, and glossary generation)
Active Data Governance (Forward Engineering)
- Data Modeling Tools (Erwin, ER/Studio, PowerDesigner, etc.)
- Data Integration Scripts (PL/SQL, BTEQ, HiveQL, etc.)
- Data Integration (to self-service / data prep tools)
- Business Intelligence (to self-service tools like Tableau or design layers like BO Universes)
Administration, Customization, & Extensions
- Custom Attributes (metamodel extensions) (MyCompanyCertificationLevel, etc.)
- Customizable UI (menus, widget layout, etc.)
- REST API (glossary lookups, lineage trace, automatic harvesting, search, browse, update, etc.)
Object Watchers and Email Notification
- Watcher capabilities at the server level allow users to subscribe to objects and be notified of changes
- Notification frequency is adjustable, from near real-time to daily
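As a rough illustration of how watchers with an adjustable notification frequency might behave, the sketch below queues changes per subscription and releases them as a digest once the subscriber's interval has elapsed. It is a conceptual model under stated assumptions, not the server's implementation: the `Watcher` and `WatcherRegistry` classes, the interval values, and the object identifier are hypothetical, and actual email delivery is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Watcher:
    user: str
    object_id: str
    interval_seconds: int  # e.g. 60 for near real-time, 86400 for daily
    last_sent: float = 0.0
    pending: list = field(default_factory=list)


class WatcherRegistry:
    def __init__(self):
        self.watchers = []

    def subscribe(self, user, object_id, interval_seconds):
        """Register a user's interest in an object at a chosen frequency."""
        watcher = Watcher(user, object_id, interval_seconds)
        self.watchers.append(watcher)
        return watcher

    def record_change(self, object_id, description):
        """Queue a change for every watcher of the given object."""
        for w in self.watchers:
            if w.object_id == object_id:
                w.pending.append(description)

    def due_notifications(self, now):
        """Return (user, object_id, changes) digests whose interval has elapsed."""
        digests = []
        for w in self.watchers:
            if w.pending and now - w.last_sent >= w.interval_seconds:
                digests.append((w.user, w.object_id, list(w.pending)))
                w.pending.clear()
                w.last_sent = now
        return digests


# Simulated clock values (in seconds) stand in for real timestamps
registry = WatcherRegistry()
registry.subscribe("alice", "dw.dim_customer", 60)      # near real-time
registry.subscribe("bob", "dw.dim_customer", 86400)     # daily
registry.record_change("dw.dim_customer", "column added: loyalty_tier")

first = registry.due_notifications(now=100)
later = registry.due_notifications(now=86400)
print(first)
print(later)
```

Batching changes into one digest per interval means a daily subscriber receives a single message summarizing all changes, rather than one message per change.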