Data Cataloging
Cataloging your data stores, reports, etc., is an important activity: it ensures that potential users of the data can understand what is available, what it means, how it may (or even should) be used, and what to expect as a result. Talend Data Catalog offers a number of features to support data cataloging and provide easy access to that information.
Characteristics | | Usage / Best Practices |
Name and Business Definition | The Name and Business Definition are the user-understandable name and description of an object. They are managed by the repository and may be determined by many means. |
|
Mapped Documentation | Semantic Mapping from more conceptual objects (e.g., glossary terms, data model elements, custom model objects) to other, more physical objects, using links in a semantic mapping. These are "Defines / Is Defined By" relationships. Semantic mapping is very flexible and may be used between nearly all objects in the repository. | Semantic Mapping is the primary tool for manual and crowd-sourced documentation of objects, and the primary method for defining the semantic lineage, definition lookup, and Name and Business Definition inference for an object. In particular:
|
Semantic Relationships | Semantic Relationships are custom relationships with the semantic flow property set to true. | Documentation may be determined via the semantic flow. If a relationship is given semantic meaning, then any data element related to a defined object through that relationship is effectively documented as well. Thus, documenting one object (e.g., a Term) that has a large number of these semantic relationships defined (e.g., Is Defined By) allows the product to infer the documentation for all other data elements related in this fashion. |
Semantic Connection Stitching | Semantic Connection from an imported data model to an imported database model. | Documentation may be determined via the semantic stitching of semantic connections. Any such semantic stitching relationship is said to represent equivalent meaning, and thus any data element stitched in this fashion is effectively documented as well. This is an uncommon situation overall, but it is common enough with data model tool imports, which may then be semantically stitched to an underlying database model imported directly from the database. Thus, documenting one object (imported from the data model) may automatically document the stitched database columns. |
Inferred Documentation | Inferred Documentation from other data elements in the pass-through data flow. | Inferred Documentation may be determined via the data flow, based on the understanding that when two data elements are connected by ETL/DI or other data flow lineage with no transformation (i.e., B is derived from A without any changes), they likely mean the same thing. Thus, because the details of the lineage are accurately represented, including the feature-level transformations, documenting one data element in the pass-through lineage allows the product to infer the documentation for all other data elements in that lineage. |
Data Classification | Data Classification by type, which may be performed automatically when harvesting metadata. Data classes are drawn from a pool defined repository-wide, e.g., Social Security Number or Gender. Each entry in the pool includes a unique name for the type and one of the following: a list of valid values (Gender) or a syntactical rule set (Social Security Number), used to determine data classes for objects after sampling the data during harvesting; or metadata search criteria (Metadata Query Language (MQL)), used to determine data classes for objects based upon an object's metadata, e.g., for Maiden Name. | Data Classification is the primary tool for automatically typing and hiding large numbers of objects. In particular:
|
Documentation Properties | Other properties specific to the type of an object that are considered documentation properties (like “Logical Name”, “Business Name”, “Comment” or “Description”). The properties are unique to the particular category of metadata, e.g., fields vs. columns vs. tables vs. entities, and often unique to the specific metadata source. They are harvested and cannot be edited. | Object-specific properties are an uncommon but sometimes important tool for tagging/documenting and analysis. In particular:
|
Other Properties | Other properties specific to the type of an object. The properties are unique to the particular category of metadata, e.g., fields vs. columns vs. tables vs. entities. | Object-specific properties are an uncommon but sometimes important tool for tagging/documenting and analysis. In particular:
|
Custom Attributes | Any number of custom attributes may be defined and associated with a category or type of object. Once defined, you may use them for tagging/documenting and analysis. | Custom attributes are a very common tool for tagging/documenting objects and for analysis and tracking. In particular: Editing – You may edit these properties on the object’s object page, when browsing in grid mode, or by exporting to .csv format, editing, and re-importing that file.
|
Comments | Discussion and other general comments provided by any user as free-form text entries that are associated with a given object and identified by author and time stamp. They may be used for general annotations, discussions and notifications. Please see Curation for more precise and specific types of Comments. | Comments are a very common tool for annotating/documenting objects. However, as a Comment is entirely free-form and more involved than a simple Label, it is generally less used in analysis and tracking. In particular:
|
Curation | Just as with Comments, Curation entries (Certifications, Endorsements and Warnings) are free-form text entries that are associated with a given object and identified by author and time stamp. However, they have specific meaning beyond the general-purpose comment. They may be used for specific annotations and notifications, and curation impacts search results and selection lists. | Curation is a very common tool for annotating/documenting objects. It is entirely free-form, but is divided into three types (Certifications, Endorsements and Warnings). In particular:
|
Labels | Meta-tags which may be associated with objects. They are very simple, quick, free-form meta-tags that anyone may place on an object. There is a single namespace pool for Labels that is Talend Data Catalog-wide, and thus shared across the entire repository environment. | Labels are a very common tool for annotating/documenting objects. However, as they are entirely free-form, they are generally less used in analysis and tracking. In particular:
|
Collections | Groups of objects of any type which may be collected together, like a shopping cart. Anyone may create a Collection and keep it private or share it with others. | Collections are mostly a simple way to corral a set of objects of any kind. Commonly these include to-do lists, assignments (shared with others), collections for later analysis or completeness studies, etc. In particular:
|
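To make the Inferred Documentation row in the table above more concrete, the following Python sketch propagates a business definition along pass-through lineage links only. It is an illustration under stated assumptions, not Talend Data Catalog's implementation or API: the function `infer_documentation`, the `(source, target, is_pass_through)` edge layout, and the element names are all hypothetical.

```python
from collections import defaultdict, deque


def infer_documentation(edges, documented):
    """Propagate definitions along pass-through (no-transformation) lineage.

    edges: iterable of (source, target, is_pass_through) tuples (hypothetical layout).
    documented: dict mapping element name -> business definition.
    Returns a dict of element -> (definition, origin element) for elements
    whose definition was inferred rather than entered directly.
    """
    # Pass-through links are treated as undirected: if B is an unchanged copy
    # of A, documenting either one says something about the other.
    neighbors = defaultdict(set)
    for source, target, is_pass_through in edges:
        if is_pass_through:
            neighbors[source].add(target)
            neighbors[target].add(source)

    inferred = {}
    for origin in sorted(documented):
        queue = deque([origin])
        seen = {origin}
        while queue:
            current = queue.popleft()
            for nxt in sorted(neighbors[current]):
                if nxt in seen:
                    continue
                seen.add(nxt)
                if nxt in documented:
                    # A directly documented element keeps its own definition
                    # and stops propagation from this origin.
                    continue
                # The first inferred definition wins.
                inferred.setdefault(nxt, (documented[origin], origin))
                queue.append(nxt)
    return inferred


# Hypothetical lineage: two unchanged copies and one transformed column.
edges = [
    ("crm.customer.email", "staging.customer.email", True),
    ("staging.customer.email", "dw.dim_customer.email", True),
    ("staging.customer.email", "dw.dim_customer.email_hash", False),
]
documented = {
    "crm.customer.email": "Primary contact e-mail address of the customer.",
}

result = infer_documentation(edges, documented)
for element, (definition, origin) in sorted(result.items()):
    print(f"{element}: {definition} (inferred from {origin})")
```

In this sketch the hashed column receives no inferred definition, because the link leading to it involves a transformation and so is not treated as pass-through.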