Data Cataloging

Cataloging your data stores, reports, etc., is an important activity to ensure that potential users of the data can understand what is available, what it means, how it may (or even should) be used, and what to expect as a result. Talend Data Catalog offers a number of features to support data cataloging and provide easy access to that information.

Each data cataloging type below is listed with its characteristics, followed by its usage and best practices.

Name and Business Definition

The Name and Business Definition are the user-understandable name and description for an object. They are managed by the repository and may be determined through many means.

  • They can be entered locally on an imported object by simply entering or updating a value (as a Name and Business Definition pair). In this case, this local documentation, entered in the Business Documentation section of the object page, is presented as the Business Name and Definition in the Documentation section of the object page.
  • In addition, the Name and Business Definition may be Defined By one or more business glossary terms. In this case, this term documentation is presented in the Term Documentation section of the object page as Name and Business Definition pairs, one for each term with a Defined By relationship.
  • Alternatively, the Name and Business Definition may be determined from a semantic mapping from one or more objects, such as imported data model objects, business glossary terms, etc. In this case, this mapped documentation is presented in the Mapped Documentation section of the object page as Name and Business Definition pairs, one for each object with a semantic mapping link.
  • Alternatively, the Name and Business Definition may be Inferred from other objects which are considered semantically equivalent based upon a trace through the pass-through data flow and other semantic flow. If there are objects with local, Defined By, or semantic mapping documentation, then those will all be presented in the Inferred Documentation section of the object page as Name and Business Definition pairs, one for each example.
  • Alternatively, the Name and Business Definition may be reused from the imported documentation, e.g., logical name or description fields imported from sources such as data modeling or BI tools. If there are objects with local, Defined By, semantic mapping, or imported documentation themselves, then those will all be presented in the Inferred Documentation section of the object page as Name and Business Definition pairs, one for each example.
  • Finally, the Name and Business Definition may be Searched for within the various glossaries in a configuration. These are presented in the Search Documentation.

Mapped Documentation

Semantic mapping from more conceptual objects (e.g., glossary terms, data model elements, custom model objects) to other, more physical objects using links in a semantic mapping. These are “Defines / Is Defined By” relationships. Semantic mapping is very flexible and may be used between nearly all objects in the repository.

Semantic Mapping is the primary tool for manual and crowd-sourced documentation of objects, and the primary method to define the semantic lineage, definition lookup, and Name and Business Definition inference for an object.

Semantic Relationships

Semantic Relationships are custom relationships with the semantic flow property set to true.

Documentation may be determined via the semantic flow. If a relationship is given semantic meaning, then any data element related to a defined object in this fashion is effectively documented as well. Thus, documenting one object (e.g., a Term) that has a large number of these semantic relationships defined (e.g., Is Defined By) allows the product to infer the documentation for all other data elements related in this fashion.

Semantic Connection Stitching

Semantic connection from an imported data model to an imported database model.

Documentation may be determined via the semantic stitching of semantic connections. Any such semantic stitching relationship is said to represent equivalent meaning, and thus any data element stitched in this fashion is effectively documented as well. This is an uncommon situation in general, but it is common enough with data modeling tool imports, which may then be semantically stitched to an underlying database model imported directly from the database. Thus, documenting one object (imported from the data model) may automatically document the stitched database columns.

Inferred Documentation

Inferred Documentation from other data elements in the pass-through data flow.

Inferred Documentation may be determined via the data flow, based upon the understanding that when two data elements are connected by ETL/DI or other data flow lineage with no transformation (i.e., B is derived from A without any changes), they likely mean the same thing. Thus, because the details of lineage are accurately represented, including the feature-level transformations, documenting one data element in the pass-through lineage allows the product to infer the documentation for all other data elements in the lineage.
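The pass-through inference described above can be pictured as a graph traversal. The following is only an illustrative sketch and not the product's implementation: the element names, the `LINEAGE` and `DOCUMENTED` structures, and the choice to propagate in both directions along pass-through links are all assumptions made for the example.

```python
from collections import defaultdict, deque

# Hypothetical lineage links: (source, target, is_pass_through).
# is_pass_through is True when the target is derived from the source unchanged.
LINEAGE = [
    ("crm.customer.email", "staging.customer.email", True),
    ("staging.customer.email", "dw.dim_customer.email", True),
    ("staging.customer.email", "dw.dim_customer.email_domain", False),  # transformed
]

# Hypothetical locally documented elements: element -> (name, business definition).
DOCUMENTED = {
    "crm.customer.email": ("Customer Email", "Primary email address of the customer."),
}


def infer_documentation(lineage, documented):
    """Propagate documentation along pass-through links only (assumed bidirectional)."""
    neighbors = defaultdict(set)
    for source, target, is_pass_through in lineage:
        if is_pass_through:
            neighbors[source].add(target)
            neighbors[target].add(source)

    inferred = defaultdict(list)
    for origin, doc in documented.items():
        seen = {origin}
        queue = deque([origin])
        while queue:  # breadth-first walk over pass-through links
            current = queue.popleft()
            for nxt in sorted(neighbors[current]):
                if nxt not in seen:
                    seen.add(nxt)
                    inferred[nxt].append((origin, doc))
                    queue.append(nxt)
    return dict(inferred)


inferred = infer_documentation(LINEAGE, DOCUMENTED)
for element, docs in sorted(inferred.items()):
    print(element, "->", docs)
```

In this sketch the transformed element (`dw.dim_customer.email_domain`) receives no inferred documentation, because the link that reaches it is not pass-through.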

Data Classification

Data classification by type, which may be made automatically when harvesting metadata. Data classes are drawn from a pool, defined repository-wide, e.g., Social Security Number or Gender. Each entry in the pool includes a unique name for the type and one of the following: a list of valid values (Gender) or a syntactical rule set (Social Security Number), used to determine data classes for objects after sampling the data during harvesting; or metadata search criteria (Metadata Query Language (MQL)), used to determine data classes for objects based upon an object’s metadata, e.g., for Maiden Name.

Data Classification is the primary tool for automatically typing and hiding large numbers of objects. In particular:

  • Auto-tagging – As part of the harvesting and data profiling process, Talend Data Catalog will suggest data class assignments that may be confirmed and made permanent.
  • Data class Discovery – You may manually assign data classes to an object from the element’s object page or when browsing in grid mode, identifying those elements to be a certain type.
  • Data class Analysis – With these data class assignments, one may categorize by data class and thus identify, sort, and operate on different objects all of the same type.
  • Name and Business Definition Inference – You may associate a term with a data class, and thus the Name and Business Definition may be inferred for all data elements which are assigned that data class, just as with term classification and semantic relationships and mappings.
  • You may analyze and report on the data class assignments via worksheet columns of that name and update the locally edited values in bulk.
  • You may associate data classes with glossary terminology, thus leveraging the semantic relationships defined among terms and models.
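As a rough illustration of how data classes based on a valid value list or a syntactical rule could be matched against sampled data, consider the sketch below. It is not how Talend Data Catalog implements classification: the `DATA_CLASSES` structure, the regular expression, the 80% `MATCH_THRESHOLD`, and the function names are all assumptions made for the example, and MQL-based data classes are not covered.

```python
import re

# Hypothetical data class pool: each class has a unique name and either a list
# of valid values or a syntactic rule (here, a regular expression).
DATA_CLASSES = {
    "Gender": {"values": {"M", "F", "MALE", "FEMALE"}},
    "Social Security Number": {"pattern": re.compile(r"\d{3}-\d{2}-\d{4}")},
}

MATCH_THRESHOLD = 0.8  # assumed share of sampled values that must match


def matches(data_class, value):
    """Return True if a single sampled value fits the data class."""
    text = str(value).strip()
    if "values" in data_class:
        return text.upper() in data_class["values"]
    return data_class["pattern"].fullmatch(text) is not None


def suggest_data_classes(sample, classes=DATA_CLASSES, threshold=MATCH_THRESHOLD):
    """Suggest data classes whose rule matches enough of the non-empty sample."""
    values = [v for v in sample if v is not None and str(v).strip()]
    if not values:
        return []
    suggestions = []
    for name, data_class in classes.items():
        ratio = sum(matches(data_class, v) for v in values) / len(values)
        if ratio >= threshold:
            suggestions.append((name, ratio))
    return suggestions


print(suggest_data_classes(["123-45-6789", "987-65-4321", None, "000-00-0000"]))
print(suggest_data_classes(["M", "F", "f", "male", "unknown"]))
```

A suggestion produced this way would still need to be confirmed before it becomes a permanent assignment, in line with the Auto-tagging behavior described above.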

Documentation Properties

Other properties specific to the type of an object that are considered documentation properties (like “Logical Name”, “Business Name”, “Comment” or “Description”) and were harvested rather than edited. The properties are unique to the particular category of metadata, e.g., fields vs. columns vs. tables vs. entities, and often unique to the specific metadata source. They are harvested and are not editable.

Object-specific properties are an uncommon but sometimes important tool for tagging/documenting and analysis. In particular:

  • This documentation will simply be picked up for that particular metadata category and that metadata source (determined by the bridge).
  • This documentation will only be proposed if populated in the original metadata source tool and only if the import parameters specified that they be harvested.

Other Properties

Other properties specific to the type of an object. The properties are unique to the particular category of metadata, e.g., fields vs. columns vs. tables vs. entities.

Object-specific properties are an uncommon but sometimes important tool for tagging/documenting and analysis.

Custom Attributes

Any number of custom attributes may be defined and associated with a category or type of object. Once defined, you may use these for tagging/documenting and analysis.

Custom attributes are a very common tool for tagging / documenting objects and for analysis and tracking. In particular:

  • Editing – One may edit these properties on the object’s object page, when browsing in grid mode, or by exporting to .csv format, editing, and re-importing that file.
  • Analysis – You may analyze and report on the Custom Attribute assignments via worksheet columns of that name and update the locally edited values in bulk.
  • Completeness – You may analyze as above either by value (search or matching) or by existence or lack of the Custom Attribute assignment.
  • Well-Managed Types – There is a very robust set of Custom Attribute Types which may be used to provide controlled documentation (e.g., enumerations) and very colorful documentation (HTML text editor).
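If you work with the .csv export, a completeness check of the kind described above can also be scripted outside the product. The sketch below is illustrative only: the column layout of `EXPORTED_CSV` and the "Data Steward" custom attribute are hypothetical and do not reflect the actual export format.

```python
import csv
import io

# Hypothetical export: the column names and the "Data Steward" custom attribute
# are illustrative, not the product's actual export layout.
EXPORTED_CSV = """Name,Type,Data Steward
CUSTOMER,Table,alice
ORDERS,Table,
ORDER_ITEMS,Table,bob
"""


def missing_attribute(csv_text, attribute):
    """Return the names of objects whose custom attribute is empty or absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Name"] for row in reader if not (row.get(attribute) or "").strip()]


def completeness(csv_text, attribute):
    """Return the fraction of objects that have a value for the attribute."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if (row.get(attribute) or "").strip())
    return filled / len(rows)


print(missing_attribute(EXPORTED_CSV, "Data Steward"))
print(completeness(EXPORTED_CSV, "Data Steward"))
```

The list returned by `missing_attribute` could serve as a to-do list of objects to fill in before re-importing the edited file.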

Comments

Discussion and other general comments provided by all users as free-form text entries that are associated with a given object and also identified by author and time stamp. They may be used for general annotations, discussions and notifications. Please see Curation for more precise and specific types of Comments.

Comments are a very common tool for annotating / documenting objects. However, as they are entirely free-form and more involved than a simple Label, they are generally less used in analysis and tracking. In particular:

  • Editing – One may edit these Comments on the object’s object page or when browsing in grid mode.
  • Analysis – You may analyze and report on the text (search) inside Comments using worksheet filters and create additional ones in bulk.
  • Anyone May Comment – There are no permission restrictions on Comments, and thus these discussions are open.
  • Completeness – You may analyze by existence or lack of any Comments assignment.
  • Discussion – As any number of Comments may be assigned and author and time-stamp information are captured, they are ideal for on-line discussions.

Curation

Just as with Comments, Curation entries (Certifications, Endorsements and Warnings) are free-form text entries that are associated with a given object and also identified by author and time stamp. However, they have specific meaning beyond the general-purpose Comment. They may be used for specific annotations and notifications, and Curation impacts search results and selection lists.

Curation is a very common tool for annotating / documenting objects. It is entirely free-form, but is divided into three types (Certifications, Endorsements and Warnings). In particular:

  • Editing – One may edit these Curations on the object’s object page or when browsing in grid mode.
  • Anyone May Comment – There are no permission restrictions on comments, and thus these discussions are open.
  • Certification Rules – One must have special permission to Certify an object and there can only be one Certification. It is free-form text, though.
  • Analysis – You may analyze and report on the text (search) inside Curation (of any particular type) using worksheet filters and create additional ones in bulk.
  • Completeness – You may analyze by existence or lack of any Curation (of any particular type) assignment.
  • Discussion – As any number of Endorsements and Warnings may be assigned and author and time-stamp information are captured, they are sometimes used to augment on-line discussions.
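The rules above (a single, permission-controlled Certification alongside any number of Endorsements and Warnings, each carrying an author and time stamp) can be summarized in a small data model. This is a conceptual sketch only: the class names, the `can_certify` flag, and the choice to let a new Certification replace the previous one are assumptions and do not describe the product's internal model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class CurationEntry:
    """A free-form curation entry identified by author and time stamp."""
    author: str
    text: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class CuratedObject:
    """Hypothetical stand-in for a catalog object that can be curated."""
    name: str
    certification: Optional[CurationEntry] = None
    endorsements: List[CurationEntry] = field(default_factory=list)
    warnings: List[CurationEntry] = field(default_factory=list)

    def certify(self, author, text, can_certify=False):
        # Certifying requires special permission; at most one Certification exists.
        # Assumption: a new Certification replaces the previous one.
        if not can_certify:
            raise PermissionError(f"{author} is not allowed to certify {self.name}")
        self.certification = CurationEntry(author, text)

    def endorse(self, author, text):
        # Any number of Endorsements may be assigned.
        self.endorsements.append(CurationEntry(author, text))

    def warn(self, author, text):
        # Any number of Warnings may be assigned.
        self.warnings.append(CurationEntry(author, text))


table = CuratedObject("dw.dim_customer")
table.endorse("alice", "Used daily by the finance team.")
table.warn("bob", "Email column is sparsely populated before 2019.")
table.certify("carol", "Approved as the customer system of record.", can_certify=True)
print(table.certification.author, len(table.endorsements), len(table.warnings))
```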

Labels

Meta tags which may be associated with objects. They are very simple, quick, free-form meta tags that anyone may place on an object. There is a single namespace pool for Labels that is Talend Data Catalog-wide, and thus shared across the entire repository environment.

Labels are a very common tool for annotating / documenting objects. However, as they are entirely free-form, they are generally less used in analysis and tracking. In particular:

  • Assignment – One may assign these Labels on the object’s object page or when browsing in grid mode.
  • Analysis – You may analyze and report on the Labels as a whole (not text search) inside worksheet filters and create additional ones in bulk.
  • Anyone May Label – There are no permission restrictions on Labels, and thus labeling is open to all users.
  • Completeness – You may analyze by existence or lack of any Label assignment.
  • Tagging – As any number of Labels may be assigned and there is little management overhead, they are a quick and light-weight tagging mechanism, though they do not have the different managed values which Custom Attributes may have.

Collections

Groups of objects of any type which may be collected together, like a shopping cart. Anyone may create a Collection and keep it private, or you may share these with others.

Collections are mostly a simple way to corral a set of objects of any kind. Commonly these will include to-do lists, assignments (shared with others), collections for later analysis or completeness studies, etc. In particular:

  • Assignment – One may assign any number of objects to one or more Collections on the element’s object page or when browsing in grid mode.
  • Analysis – You may analyze and report on the Collections as a whole (not text search) inside worksheet filters and create additional ones in bulk.
  • Anyone May Collect – There are no permission restrictions on Collections, and thus collecting is open to all users.
  • Completeness – You may define a worksheet around a Collection to determine what has been completed in the set of objects.
  • Tagging – Just as with Labels, one may use a Collection as a tagging mechanism.
