Defining data lineage with Cloudera Navigator
If you are using Cloudera V5.5+ to run your MapReduce or Apache Spark Batch Jobs, you can use Cloudera Navigator to trace the lineage of a given data flow and discover how that data flow was generated by a Job.
This lineage includes the components used in this Job and the schema changes between the components.
This type of Job is available only if you have subscribed to a Talend product with Big Data or to Talend Data Fabric.
Procedure
With this option activated, you need to set the following parameters:
Results
When you run this Job, the lineage is automatically generated in Cloudera Navigator.
Once the Job has finished executing, search Cloudera Navigator for the data written by this Job to view the lineage of that data.
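The search described above is normally done in the Cloudera Navigator web UI. If you prefer to script it, the following is a minimal sketch of how a search URL for the Navigator REST API might be built. The host name, the API version (`v9`), the `entities` path, the `originalName` query field and the data set name `customers_out` are all assumptions used for illustration, not values taken from this procedure; check them against the API documentation of your own Navigator instance.

```python
from urllib.parse import urlencode


def navigator_search_url(base_url, api_version, query, limit=10):
    """Build a search URL for Navigator entities.

    Assumption: the Navigator instance exposes an `entities` endpoint
    under /api/<version>/ that accepts `query` and `limit` parameters.
    """
    params = urlencode({"query": query, "limit": limit})
    return f"{base_url.rstrip('/')}/api/{api_version}/entities?{params}"


# Hypothetical values: replace the host, API version and data set name
# with the ones that apply to your cluster and your Job output.
url = navigator_search_url(
    "http://navigator.example.com:7187",
    "v9",
    "originalName:customers_out",
)
print(url)
```

The function only builds the URL and sends no request, so you can inspect it before calling it with an HTTP client and your Navigator credentials.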