
Control Flow

Generally, lineage is represented as a “flow”. The flow is either of data, as part of a data movement and possibly transformation process, or of “meaning”, as from a defining object such as a glossary term to a defined object such as a column. These directions are also commonly associated with how lineage is analyzed.

Control flow is lineage traced from an object that is used in a selection WHERE clause or a similar structure. Such an object affects which data is moved, but it is not itself moved to the target. There are two types of control flow:

  • Column control flow, where the control flow directly impacts the values of a column (e.g., a lookup).
  • Row control flow, where the control flow does not directly impact column values but determines which rows are moved (e.g., a filter).
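To make the distinction concrete, here is a small sketch using Python's built-in sqlite3 module. The tables (Invoice, StatusLookup, InvoiceTarget) and their columns are hypothetical and are not part of the Talend Data Catalog demo data. In the single INSERT ... SELECT below, ID and StatusName are copied to the target (data flow). The StatusCode join key never reaches the target but decides which StatusName each row gets (column control flow, as with a lookup). PaymentType in the WHERE clause only decides which rows are moved (row control flow).

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Invoice (ID INTEGER, StatusCode TEXT, PaymentType TEXT);
    CREATE TABLE StatusLookup (Code TEXT, StatusName TEXT);
    CREATE TABLE InvoiceTarget (ID INTEGER, StatusName TEXT);

    INSERT INTO Invoice VALUES (1, 'O', 'PO'), (2, 'C', 'CASH'), (3, 'C', 'PO');
    INSERT INTO StatusLookup VALUES ('O', 'Open'), ('C', 'Closed');
""")

conn.execute("""
    INSERT INTO InvoiceTarget (ID, StatusName)
    SELECT i.ID,                  -- data flow: value copied as-is
           s.StatusName           -- data flow: value copied from the lookup table
    FROM Invoice AS i
    JOIN StatusLookup AS s
      ON s.Code = i.StatusCode    -- column control flow: the key picks the value
    WHERE i.PaymentType = 'PO'    -- row control flow: decides which rows move
""")

rows = conn.execute("SELECT ID, StatusName FROM InvoiceTarget ORDER BY ID").fetchall()
print(rows)  # [(1, 'Open'), (3, 'Closed')]
```

Neither StatusCode nor PaymentType appears in InvoiceTarget, yet changing either one changes what lands there. This is the kind of dependency that control lineage is meant to capture.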

Consider a common scenario: you trace data impact, and the trace reaches a dimension that is widely used in joins and WHERE clauses, such as the time dimension in a warehouse or mart. Almost every report uses that dimension in some way, so the impact lineage is essentially everything. The diagram then quickly grows beyond what your browser can present, let alone navigate and analyze.

For this and similar reasons, the Data Flow Settings menu includes a Control Flow option that limits how much control lineage is traced:


Control Lineage Option  Description                                           Delay in Presentation
None                    No control flow data impacts are traced               None
Limited                 Shows only immediate (adjacent) control flow objects  May be slow
Complete                All control flow impacts are traced                   Likely slow

Steps

  1. Begin a lineage trace.
  2. Under Control Flow, you can:
    • Click None to hide any objects that are connected only via control flow, and to hide all control flow links.
    • Click Limited to show any objects that are directly connected to the origin object via control flow, along with those control flow links.
    • Click Complete to show all objects connected via control flow to the origin object or to any subsequent objects, along with those control flow links.
  3. If Limited is selected, go to the lineage diagram and click a target element; the control flow that the target depends on then appears.
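The three Control Flow settings can be approximated as a graph traversal. The sketch below is a simplified model under stated assumptions, not Talend Data Catalog's actual algorithm. Lineage is a hypothetical list of (source, target, kind) edges, and only downstream (impact) tracing is modeled. None follows data flow edges only. Limited additionally shows objects linked to the origin by a control flow edge without tracing past them. Complete follows control flow edges from every object reached. The click-to-reveal behavior of Limited in the diagram is not modeled.

```python
from collections import deque

# Hypothetical edge format: (source, target, kind), kind is "data" or "control".
Edge = tuple[str, str, str]


def impact_trace(edges: list[Edge], origin: str, control: str) -> set[str]:
    """Objects reachable downstream from origin under a simplified Control Flow mode."""
    if control not in ("none", "limited", "complete"):
        raise ValueError(f"unknown control mode: {control}")

    outgoing: dict[str, list[tuple[str, str]]] = {}
    for src, dst, kind in edges:
        outgoing.setdefault(src, []).append((dst, kind))

    # Breadth-first search; control edges are followed only in "complete" mode.
    reached = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for dst, kind in outgoing.get(node, []):
            if (kind == "data" or control == "complete") and dst not in reached:
                reached.add(dst)
                queue.append(dst)

    if control == "limited":
        # Show only objects adjacent to the origin via control flow; do not trace past them.
        reached |= {dst for dst, kind in outgoing.get(origin, []) if kind == "control"}
    return reached


# Hypothetical time dimension that controls (via joins/WHERE clauses) several reports.
edges = [("DW.DimTime", f"Report{i}", "control") for i in range(3)]
edges += [(f"Report{i}", f"Dashboard{i}", "data") for i in range(3)]

print(len(impact_trace(edges, "DW.DimTime", "none")))      # 1
print(len(impact_trace(edges, "DW.DimTime", "limited")))   # 4
print(len(impact_trace(edges, "DW.DimTime", "complete")))  # 7
```

The usage lines mimic the time dimension scenario described earlier. With None the trace stays at the dimension, Limited adds only the adjacent reports, and Complete also pulls in everything downstream of those reports. On a real warehouse this growth is why Complete is likely slow.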

Example

Search for the OnPrem DW.dbo.Customer table and open it.

Go to the Lineage tab and ensure that the Type is DATA FLOW and the View is DIAGRAM.


A red “pin” in the diagram marks the point of origin from which lineage is presented; in this case, the Customer table.

Next, ensure that Control Flow is set to None.

Click Columns HIDE and select the top checkbox to show all the columns in the Customer table.

Then expand the Staging DW model to the column level using the minus sign.


At this point, the diagram contains no control lineage artifacts, as specified.

Now set Control Flow to Limited.

Then expand the Staging DW model again to the column level.


Many new objects that are not directly connected by data flow links now appear. Selecting Data Flow Settings > Control Flow > Limited shows any control flow related objects that are directly connected to the origin object via control flow.

Control lineage is drawn with dashed lines. You must click an object to see its control lineage.

Click the Dimensional DW.dbo.Customer.ID column.


The control lineage source columns are shaded gray.

Now set Control Flow to Complete, expand the Accounting model to the column level, and click the Dimensional DW.dbo.Customer.ID column.


Even more objects now appear in the lineage diagram, and selecting a column that is impacted by control lineage again shows gray shading.

Processes (Bottom) Panel with Control Flow

In addition, you can use the Processes (Bottom) Panel to see the specific WHERE clauses, lookups and filters that make up the control flow.

Return to the diagram, but this time show the Processes Panel at the bottom of the diagram.

Here we have selected two processes: one with a filter and another with a WHERE clause. The Processes Panel already provides these details without showing the actual control flow lineage.

Click Show lineage details for the CustomerPOInvoiceItem process. This shows our object’s portion of the trace in the Informatica workflow where it is populated.

Expand the FILT_Customer filter.


This shows the actual filter condition Expression, which controls which rows of the table are used. PaymentType, InvoiceStatus and PurchaseOrderAmout are all part of the control flow in this case.
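The columns that make up row control flow are the ones referenced in the filter condition. As a rough illustration, the naive Python sketch below pulls column names out of a simple SQL-like condition. The condition string is hypothetical: it reuses the three column names above, but the actual FILT_Customer Expression is not reproduced here. The tokenizer is also an assumption for illustration. It strips quoted literals and a few keywords and would still return function names, so it does not reflect how Talend Data Catalog or Informatica parse expressions.

```python
import re

# Hypothetical filter condition; NOT the real FILT_Customer Expression.
condition = "PaymentType = 'PO' AND InvoiceStatus <> 'VOID' AND PurchaseOrderAmout > 1000"

# Keywords to skip; deliberately minimal for this sketch.
KEYWORDS = {"AND", "OR", "NOT", "IN", "IS", "NULL", "LIKE", "BETWEEN"}


def control_columns(expr: str) -> list[str]:
    """Return identifiers referenced in a simple filter expression, in first-seen order."""
    without_literals = re.sub(r"'[^']*'", "", expr)  # drop quoted string literals
    names = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", without_literals)
    found: list[str] = []
    for name in names:
        if name.upper() not in KEYWORDS and name not in found:
            found.append(name)
    return found


print(control_columns(condition))
# ['PaymentType', 'InvoiceStatus', 'PurchaseOrderAmout']
```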

Now return to the Customer.ID output lineage and expand the CustomerPaymentDate table.

As we saw before, the data flow lineage runs from Customer.ID to CustomerPaymentDate.ID. In addition, the WHERE clause shows the control condition CustomerName = 'Adjustment'; that is, this field is written only for transactions that are true general ledger adjustments made to correct errors.

Click CustomerPaymentDate.ID to see it all put together:

  • Customer.ID column
    • Is shaded blue, indicating data flow
    • Its Expression is a pass-through, so names and definitions are inferred through data flow
  • CustomerPaymentDate.ID column
    • Is shaded gray, indicating that it controls the movement of ID to ID
    • The WHERE clause defines the control condition
