Data Sampling and Profiling Options
You can customize data sampling and profiling for a request or a scheduled action. See the technical details for more information.
- Data Sampling – Enable data sampling and specify number of rows to sample.
- Data Profiling – Enable data profiling and specify number of rows to use in profiling.
- Data Select Method – Choose Top (the default, fast method) or Random (reservoir sampling, when available on the database).
- Profile only objects that are not profiled yet – Enable data profiling only on imported objects that have not yet been profiled.
- Data Classification – Enable data classification.
- Hide data using Sensitivity Label – Apply the selected sensitivity label to all newly imported objects in the scope (in order to hide them).
In addition, sensitivity labels can be inferred: when you apply a sensitivity label to an imported object (e.g. a column), all imported objects "downstream" in the data flow lineage are given at least that level of sensitivity as "Sensitivity Label Lineage Proposed". As a result, sensitivity labels are tagged automatically by inference across the enterprise architecture. As with "Sensitivity Label Data Proposed", a "Sensitivity Label Lineage Proposed" can be rejected, which stops the propagation of inferred sensitivity labels in that data flow direction. Note that the propagation of inferred sensitivity labels is not affected by any data masking discovered within the ETL/DI/Script imports involved in that data flow.
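The downstream propagation described above can be pictured as a walk over the data flow lineage graph. The following is only a conceptual sketch, not the product's implementation: the function name `propagate_labels`, the integer sensitivity levels, and the object names are all hypothetical and chosen for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, List


def propagate_labels(
    lineage: Dict[str, List[str]],
    assigned: Dict[str, int],
    rejected: FrozenSet[str] = frozenset(),
) -> Dict[str, int]:
    """Return proposed sensitivity levels for downstream objects.

    lineage  -- hypothetical map: object -> objects directly downstream of it
    assigned -- levels applied by a user (higher int = more sensitive; assumed scale)
    rejected -- objects where the lineage proposal was rejected
    """
    proposed: Dict[str, int] = {}
    queue = deque(assigned.items())
    while queue:
        node, level = queue.popleft()
        for child in lineage.get(node, []):
            if child in rejected:
                # A rejected proposal stops propagation in this direction.
                continue
            current = max(assigned.get(child, 0), proposed.get(child, 0))
            if level > current:
                # Downstream objects get at least the upstream level.
                proposed[child] = level
                queue.append((child, level))
    return proposed


# Hypothetical lineage: source column -> staging -> warehouse and mart -> report
lineage = {
    "src.col": ["stg.col"],
    "stg.col": ["dw.col", "mart.col"],
    "dw.col": ["report.col"],
}
all_proposed = propagate_labels(lineage, {"src.col": 2})
after_reject = propagate_labels(lineage, {"src.col": 2}, frozenset({"dw.col"}))
```

In this sketch, rejecting the proposal on `dw.col` leaves `stg.col` and `mart.col` with a proposed level, while `dw.col` and everything reachable only through it receive none.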
You may override the Data Select Method on a subset of the imported model. However, you must first import the model once so that the metadata structure of the data source is available. Then, navigate to the object page of a schema, table, or file in the imported model and specify the Data Select Method for that object.
In addition, you may specify a Data Select Query on the model subset.
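To illustrate the difference between the Top and Random data select methods, here is a minimal sketch over an in-memory row stream. It assumes the classic single-pass reservoir algorithm (Algorithm R); the actual sampling is performed by the database and may differ. The function names `top_sample` and `reservoir_sample` are hypothetical.

```python
import random
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def top_sample(rows: Iterable[T], n: int) -> List[T]:
    """Top: take the first n rows. Fast, but biased toward the start of the data."""
    return list(islice(rows, n))


def reservoir_sample(rows: Iterable[T], n: int, seed: Optional[int] = None) -> List[T]:
    """Random: uniform sample of n rows in one pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < n:
            # Fill the reservoir with the first n rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability n / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < n:
                reservoir[j] = row
    return reservoir


first_rows = top_sample(range(1000), 5)
random_rows = reservoir_sample(range(1000), 5, seed=42)
```

Top reads only as many rows as it returns, which is why it is the fast option. Reservoir sampling must scan every row, but each row has an equal chance of being selected, so the sample is more representative of the whole table or file.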