
Defining the columns to be analyzed

The first step in analyzing the content of one or more columns is to define the columns to be analyzed. The analysis results provide statistics about the values within each column.

Before you begin

You have defined at least one database connection in the Profiling perspective of Talend Studio.

About this task

When you analyze Date columns and run the analysis with the Java engine, the date information is stored in Talend Studio and in the data mart as regular date/time values, in the format YYYY-MM-DD HH:mm:ss.SSS for date/timestamp and in the format HH:mm:ss.SSS for time. The date and time formats are slightly different when you run the analysis with the SQL engine.
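
To make the two Java-engine formats concrete, the following minimal Python sketch renders a timestamp and a time with millisecond precision. It only illustrates the resulting text layout; the function names are hypothetical and this is not how Talend Studio stores or formats the values internally.

```python
from datetime import datetime, time


def format_timestamp(value: datetime) -> str:
    """Render a date/timestamp in the YYYY-MM-DD HH:mm:ss.SSS layout."""
    # timespec="milliseconds" truncates microseconds to three digits.
    return value.isoformat(sep=" ", timespec="milliseconds")


def format_time(value: time) -> str:
    """Render a time in the HH:mm:ss.SSS layout."""
    return value.isoformat(timespec="milliseconds")


# Hypothetical sample values, chosen only for illustration.
stamp = format_timestamp(datetime(2024, 3, 5, 14, 7, 9, 123000))
clock = format_time(time(14, 7, 9, 123000))

print(stamp)  # 2024-03-05 14:07:09.123
print(clock)  # 14:07:09.123
```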

Defining the basic column analysis

Procedure

  1. In the DQ repository tree view, expand Data Profiling and right-click Analyses > New analysis.
    The Create new analysis wizard opens.
  2. Select Column > Basic column analysis and click Create.
  3. In the Name field, enter a name for the current column analysis.
    Important:

    Do not use the following special characters in the item names: ~ ! ` # ^ * & \\ / ? : ; \ , . ( ) ¥ ' " « » < >

    These characters are all replaced with "_" in the file system, so you may end up creating duplicate items.

  4. Optional: Set column analysis metadata (Purpose, Description and Author) in the corresponding fields and click Next.
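
The naming restriction in step 3 can be illustrated with a short Python sketch that shows how distinct item names collapse to the same file-system name once each special character is replaced with "_". The character set is transcribed from the note in step 3, and the function name is hypothetical; this mimics the described behavior and is not Talend Studio's actual implementation.

```python
# Special characters listed in the Important note (transcribed by hand).
FORBIDDEN = set("~!`#^*&\\/?:;,.()¥'\"«»<>")


def file_system_name(item_name: str) -> str:
    """Replace every forbidden character with an underscore."""
    return "".join("_" if ch in FORBIDDEN else ch for ch in item_name)


# Three different item names (hypothetical examples)...
names = ["sales.2024", "sales:2024", "sales_2024"]
stored = [file_system_name(name) for name in names]

# ...all map to the same name on disk, which is how duplicates arise.
print(stored)  # ['sales_2024', 'sales_2024', 'sales_2024']
```

Using only letters, digits, and underscores in analysis names avoids this collision.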

Selecting the database columns and setting sample data

Procedure

  1. From the Connection menu, select the connection and click Next.
    Note: When profiling a DB2 database, double quotes in the column names of a table cannot be retrieved when the columns are retrieved. It is therefore recommended not to use double quotes in column names in a DB2 database table.
  2. Select the Run with sample data check box to run the analysis only on the sample dataset defined in the Limit field.
  3. From the Columns menu, click Select columns. A data preview is displayed.
    You can perform different actions from this menu:
    • Select Columns: open the Column Selection dialog box where you can select the columns to analyze or change the selection of the columns listed in the table. From the open dialog box, you can filter the table or column lists by using the Table filter or Column filter fields respectively.
    • Refresh Data: display the data in the selected columns according to the criteria you set.
    • New Connection: open a wizard and create a connection to the data source from within the editor.

      The Connection field on top of this section lists all the connections created in Talend Studio.

    • n first rows or n random rows: list in the table the first N data records from the selected columns, or N random records from the selected columns.
  4. Click Next to set indicators on columns.
  5. From the Indicators menu, click Select Indicators and select the indicators to use for profiling columns.
    If one of the columns you want to analyze is a primary or foreign key, its data mining type is automatically set to Nominal when you list it in the Analyzed Columns view.
    For more information, see Data mining types.
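
The difference between the n first rows and n random rows preview options in step 3 can be sketched in a few lines of Python. This is an in-memory illustration under assumed names (first_rows, random_rows, and the sample data are hypothetical); Talend Studio may select the records differently, for example in the database itself.

```python
import random


def first_rows(rows, n):
    """Return the first n records, in their original order."""
    return rows[:n]


def random_rows(rows, n, seed=None):
    """Return n records picked at random, without repetition."""
    rng = random.Random(seed)  # a seed makes the pick reproducible
    return rng.sample(rows, min(n, len(rows)))


# Hypothetical column values standing in for database records.
records = [f"customer_{i}" for i in range(10)]

head = first_rows(records, 3)
picked = random_rows(records, 3, seed=42)

print(head)    # always the same three leading records
print(picked)  # three distinct records from anywhere in the list
```

The first option is deterministic and cheap, while the random option usually gives a more representative preview when the data is sorted or clustered.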
