Defining the columns to be analyzed in a file

The first step in analyzing the content of one or more columns is to define the columns to be analyzed. The analysis results provide statistics about the values within each column.

When you analyze Date columns and run the analysis with the Java engine, the date information is stored in Talend Studio and in the data mart as regular date/time values: in the format YYYY-MM-DD HH:mm:ss.SSS for date/timestamp columns and in the format HH:mm:ss.SSS for time columns. The date and time formats differ slightly when you run the analysis with the SQL engine.
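To make the two Java engine formats concrete, the following minimal sketch renders a timestamp and a time with the standard java.time API. The class and method names are hypothetical and are not part of Talend Studio. The sketch also assumes that the documented YYYY-MM-DD notation corresponds to the java.time pattern letters yyyy-MM-dd.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only; this class is not part of Talend Studio.
class DateFormatSketch {
    // Assumed java.time equivalent of the documented YYYY-MM-DD HH:mm:ss.SSS format.
    static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
    // Assumed java.time equivalent of the documented HH:mm:ss.SSS format.
    static final DateTimeFormatter TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss.SSS");

    static String formatTimestamp(LocalDateTime value) {
        return value.format(TIMESTAMP);
    }

    static String formatTime(LocalTime value) {
        return value.format(TIME);
    }

    public static void main(String[] args) {
        // 15 March 2024, 09:05:07 and 120 milliseconds.
        LocalDateTime sample = LocalDateTime.of(2024, 3, 15, 9, 5, 7, 120_000_000);
        System.out.println(formatTimestamp(sample));          // 2024-03-15 09:05:07.120
        System.out.println(formatTime(sample.toLocalTime())); // 09:05:07.120
    }
}
```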

Before you begin, make sure that you have defined at least one connection to a delimited file in the Profiling perspective of Talend Studio.

Defining the column analysis

Procedure

  1. In the DQ repository tree view, expand Data Profiling and right-click Analyses > New analysis.
    The Create new analysis wizard opens.
  2. Select Column > Basic column analysis and click Create.
  3. In the Name field, enter a name for the current column analysis.
    Important:

    Do not use the following special characters in the item names: ~ ! ` # ^ * & \\ / ? : ; \ , . ( ) ¥ ' " « » < >

    These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

  4. Set column analysis metadata (Purpose, Description and Author) in the corresponding fields and click Next.
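The Important note in step 3 can be illustrated with a short sketch showing how two different item names end up as the same name once the special characters are replaced. This is not Talend code: the class and method names are hypothetical, and the sketch assumes that each listed character is replaced one-for-one with "_".

```java
// Hypothetical illustration only; this class is not part of Talend Studio.
class ItemNameSketch {
    // Characters listed in the Important note. The yen sign and the guillemets
    // are written as Unicode escapes (U+00A5, U+00AB, U+00BB) to avoid
    // source-encoding problems.
    static final String SPECIAL = "~!`#^*&\\/?:;,.()\u00A5'\"\u00AB\u00BB<>";

    // Assumption: every special character becomes exactly one underscore.
    static String toFileSystemName(String itemName) {
        StringBuilder result = new StringBuilder(itemName.length());
        for (char c : itemName.toCharArray()) {
            result.append(SPECIAL.indexOf(c) >= 0 ? '_' : c);
        }
        return result.toString();
    }

    // Two item names collide when they map to the same file system name.
    static boolean collide(String first, String second) {
        return toFileSystemName(first).equals(toFileSystemName(second));
    }

    public static void main(String[] args) {
        System.out.println(toFileSystemName("age:check"));     // age_check
        System.out.println(collide("age:check", "age;check")); // true
    }
}
```

Under these assumptions, "age:check" and "age;check" both become "age_check", which is how duplicate items can appear.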

Selecting the file columns and setting sample data

Procedure

  1. From the Connection drop-down list, select the file connection that contains the columns to analyze.
    In this example, you want to analyze the id, first_name and age columns from the selected connection.
  2. To create a new connection instead, click Add in the top-right corner.
  3. If needed, define a filter in the Where section to filter the data on which to run the analysis.
  4. Click Next.
