
Read the log file to be analyzed through the Pig chain

Procedure

  1. Double-click the tPigLoad component to open its Basic settings view.
  2. From the Property Type list, select Repository, and then click the [...] button to open the Repository Content dialog box, where you can choose a centralized HDFS connection.
  3. Select the HDFS connection defined for connecting to the HDFS system and click OK.

    The connection details are filled in automatically in the corresponding fields.

  4. Select the generic schema of access_log from the Repository tree view and then drag and drop it onto this component to apply the schema.
  5. From the Load function list, select PigStorage, and in the Input file URI field, enter the file path defined in the previous Job. In this example, the path is /user/hdp/weblog/access_log/out.log.
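
For orientation, the settings above correspond roughly to a Pig Latin LOAD statement that uses PigStorage. The following minimal Python sketch only assembles such a statement as text so you can see how the Input file URI, the load function, and the schema fit together. It is not the code the component actually generates. The helper name build_pig_load, the alias, the ';' field separator, and the two column names are assumptions made for illustration; the real columns come from the access_log generic schema in the Repository.

```python
def build_pig_load(alias, input_uri, field_separator, schema):
    """Return a Pig Latin LOAD statement as a string.

    schema is a list of (column_name, pig_type) pairs.
    """
    columns = ", ".join(f"{name}:{pig_type}" for name, pig_type in schema)
    return (
        f"{alias} = LOAD '{input_uri}' "
        f"USING PigStorage('{field_separator}') AS ({columns});"
    )


# Hypothetical columns: replace them with those of the access_log schema.
ASSUMED_SCHEMA = [("host", "chararray"), ("code", "int")]

statement = build_pig_load(
    alias="access_log",                                # assumed alias
    input_uri="/user/hdp/weblog/access_log/out.log",   # path from step 5
    field_separator=";",                               # assumed separator
    schema=ASSUMED_SCHEMA,
)
print(statement)
```

Under these assumptions, the script prints a single LOAD statement that reads out.log with PigStorage and names the columns according to the schema you pass in.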
