Read the log file to be analyzed through the Pig chain
Procedure
Double-click the tPigLoad component to open its Basic settings view.
To use a centralized HDFS connection, select Repository from the Property Type list, and then click the [...] button to open the Repository Content dialog box.
Select the HDFS connection defined for connecting to the HDFS system and click OK.
The connection details are filled in automatically in the corresponding fields.
Select the generic schema of access_log from the Repository tree view and then drag
and drop it onto this component to apply the schema.
From the Load function list, select PigStorage, and in the Input file URI field, enter the file path defined in the previous Job, which is /user/hdp/weblog/access_log/out.log in this example.
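For orientation, the settings above correspond roughly to a Pig Latin LOAD statement that uses PigStorage. The following Python sketch only renders such a statement as text so that the relationship between the fields is visible. It is not the code the Studio generates. The helper name pig_load_statement, the three column names, and the ';' field separator are assumptions made for illustration; the actual columns come from the access_log generic schema in the Repository.

```python
def pig_load_statement(alias, input_uri, fields, delimiter=";"):
    """Render a Pig Latin LOAD statement that reads a file with PigStorage.

    alias      -- name of the Pig relation to create
    input_uri  -- value of the Input file URI field
    fields     -- list of (column name, Pig type) pairs taken from the schema
    delimiter  -- field separator passed to PigStorage (assumed to be ';')
    """
    schema = ", ".join(f"{name}:{pig_type}" for name, pig_type in fields)
    return (
        f"{alias} = LOAD '{input_uri}' "
        f"USING PigStorage('{delimiter}') AS ({schema});"
    )


# Hypothetical subset of columns; the real access_log schema is defined
# in the Repository and is not reproduced here.
fields = [("host", "chararray"), ("request", "chararray"), ("code", "int")]

statement = pig_load_statement(
    "access_log",
    "/user/hdp/weblog/access_log/out.log",  # path used in this example
    fields,
)
print(statement)
```

The Input file URI value becomes the LOAD path, the Load function selection becomes the USING clause, and the schema applied to the component becomes the AS clause.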