
Create the fourth Job

Follow these steps to create the fourth Job, which analyzes the uploaded log file to count how often each response code occurs in successful calls to the website.

Procedure

  1. Create a new Job and name it D_Pig_Count_Codes to identify its role and execution order among the example Jobs.
  2. Drop the following components from the Palette onto the design workspace:
    • a tPigLoad, to load the data to be analyzed,

    • a tPigFilterRow, to remove records with the '404' error from the input flow,

    • a tPigFilterColumns, to select the columns you want to include in the result data,

    • a tPigAggregate, to count the number of visits to the website,

    • a tPigSort, to sort the result data, and

    • a tPigStoreResult, to save the result to HDFS.

  3. Connect these components using Row > Pig Combine connections to form a Pig chain, and label them to better identify their functionality.
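
To make the data flow of this Pig chain easier to picture, the following is a minimal Python sketch of the same sequence of operations: load, filter out 404 records, keep the code column, count, and sort. It is not what Talend generates (the Job produces Pig code that runs on Hadoop), and the sample records, the three-field layout, the function name count_codes, and the sort order (count descending, then code) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample records: (host, request, HTTP response code).
# In the real Job, tPigLoad reads the uploaded log file from HDFS instead.
records = [
    ("10.0.0.1", "GET /index.html", "200"),
    ("10.0.0.2", "GET /missing", "404"),
    ("10.0.0.3", "GET /index.html", "200"),
    ("10.0.0.4", "GET /old", "301"),
]


def count_codes(rows):
    """Mimic the chain tPigFilterRow > tPigFilterColumns > tPigAggregate > tPigSort."""
    # tPigFilterRow: remove records with the 404 error.
    kept = [row for row in rows if row[2] != "404"]
    # tPigFilterColumns: keep only the response code column.
    codes = [row[2] for row in kept]
    # tPigAggregate: count the occurrences of each code.
    counts = Counter(codes)
    # tPigSort: order by count (descending), then by code; this order is assumed.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# tPigStoreResult would write the result to HDFS; here it is simply printed.
for code, occurrences in count_codes(records):
    print(f"{code}\t{occurrences}")
```

Each step in the function corresponds to one component in the chain, which is also a convenient basis for the labels you give the components in the design workspace.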
