- a tPigLoad, to load the data to be analyzed,
- a tPigFilterRow, to remove records with the '404' error from the input flow,
- a tPigFilterColumns, to select the columns you want to include in the result data,
- a tPigAggregate, to count the number of visits to the website,
- a tPigSort, to sort the result data, and
- a tPigStoreResult, to save the result to HDFS.
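To show how data moves through the six components, here is a minimal Python sketch of the same flow on in-memory records. It only approximates what the Pig job does. The `LogRecord` schema (`host`, `request`, `code`), the choice to count visits per host, and the descending sort order are all hypothetical assumptions made for illustration. Loading from and storing to HDFS are replaced here by a hard-coded sample list and by `print`.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class LogRecord(NamedTuple):
    # Hypothetical schema for one access-log record; the real job's schema may differ.
    host: str
    request: str
    code: int


def run_pipeline(records: List[LogRecord]) -> List[Tuple[str, int]]:
    # tPigFilterRow equivalent: drop records whose status code is 404.
    kept = [r for r in records if r.code != 404]
    # tPigFilterColumns equivalent: keep only the column used downstream (assumed: host).
    hosts = [r.host for r in kept]
    # tPigAggregate equivalent: count visits per host (grouping key is an assumption).
    counts = Counter(hosts)
    # tPigSort equivalent: highest visit count first (assumed order), ties broken by host.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # tPigLoad stand-in: sample records instead of a file read from HDFS.
    sample = [
        LogRecord("10.0.0.1", "/index.html", 200),
        LogRecord("10.0.0.2", "/missing", 404),
        LogRecord("10.0.0.1", "/about.html", 200),
        LogRecord("10.0.0.3", "/index.html", 200),
        LogRecord("10.0.0.2", "/index.html", 200),
        LogRecord("10.0.0.1", "/contact.html", 200),
    ]
    # tPigStoreResult stand-in: print the result instead of writing it to HDFS.
    for host, visits in run_pipeline(sample):
        print(host, visits)
```

In the actual job, each step runs as Pig operations on the cluster rather than as Python list operations. The sketch only mirrors the order of the components and what each one does to the data.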