Under the mrKeyStruct table, click the button once to add
one row.
Rename that row to word_mr. This is the key part of the key/value pair used
by the Map/Reduce program being created. In the map method, you need to write mrKey.word_mr to refer to the keys to be
output to a reducer.
Under the mrValueStruct table, click the button once to add
one row.
Rename that row to count_mr. This is the value part of the above-mentioned
key/value pair. In the map method, you need to write mrValue.count_mr to refer to the values to be output to a
reducer.
Click the button next to Edit schema to open the schema
editor.
On the side of the schema of tJavaMR, click the
button to add two columns and name them word_output and count_output,
respectively. This defines the structure of the data to be output.
In the Type column, select Integer for count_output.
In the Map code editing field, edit the body of the map method. In this
example, the code is as follows:
String line = value.record;
// Split the record into whitespace-separated tokens.
java.util.StringTokenizer tokenizer = new java.util.StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
    // Emit one (WORD, 1) pair per token.
    mrKey.word_mr = tokenizer.nextToken().toUpperCase();
    mrValue.count_mr = 1;
    output.collect(mrKey, mrValue);
}
This method splits the input data into
words, converts each word to upper case, and outputs key/value pairs such as
(HELLO, 1) and (WORLD, 1) to the reducer.
Note that at runtime, these pairs are
automatically shuffled and sorted into the form (key, list of values) before being processed by the reduce
method.
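To make the shuffle-and-sort step concrete, here is a minimal standalone sketch in plain Java. It is not code for the Map code field and does not use the tJavaMR or Hadoop APIs: the class MapShuffleSketch and its methods map and shuffle are hypothetical names introduced only for illustration. It assumes the same whitespace tokenization and upper-casing as the map body above, then groups the emitted pairs by key in sorted order.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Talend or Hadoop.
class MapShuffleSketch {

    // Mimics the map body: one (WORD, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokenizer.nextToken().toUpperCase(), 1));
        }
        return pairs;
    }

    // Mimics shuffle and sort: collects the values of each key into a list,
    // with keys kept in sorted order by the TreeMap.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Prints {HELLO=[1, 1], WORLD=[1]}
        System.out.println(shuffle(map("hello world hello")));
    }
}
```

For the input "hello world hello", the map step emits (HELLO, 1), (WORLD, 1) and (HELLO, 1), and the grouping step turns them into (HELLO, [1, 1]) and (WORLD, [1]), which is the shape the reduce method receives.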
In the Reduce code editing field, edit the body of the reduce method. In
this example, the code is as follows:
This reduce method sums the
values of the list in each (key, list of
values) pair and maps the results to the columns of the output
schema.
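The summing logic described here can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the content of the Reduce code field: the class ReduceSumSketch, the OutputRow holder and the reduce signature are hypothetical, and only the column names word_output and count_output come from the schema defined earlier.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical illustration class; not part of Talend or Hadoop.
class ReduceSumSketch {

    // Hypothetical stand-in for one row of the output schema.
    static class OutputRow {
        String word_output;
        Integer count_output;
    }

    // Sums the list of values for one key and maps the result
    // to the two output columns.
    static OutputRow reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        OutputRow row = new OutputRow();
        row.word_output = key;
        row.count_output = sum;
        return row;
    }

    public static void main(String[] args) {
        OutputRow row = reduce("HELLO", Arrays.asList(1, 1).iterator());
        // Prints HELLO 2
        System.out.println(row.word_output + " " + row.count_output);
    }
}
```

Given the pair (HELLO, [1, 1]), the sketch produces a row with word_output set to HELLO and count_output set to 2.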