Explain the WordCount implementation using the Hadoop framework.

We will count the words across all the input files. The flow is as follows:
=> Input
Assume there are two files, each containing one sentence:
Hello World Hello World (In file 1)
Hello World Hello World (In file 2)
=> Mapper: One mapper runs for each file (each small file forms its own input split)
For the given sample input, the first map outputs:
< Hello, 1>
< World, 1>
< Hello, 1>
< World, 1>
The second map outputs:
< Hello, 1>
< World, 1>
< Hello, 1>
< World, 1>
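
The map step above can be sketched in plain Java. This is only a simulation under stated assumptions, not Hadoop code: the class name MapPhaseSketch and its map method are hypothetical, and no Hadoop classes are used. In a real job the same logic would sit in the map method of a class extending org.apache.hadoop.mapreduce.Mapper and would emit each pair through context.write.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical plain-Java stand-in for the map phase (no Hadoop dependency).
class MapPhaseSketch {

    // Emits one <word, 1> pair for every whitespace-separated token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Contents of file 1 from the example input.
        for (Map.Entry<String, Integer> pair : map("Hello World Hello World")) {
            System.out.println("< " + pair.getKey() + ", " + pair.getValue() + ">");
        }
    }
}
```

Note that the mapper does no counting of its own. It emits a 1 for every occurrence and leaves the summing to the later steps.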
=> Combiner/Sorting (this is done locally for each individual map)
The combiner sums the counts for each word within a single map's output, and the result is sorted by key.
The output of the first map becomes:
< Hello, 2>
< World, 2>
The output of the second map:
< Hello, 2>
< World, 2>
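
The local combining step can be sketched the same way. Again this is a plain-Java simulation, and CombinerSketch and combine are hypothetical names. In Hadoop, a combiner is typically a Reducer subclass registered on the job with setCombinerClass. A TreeMap is used here only to mimic the sorted-by-key order of the output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in for a combiner running on one map's output.
class CombinerSketch {

    // Each entry in emittedWords represents one <word, 1> pair from a single map.
    // The counts are summed per word; TreeMap keeps the keys in sorted order.
    static Map<String, Integer> combine(List<String> emittedWords) {
        Map<String, Integer> localCounts = new TreeMap<>();
        for (String word : emittedWords) {
            localCounts.merge(word, 1, Integer::sum);
        }
        return localCounts;
    }

    public static void main(String[] args) {
        // Keys emitted by the first map in the example.
        List<String> firstMapOutput = Arrays.asList("Hello", "World", "Hello", "World");
        System.out.println(combine(firstMapOutput)); // prints {Hello=2, World=2}
    }
}
```

Combining before the reduce step means each map sends two pairs instead of four, which cuts the data moved between map and reduce tasks.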
=> Reducer:
It sums the combined counts for each word across both maps and generates the output below:
< Hello, 4>
< World, 4>
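
A minimal plain-Java sketch of the reduce step follows, assuming the framework has already grouped the values by key so that the reducer sees one word together with all of its partial counts. ReducerSketch and reduce are hypothetical names. In Hadoop, this logic would go in the reduce method of a class extending org.apache.hadoop.mapreduce.Reducer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java stand-in for the reduce phase (no Hadoop dependency).
class ReducerSketch {

    // Receives one key and every partial count collected for it, returns the total.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Each word arrives with one partial count per map: 2 from each file.
        List<Integer> partialCounts = Arrays.asList(2, 2);
        System.out.println("< Hello, " + reduce("Hello", partialCounts) + ">"); // prints < Hello, 4>
        System.out.println("< World, " + reduce("World", partialCounts) + ">"); // prints < World, 4>
    }
}
```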
=> Output
The final output would look like this:
Hello 4 times
World 4 times
