mrjob workshop


First Job

First, create a file called wordcount.py in your current folder with the following contents:

from mrjob.job import MRJob


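class WordCount(MRJob):

    def mapper(self, _, line):
        # Emit the number of whitespace-separated words on each input line
        # under one shared key, so all counts reach the same reducer call.
        yield "words", len(line.split())

    def reducer(self, key, values):
        # Add up the per-line counts collected for the key.
        yield key, sum(values)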


if __name__ == '__main__':
    WordCount.run()

Next, create a file called words.txt in the same folder with the following contents:

Lorem ipsum dolor sit amet, vis posse concludaturque no, et affert equidem est, per modus partem equidem et. Vim cu summo blandit, errem invidunt intellegat an nec. Et mei tantas percipit. In malis signiferumque sed.

Finally, run the job:

$ python wordcount.py words.txt

Check the output. You should see something like:

...
"words" 35
...
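
To see where the 35 comes from, it can help to trace the map and reduce steps by hand. The following is a minimal sketch in plain Python that does not use mrjob at all. The names TEXT and run_local are hypothetical helpers made up for this illustration, and the grouping loop only approximates what a real runner does between the mapper and the reducer.

```python
from collections import defaultdict

# Same text as words.txt, split across string literals for readability.
TEXT = (
    "Lorem ipsum dolor sit amet, vis posse concludaturque no, et affert "
    "equidem est, per modus partem equidem et. Vim cu summo blandit, errem "
    "invidunt intellegat an nec. Et mei tantas percipit. In malis "
    "signiferumque sed."
)


def mapper(_, line):
    # Same logic as WordCount.mapper: one (key, count) pair per line.
    yield "words", len(line.split())


def reducer(key, values):
    # Same logic as WordCount.reducer: sum the counts for a key.
    yield key, sum(values)


def run_local(lines):
    # Hypothetical stand-in for a runner: map each line, group the
    # emitted values by key, then reduce each group.
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(None, line):
            grouped[key].append(value)

    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    print(run_local(TEXT.splitlines()))
```

Because words.txt holds a single line, the mapper emits exactly one pair, ("words", 35), and the reducer sums a list with one element. With a multi-line input, the mapper would emit one pair per line and the reducer would add them together.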

Exercises