Huisquare / Assignment-1-Hadoop

############### Assignment 1-1: Top K Common Words #####################

Command Format: TopkCommonWords <input_file1> <input_file2> <stopwords> <output_dir>
Example: hadoop jar cm.jar TopkCommonWords commonwords/input/task1-input1.txt commonwords/input/task1-input2.txt commonwords/input/stopwords.txt commonwords/cm_output/
(All file paths in the command are HDFS paths.)
By default, the output is stored in the text file commonwords/cm_output/part-r-00000.
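
To get a feel for the expected result before writing the MapReduce job, the logic can be sketched in plain Java on in-memory strings. This is only an illustration, not the reference solution. It assumes that a word is "common" when it appears in both inputs, that its score is the smaller of its two counts, that tokens are split on whitespace, and that ties are broken alphabetically. None of these rules is stated in this README, so check them against 'answer.txt'. The class name TopkCommonWordsSketch and the parameter k are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory sketch; the real job reads HDFS files through Hadoop.
public class TopkCommonWordsSketch {

    // Count each non-stopword token. Whitespace splitting is an assumption.
    static Map<String, Integer> countWords(String text, Set<String> stopwords) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty() || stopwords.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Return up to k words present in both texts, scored by the smaller count
    // (assumed rule), highest score first, ties broken alphabetically.
    static List<Map.Entry<String, Integer>> topKCommon(
            String text1, String text2, Set<String> stopwords, int k) {
        Map<String, Integer> counts1 = countWords(text1, stopwords);
        Map<String, Integer> counts2 = countWords(text2, stopwords);

        List<Map.Entry<String, Integer>> common = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts1.entrySet()) {
            Integer other = counts2.get(e.getKey());
            if (other != null) {
                int score = Math.min(e.getValue(), other);
                common.add(new AbstractMap.SimpleEntry<>(e.getKey(), score));
            }
        }

        common.sort((a, b) -> {
            int byScore = Integer.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(common.subList(0, Math.min(k, common.size())));
    }

    public static void main(String[] args) {
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "a", "on"));
        List<Map.Entry<String, Integer>> top = topKCommon(
                "the cat sat on the mat cat",
                "a cat on the mat mat cat cat",
                stopwords, 5);
        // The "count<TAB>word" layout is an assumption; match 'answer.txt' instead.
        for (Map.Entry<String, Integer> e : top) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

In a MapReduce version, the per-file counting would typically happen in the mappers and the min-and-rank step in a single reducer, which is consistent with the single part-r-00000 output file named above.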

Use the provided scripts to compile and submit your code:
$ ./compile_run		# compile and run your code on the sample dataset, then check the result
$ ./submit		# submit your code (you can submit multiple times before the deadline)
These scripts will also be used for marking, so ensure your output format matches 'answer.txt'.

In this assignment, you should ONLY modify:
-- TopkCommonWords.java

Do NOT add new files. Do NOT define Java packages. The script will compile 'TopkCommonWords.java' as a plain Java file.
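
One way to satisfy these constraints is to keep every helper as a nested static class inside the single source file and to omit the package line entirely. The following is a minimal sketch of that layout, with the Hadoop job setup left out. The JobPaths holder and the parseArgs method are hypothetical names introduced for illustration and are not part of the provided template.

```java
// No "package" declaration: the class stays in the default package,
// so the file can be compiled on its own.
public class TopkCommonWords {

    // Hypothetical holder for the four HDFS paths given on the command line.
    static class JobPaths {
        final String input1;
        final String input2;
        final String stopwords;
        final String outputDir;

        JobPaths(String input1, String input2, String stopwords, String outputDir) {
            this.input1 = input1;
            this.input2 = input2;
            this.stopwords = stopwords;
            this.outputDir = outputDir;
        }
    }

    // Validate the argument count against the documented command format.
    static JobPaths parseArgs(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException(
                "Usage: TopkCommonWords <input_file1> <input_file2> <stopwords> <output_dir>");
        }
        return new JobPaths(args[0], args[1], args[2], args[3]);
    }

    public static void main(String[] args) {
        JobPaths paths = parseArgs(args);
        // The Hadoop job configuration, mapper, and reducer would go here (omitted).
        System.out.println("Output directory: " + paths.outputDir);
    }
}
```

Mapper and reducer classes can follow the same pattern as JobPaths: declared as static nested classes so that no additional files are needed.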
