memory7734 / hadoop-bayes

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

$ hadoop jar hadoop.jar ConvertToSequenceFile NBCorpus/Country/ trainSet testSet
18/11/28 21:30:25 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
18/11/28 21:30:25 INFO compress.CodecPool: Got brand-new compressor [.deflate]
18/11/28 21:30:25 INFO compress.CodecPool: Got brand-new compressor [.deflate]
$ ll
总用量 50304
drwxr-xr-x 2 memory7734 memory7734     4096 9月  10 11:58 bin
……
drwxrwxr-x 4 memory7734 memory7734     4096 11月 14 17:05 NBCorpus
……
-rw-r--r-- 1 memory7734 memory7734   613399 11月 28 21:30 testSet
-rw-r--r-- 1 memory7734 memory7734  1750520 11月 28 21:30 trainSet
$ hadoop jar hadoop.jar StatisticsClassAmount trainSet trainSetClassAmount
18/11/28 21:42:29 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
……
	File Input Format Counters 
		Bytes Read=1764204
	File Output Format Counters 
		Bytes Written=196
$ hdfs dfs -text trainSetClassAmount/part-r-00000 
USA:zoran	1
USA:zorn	1
USA:zurlo	1
USA:zvereva	1
USA:zweig	1
$ hadoop jar hadoop.jar StatisticsWordAmount trainSetClassAmount/part-r-00000 trainSetWordAmount

18/11/28 21:45:46 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/11/28 21:45:46 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
……
	File Input Format Counters 
		Bytes Read=1347054
	File Output Format Counters 
		Bytes Written=196
$ hdfs dfs -text trainSetWordAmount/part-r-00000 
AUSTR	41551
FRA	32185
INDIA	37678
JAP	35830
UK	94290
USA	227701
$ hadoop jar hadoop.jar StatisticsUniqueWordAmount trainSetClassAmount/part-r-00000 trainSetUniqueWordAmount

18/11/28 21:45:46 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/11/28 21:45:46 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
……
	File Input Format Counters 
		Bytes Read=1347054
	File Output Format Counters 
		Bytes Written=595950
$ hdfs dfs -text trainSetUniqueWordAmount/part-r-00000 
zosen	1
zuari	1
zulfikar	1
zurich	1
zurlo	1
zvereva	1
zweig	1
$ hadoop jar hadoop.jar CalculateProbability trainSetClassAmount/part-r-00000 trainSetWordAmount/part-r-00000 trainSetUniqueWordAmount/part-r-00000 testSet testResult
18/11/29 10:08:36 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/11/29 10:08:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
……
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=588122
	File Output Format Counters 
		Bytes Written=39039

$ hdfs dfs -text testResult/part-r-00000
……
488890newsML.txt	FRA
488961newsML.txt	FRA
488975newsML.txt	FRA
488980newsML.txt	USA
488999newsML.txt	FRA
489003newsML.txt	USA
489004newsML.txt	INDIA
489013newsML.txt	USA
489019newsML.txt	AUSTR
489022newsML.txt	USA

$ hadoop jar hadoop.jar CalculateEvaluation testSet testResult trainSetWordAmount/part-r-00000 evaluation
18/11/29 10:08:36 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/11/29 10:08:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
……
	Map-Reduce Framework
		Map input records=1378
		Map output records=8268
		Map output bytes=337963
		Map output materialized bytes=354505
		Input split bytes=107
		Combine input records=0
		Combine output records=0
		Reduce input groups=1378
		Reduce shuffle bytes=354505
		Reduce input records=8268
		Reduce output records=1378
		Spilled Records=16536
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=22
		Total committed heap usage (bytes)=953155584
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=618203
	File Output Format Counters 
		Bytes Written=41573
$ hdfs dfs -text evaluation/part-r-00000 
AUSTR:FN	15
AUSTR:FP	8
AUSTR:TN	1214
AUSTR:TP	59
FRA:FN	15
FRA:FP	13
FRA:TN	1207
FRA:TP	61
INDIA:FN	2
INDIA:FP	3
INDIA:TN	1229
INDIA:TP	62
JAP:FN	17
JAP:FP	6
JAP:TN	1168
JAP:TP	105
UK:FN	19
UK:FP	30
UK:TN	1065
UK:TP	182
USA:FN	19
USA:FP	27
USA:TN	510
USA:TP	740

About


Languages

Language:Java 100.0%