pranab / sifarish

Content based and collaborative filtering based recommendation and personalization engine implementation on Hadoop and Storm

Home Page:http://pkghosh.wordpress.com

brec.sh genHistEvent error

nattachai305 opened this issue

Hi Pranab, I followed the tutorial through the implicit scenario and got stuck at the genHistEvent step.
The tutorial gives the usage as ./brec.sh genHistEvent <item_count> <user_count> <average_event_count_per_user>.
I ran the following:
./brec.sh genHistEvent 100 100 9
and got this error:
./brec.sh: line 58: $5: ambiguous redirect
The schema I use is exactly engageEvent.json, and the variables I set in brec.sh are below.
JAR_NAME=/etc/recomlib/sifarish-1.0.jar
CHOMBO_JAR_NAME=/etc/recomlib/chombo-1.0.jar
HDFS_BASE_DIR=/user/pranab/reco
PROP_FILE=/etc/git/sifarish/reco.properties
HDFS_META_BASE_DIR=/user/pranab/meta/imra
I have also already created the locations referenced by JAR_NAME, CHOMBO_JAR_NAME, PROP_FILE, HDFS_BASE_DIR and HDFS_META_BASE_DIR on the local filesystem and in HDFS accordingly.
I have downloaded all the required dependencies.
I have been trying to solve this for a long time without success, so I am asking for your help here and would appreciate your answer.
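
For context on the error message: bash reports `ambiguous redirect` when the word after `>` expands to nothing or to more than one word. The message names `$5`, and the command above supplies only four positional parameters (`genHistEvent`, `100`, `100`, `9`), so line 58 of brec.sh most likely redirects output to a fifth argument that was never given. What that argument should be is not stated in this thread; an output file name is only a guess. The sketch below is a hypothetical guard, not code from brec.sh: `gen_events` and its arguments are illustrative names.

```shell
# Hypothetical stand-in for the failing step; not taken from brec.sh.
gen_events() {
    # $1..$3 mirror item_count, user_count and average event count;
    # $4 is the file the output is redirected to (an assumption).
    if [ "$#" -lt 4 ] || [ -z "$4" ]; then
        echo "usage: gen_events <item_count> <user_count> <avg_events> <output_file>" >&2
        return 1
    fi
    # Quoting the target makes the shell treat it as a single word,
    # even if the path contains spaces.
    printf 'items=%s users=%s events=%s\n' "$1" "$2" "$3" > "$4"
}
```

Called as `gen_events 100 100 9 events.txt`, the function writes one line to `events.txt`; called without the last argument, it prints a usage message and returns 1 instead of failing at the redirect.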

Regarding the uuid issue, which version of Ruby are you using?

I think it's ruby 1.8.7 (2013-06-27 patchlevel 374)

I have tried 2.4.1, but it doesn't work either. Which version should I use, or which one is required for the project?

Thanks for your time.
Aside from this problem, I hit another one while going through the Explicit Rating Data Generation approach: I got stuck at step 5. The error is shown below.

`[root@quickstart resource]# sudo -u hdfs ./brec.sh correlation
running MR to generate item correlation from rating data
input /user/pranab/reco/crat output /user/pranab/reco/simi
rmr: DEPRECATED: Please use 'rm -r' instead.
rmr: `/user/pranab/reco/simi': No such file or directory
removed output dir /user/pranab/reco/simi
18/05/09 06:37:39 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
18/05/09 06:37:40 INFO input.FileInputFormat: Total input paths to process : 1
18/05/09 06:37:40 INFO mapreduce.JobSubmitter: number of splits:1
18/05/09 06:37:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1525063506679_0062
18/05/09 06:37:40 INFO impl.YarnClientImpl: Submitted application application_1525063506679_0062
18/05/09 06:37:40 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1525063506679_0062/
18/05/09 06:37:40 INFO mapreduce.Job: Running job: job_1525063506679_0062
18/05/09 06:37:47 INFO mapreduce.Job: Job job_1525063506679_0062 running in uber mode : false
18/05/09 06:37:47 INFO mapreduce.Job:  map 0% reduce 0%
18/05/09 06:37:55 INFO mapreduce.Job:  map 100% reduce 0%
18/05/09 06:38:01 INFO mapreduce.Job: Task Id : attempt_1525063506679_0062_r_000000_0, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "**6Z31HNOXVGHM**"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:492)
	at java.lang.Integer.parseInt(Integer.java:527)
	at org.sifarish.feature.CosineSimilarity.initVector(CosineSimilarity.java:83)
	at org.sifarish.feature.CosineSimilarity.findDistance(CosineSimilarity.java:45)
	at org.sifarish.common.ItemDynamicAttributeSimilarity$SimilarityReducer.reduce(ItemDynamicAttributeSimilarity.java:282)
	at org.sifarish.common.ItemDynamicAttributeSimilarity$SimilarityReducer.reduce(ItemDynamicAttributeSimilarity.java:164)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

18/05/09 06:38:08 INFO mapreduce.Job:  map 100% reduce 100%
18/05/09 06:38:08 INFO mapreduce.Job: Job job_1525063506679_0062 failed with state FAILED due to: Task failed task_1525063506679_0062_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1

18/05/09 06:38:08 INFO mapreduce.Job: Counters: 37
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=173166
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=4431
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=3
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters 
		Failed reduce tasks=2
		Launched map tasks=1
		Launched reduce tasks=2
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=2305536
		Total time spent by all reduces in occupied slots (ms)=3828224
		Total time spent by all map tasks (ms)=4503
		Total time spent by all reduce tasks (ms)=7477
		Total vcore-milliseconds taken by all map tasks=4503
		Total vcore-milliseconds taken by all reduce tasks=7477
		Total megabyte-milliseconds taken by all map tasks=2305536
		Total megabyte-milliseconds taken by all reduce tasks=3828224
	Map-Reduce Framework
		Map input records=100
		Map output records=1000
		Map output bytes=65000
		Map output materialized bytes=13391
		Input split bytes=131
		Combine input records=0
		Spilled Records=1000
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=169
		CPU time spent (ms)=1570
		Physical memory (bytes) snapshot=227512320
		Virtual memory (bytes) snapshot=980533248
		Total committed heap usage (bytes)=257425408
	File Input Format Counters 
		Bytes Read=4300
rmr: DEPRECATED: Please use 'rm -r' instead.
rmr: `/user/pranab/reco/simi/_logs': No such file or directory
rmr: DEPRECATED: Please use 'rm -r' instead.
rmr: `/user/pranab/reco/simi/_SUCCESS': No such file or directory`

From the tutorial, I thought the format of the rate file was already correct, so I skipped step 3, copied rate from /reco to /reco/crat, and successfully ran ./brec.sh correlation at step 5. But I am still stuck at step 6.3; the error is shown below.

running MR for rating predictor
input /user/pranab/reco/crat,/user/pranab/reco/simi output /user/pranab/reco/utpr
rmr: DEPRECATED: Please use 'rm -r' instead.
rmr: `/user/pranab/reco/utpr': No such file or directory
removed output dir /user/pranab/reco/utpr
18/05/09 06:53:49 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
18/05/09 06:53:50 INFO input.FileInputFormat: Total input paths to process : 2
18/05/09 06:53:50 INFO mapreduce.JobSubmitter: number of splits:2
18/05/09 06:53:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1525063506679_0065
18/05/09 06:53:50 INFO impl.YarnClientImpl: Submitted application application_1525063506679_0065
18/05/09 06:53:50 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1525063506679_0065/
18/05/09 06:53:50 INFO mapreduce.Job: Running job: job_1525063506679_0065
18/05/09 06:53:58 INFO mapreduce.Job: Job job_1525063506679_0065 running in uber mode : false
18/05/09 06:53:58 INFO mapreduce.Job:  map 0% reduce 0%
18/05/09 06:54:04 INFO mapreduce.Job: Task Id : attempt_1525063506679_0065_m_000000_0, Status : FAILED
**Error: java.lang.NumberFormatException: For input string: "MBFM6Q0Q1PR9:84"**
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:492)
	at java.lang.Integer.parseInt(Integer.java:527)
	at org.sifarish.common.UtilityPredictor$PredictionMapper.map(UtilityPredictor.java:201)
	at org.sifarish.common.UtilityPredictor$PredictionMapper.map(UtilityPredictor.java:90)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

18/05/09 06:54:05 INFO mapreduce.Job:  map 50% reduce 0%
18/05/09 06:54:10 INFO mapreduce.Job:  map 100% reduce 0%
18/05/09 06:54:11 INFO mapreduce.Job:  map 100% reduce 100%
18/05/09 06:54:11 INFO mapreduce.Job: Job job_1525063506679_0065 failed with state FAILED due to: Task failed task_1525063506679_0065_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

18/05/09 06:54:11 INFO mapreduce.Job: Counters: 35
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=178789
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=21319
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=3
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters 
		Failed map tasks=2
		Killed reduce tasks=1
		Launched map tasks=3
		Other local map tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=7176192
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=14016
		Total vcore-milliseconds taken by all map tasks=14016
		Total megabyte-milliseconds taken by all map tasks=7176192
	Map-Reduce Framework
		Map input records=757
		Map output records=1514
		Map output bytes=71158
		Map output materialized bytes=19041
		Input split bytes=131
		Combine input records=0
		Spilled Records=1514
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=112
		CPU time spent (ms)=1320
		Physical memory (bytes) snapshot=232357888
		Virtual memory (bytes) snapshot=981663744
		Total committed heap usage (bytes)=257425408
	File Input Format Counters 
		Bytes Read=21188
rmr: DEPRECATED: Please use 'rm -r' instead.
rmr: `/user/pranab/reco/utpr/_logs': No such file or directory
rmr: DEPRECATED: Please use 'rm -r' instead.
rmr: `/user/pranab/reco/utpr/_SUCCESS': No such file or directory

I have tried to solve this but cannot, so would you please enlighten me?

Please provide the sequence of steps you ran, using the step numbers from the tutorial. Your input to the MR job seems incorrect. Also provide 2 or 3 sample lines from the input to the failing MR job. The fact that you manually copied files tells me that you are not following the tutorial steps properly.
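
Both stack traces fail inside `Integer.parseInt`, which throws `NumberFormatException` for any string that is not a plain integer, so each job received an ID-like token (for example `MBFM6Q0Q1PR9:84`) in a position where it expected a number. One way to collect the requested sample lines and see which fields are non-numeric is sketched below. It assumes comma-separated records, which may not match the delimiter the jobs are configured with, and `report_non_integer_fields` is a hypothetical helper, not part of sifarish.

```shell
# Hypothetical helper: list fields that are not plain integers.
# $1: input file, $2: field delimiter (default ","), $3: lines to scan (default 3)
report_non_integer_fields() {
    awk -F"${2:-,}" -v max="${3:-3}" '
        NR > max { exit }                      # stop after the sample lines
        {
            for (i = 1; i <= NF; i++)
                if ($i !~ /^-?[0-9]+$/)        # same idea as Integer.parseInt failing
                    printf "line %d field %d: %s\n", NR, i, $i
        }
    ' "$1"
}
```

For example, `hadoop fs -cat /user/pranab/reco/crat/* | head -n 3 > sample.txt` followed by `report_non_integer_fields sample.txt` would list the non-integer fields in the first three input lines. Some of those fields may legitimately be IDs; the point is to compare their positions with the format the tutorial step produces.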