Job fails when running local job
mitcheccles opened this issue · comments
The example jobs appear to fail when running them locally.
I cloned this repo and fetched the data using the get-data script. I'm trying to run the tag_counter example as described in the README. The exact command I run is: python tag_counter.py --conf-path mrjob.conf --no-output --output-dir out crawl-data/CC-MAIN-2014-35/segments/1408500800168.29/warc/
The response I get is: IOError: [Errno 2] No such file or directory: '/<path>/cc-mrjob/WARC/1.0'
I get a different response if I run the above command with the -r local argument. The job appears to start executing and prints "Running step 1 of 1...", but then it hangs indefinitely until I kill Python.
I've tried the examples on a couple of machines and keep getting the same result. I suspect I've missed some all-important step, or maybe there is a bug?
I'm on Python 2.7, using mrjob 0.5.8.
That's because the job tries to read the WARC file itself as a list of WARC files to process: the job's input is expected to be a text file listing the WARC files to fetch, not a WARC file or directory. Try the command as stated in the README: python tag_counter.py --conf-path mrjob.conf --no-output --output-dir out input/test-1.warc
But the description should state this explicitly; I'll update it. If you find more points that need clarification, please report them or open a pull request. Thanks!
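For context, this is why the error message mentions "WARC/1.0": each line of the input file is treated as a path to a WARC file, and a real WARC file begins with the version header "WARC/1.0", so that header gets interpreted as a (nonexistent) path. A minimal sketch of the input convention, with illustrative names (parse_manifest is not the repo's actual code):

```python
def parse_manifest(text):
    """Return the list of WARC paths named in a manifest file's text,
    one path per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# What the job expects: a plain-text list of WARC files to process.
manifest = "crawl-data/CC-MAIN-2014-35/segments/1408500800168.29/warc/example.warc.gz\n"
print(parse_manifest(manifest))

# What a real WARC file starts with: a version header, not a path list.
warc_content = "WARC/1.0\r\nWARC-Type: warcinfo\r\n"

# Feeding the WARC itself as the manifest yields bogus "paths" --
# the first one is literally "WARC/1.0", matching the IOError above.
print(parse_manifest(warc_content)[0])  # prints: WARC/1.0
```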
Ah nuts! Yes, that works now, thank you. I don't know why, but I thought input/test-1.warc was just a placeholder string and didn't twig that it referred to an actual file in the repo's input folder... Doh!
Thanks for your help and for putting this tutorial together :).