Yelp / mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services

Home Page: http://packages.python.org/mrjob/


TypeError when writing to stderr within a job on Python 3

riazjahangir opened this issue

We're running into a str vs bytes issue on Python 3. It appears to be related to commit 0f0297b, which changes sys.stderr from a TextIOWrapper in 'w' mode to a BufferedWriter in 'wb' mode.
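The underlying behaviour can be reproduced without mrjob. The snippet below is only a sketch: it uses an in-memory io.BufferedWriter as a stand-in for the binary sys.stderr described above (the names raw, binary_stderr and text_stderr are made up for illustration, not taken from mrjob). It shows that writing a str to the binary stream raises TypeError, while a TextIOWrapper around the same stream accepts it.

```python
import io

# Stand-in for a binary-mode stderr: a BufferedWriter over an in-memory stream.
raw = io.BytesIO()
binary_stderr = io.BufferedWriter(raw)

# Writing a str to a binary stream fails, which is what happens when
# warnings or print() hand a str to a 'wb'-mode sys.stderr.
try:
    binary_stderr.write('Here is a warning\n')
    error = None
except TypeError as exc:
    error = exc

# A text wrapper encodes str to bytes before passing it to the binary stream.
text_stderr = io.TextIOWrapper(binary_stderr, encoding='utf-8')
text_stderr.write('Here is a warning\n')
text_stderr.flush()  # flushes the wrapper and the underlying BufferedWriter

written = raw.getvalue()
```

After this runs, error holds the TypeError from the first write and written holds the encoded bytes from the second.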

Example

The error occurs with any attempt to write a str to sys.stdout or sys.stderr, e.g. via print. In our particular case, some of the libraries we depend on print warnings using the builtin warnings module, which writes to sys.stderr by default. See the example below, tested with the inline runner on Python 3.7.4 and 3.8.0 with mrjob 0.7.1.

import warnings
from mrjob.job import MRJob


class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        warnings.warn('Here is a warning')
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    MRWordFrequencyCount.run()

Traceback

No configs found; falling back on auto-configuration
No configs specified for inline runner
Creating temp directory /tmp/mr_word_count.ec2-user.20200327.160028.510282
Running step 1 of 1...

Error while reading from /tmp/mr_word_count.ec2-user.20200327.160028.510282/step/000/mapper/00000/input:

Traceback (most recent call last):
  File "mr_word_count.py", line 18, in <module>
    MRWordFrequencyCount.run()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/job.py", line 616, in run
    cls().execute()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/job.py", line 687, in execute
    self.run_job()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/job.py", line 636, in run_job
    runner.run()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/runner.py", line 497, in run
    self._run()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 160, in _run
    self._run_step(step, step_num)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 169, in _run_step
    self._run_streaming_step(step, step_num)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 180, in _run_streaming_step
    self._run_mappers_and_combiners(step_num, map_splits)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 221, in _run_mappers_and_combiners
    for task_num, map_split in enumerate(map_splits)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 129, in _run_multiple
    func()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 723, in _run_mapper_and_combiner
    run_mapper()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 746, in _run_task
    stdin, stdout, stderr, wd, env)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/inline.py", line 132, in invoke_task
    task.execute()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/job.py", line 675, in execute
    self.run_mapper(self.options.step_num)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/job.py", line 760, in run_mapper
    for k, v in self.map_pairs(read_lines(), step_num=step_num):
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/job.py", line 830, in map_pairs
    for k, v in mapper(key, value) or ():
  File "mr_word_count.py", line 8, in mapper
    warnings.warn('Here is a warning')
  File "/usr/lib64/python3.7/warnings.py", line 112, in _showwarnmsg
    _showwarnmsg_impl(msg)
  File "/usr/lib64/python3.7/warnings.py", line 30, in _showwarnmsg_impl
    file.write(text)
TypeError: a bytes-like object is required, not 'str'

Appreciate any guidance you can give us on options to work around this. Thanks!

We believe we've found a workaround for the warnings case in particular. This redirects output from the warnings module to the logging system, so it no longer writes to sys.stderr directly:

import logging

logging.captureWarnings(True)

https://docs.python.org/3/library/logging.html#logging.captureWarnings
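For anyone who wants to check the workaround outside of a job, here is a minimal sketch. It relies on the documented behaviour that captured warnings are logged to the 'py.warnings' logger. The StringIO handler and the simplefilter call are additions for this demonstration only. In a real job the call would presumably go near the top of the job script, before any library emits a warning, and the logging configuration would decide where the messages end up.

```python
import io
import logging
import warnings

# Route warnings.warn() output through logging instead of sys.stderr.
logging.captureWarnings(True)

# Demo only: collect records from the 'py.warnings' logger in memory.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
logging.getLogger('py.warnings').addHandler(handler)

# Demo only: make sure the warning is emitted even on repeated runs.
warnings.simplefilter('always')

warnings.warn('Here is a warning')

captured = log_stream.getvalue()
```

The captured string should contain the warning text, which shows the message went to the logging handler rather than being written to sys.stderr by the warnings module.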