brushtechnology / fabricate

The better build tool. Finds dependencies automatically for any language.

StraceRunner crashes with KeyError

bmatsuo opened this issue · comments

Summary

Running go build (the Go compiler toolchain) occasionally causes the StraceRunner to crash. The crash appears to be caused by unexpected lines in the strace output log, possibly a strace bug that does not yet appear to be addressed upstream.

I've included reproduction instructions and my own analysis here.

Reproduction

I am running Ubuntu 14.04 LTS (trusty) 64-bit and the version of strace available through apt-get is version 4.8 (latest is 4.10).

Here is the build script I am using.

#!/usr/bin/env python

from fabricate import run, main

def build():
    run('go', 'build', './cmd/fabtest')

main()

It produces tracebacks like this:

Traceback (most recent call last):
  File "./build", line 8, in <module>
    main()
  File "/home/bmatsuo/local/fabricate/fabricate.py", line 1578, in main
    this_status = eval(action, globals_dict)
  File "<string>", line 1, in <module>
  File "./build", line 6, in build
    run('go', 'build', './cmd/fabtest')
  File "/home/bmatsuo/local/fabricate/fabricate.py", line 1400, in run
    return default_builder.run(*args, **kwargs)
  File "/home/bmatsuo/local/fabricate/fabricate.py", line 1142, in run
    return self._run(*args, **kwargs)
  File "/home/bmatsuo/local/fabricate/fabricate.py", line 1125, in _run
    deps, outputs = self.runner(*arglist, **kwargs)
  File "/home/bmatsuo/local/fabricate/fabricate.py", line 769, in __call__
    return self._runner(*args, **kwargs)
  File "/home/bmatsuo/local/fabricate/fabricate.py", line 724, in __call__
    status, deps, outputs = self._do_strace(args, kwargs, outfile, outname)
  File "/home/bmatsuo/local/fabricate/fabricate.py", line 571, in _do_strace
    self._match_line(line, processes, unfinished)
  File "/home/bmatsuo/local/fabricate/fabricate.py", line 594, in _match_line
    line = unfinished[pid] + body
KeyError: '31317'

The contents of the ./cmd/fabtest directory passed to go build do not appear to matter; compiling any Go program will crash the build script if it is run enough times (making sure to autoclean in between compilations). In this case it is a single file, main.go, containing the most trivial of programs.

package main

import "fmt"

func main() {
    fmt.Println("hello, world")
}

Analysis

Looking at the strace output obtained with -k, you can see an inconsistency in the lines for the PID seen in the traceback (31317):

$ egrep 31317 strace000.txt
31316 clone(child_stack=0x7fea43379fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fea4337a9d0, tls=0x7fea4337a700, child_tidptr=0x7fea4337a9d0) = 31317
31317 clone(child_stack=0x7fea42b78fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fea42b799d0, tls=0x7fea42b79700, child_tidptr=0x7fea42b799d0) = 31318
31317 <... futex resumed> )             = ? <unavailable>
31317 +++ exited with 0 +++

The second-to-last line reports the completion of a futex call whose entry was never observed. All calls to futex should be hidden because of the -e trace=... option passed to strace, so this seems to indicate a bug in strace.

NOTE: I have seen the same kind of strace logging from other system calls, specifically select. This problem does not apply only to the futex call.
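For anyone who wants to check their own kept strace logs for the same symptom, here is a minimal sketch that lists "resumed" lines with no earlier "unfinished" line for the same PID. The function name orphan_resumed and the two regular expressions are assumptions made for illustration, not part of fabricate, and the sample lines are shortened from the log above.

```python
import re

# "<PID> <... NAME resumed> ..." marks the end of an interrupted call.
RESUMED = re.compile(r'^(\d+)\s+<\.\.\. (\w+) resumed>')
# "<PID> NAME(args <unfinished ...>" marks the start of an interrupted call.
UNFINISHED = re.compile(r'^(\d+)\s+.*<unfinished \.\.\.>\s*$')


def orphan_resumed(lines):
    """Return (pid, syscall) for each 'resumed' line with no pending 'unfinished' line."""
    pending = set()   # PIDs that currently have an unfinished call
    orphans = []
    for line in lines:
        m = UNFINISHED.match(line)
        if m:
            pending.add(m.group(1))
            continue
        m = RESUMED.match(line)
        if m:
            pid, name = m.groups()
            if pid in pending:
                pending.discard(pid)        # normal unfinished/resumed pair
            else:
                orphans.append((pid, name))  # the case that triggers the KeyError
    return orphans


# Shortened version of the log excerpt above (clone flags abbreviated).
sample = [
    "31316 clone(child_stack=0x7fea43379fb0, flags=CLONE_VM) = 31317",
    "31317 <... futex resumed> )             = ? <unavailable>",
    "31317 +++ exited with 0 +++",
]
print(orphan_resumed(sample))
```

On the shortened sample this reports the futex line for PID 31317, which is the same line that makes unfinished[pid] fail in the traceback.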

I know of this bug (in strace) and have a workaround in fabricate for it.
I will add the changes to the GitHub repository.

Simon.

(Sent by email in reply to Bryan Matsuo's GitHub notification of 28 June 2015, 01:56.)

Fixed on master.

A warning is now printed instead of crashing.

Also slightly altered the exception handling to provide better error reporting for problems like this.
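To illustrate the "warn instead of crash" behaviour, here is a rough sketch of tolerant matching of unfinished/resumed lines. It is not the code committed to fabricate; the helper join_resumed, its warn parameter, and both regular expressions are hypothetical. Only the unfinished dictionary keyed by PID comes from the traceback above.

```python
import re
import sys

# Body of a call that strace interrupted: "name(args <unfinished ...>"
UNFINISHED = re.compile(r'^(.*)<unfinished \.\.\.>\s*$')
# Body that completes an interrupted call: "<... name resumed> rest"
RESUMED = re.compile(r'^<\.\.\. \w+ resumed>(.*)$')


def join_resumed(pid, body, unfinished, warn=sys.stderr.write):
    """Return a complete call line, or None if there is nothing to process yet.

    `unfinished` maps PID -> first half of an interrupted call.
    """
    m = UNFINISHED.match(body)
    if m:
        unfinished[pid] = m.group(1)   # remember the first half
        return None
    m = RESUMED.match(body)
    if m:
        # pop() with a default avoids the KeyError raised by unfinished[pid]
        start = unfinished.pop(pid, None)
        if start is None:
            warn("warning: 'resumed' line without 'unfinished' for pid %s\n" % pid)
            return None
        return start + m.group(1)
    return body                        # ordinary, already complete line
```

Skipping the orphan line is safe in this sketch only because the spurious calls seen here (futex, select) are not file accesses that a dependency tracker would care about. That is an assumption about the logs in this issue, not a general guarantee.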

I did not initially submit this fix because I wrongly assumed it was specific to the ancient version of strace I was using on my system.

Thanks @SimonAlfie. I will check out the latest version.

Just to follow up on the strace side: the mailing list was very responsive. There is a fix for these types of spurious "resumed" lines in strace's master branch. Compiling strace from source should prevent fabricate from emitting warnings (or, in earlier versions, crashing) when running go build.

@SimonAlfie It seems like this patch was never released to PyPI. The version on PyPI is still 1.26. What are your thoughts about releasing 1.27?

Also FWIW, strace-4.11 has just been released. I don't know how long it will take the major distros to pick it up, but it should fix the particular bug I was hitting here. It may still be worth having the patch from your commit 2bd38a5, for more resiliency against bugs found in the future 😕
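In case it helps others decide whether their installed strace is likely affected, here is a small sketch that compares a version string against 4.11, the release mentioned above. It assumes `strace -V` prints a line of the form "strace -- version 4.8"; the function names and the FIXED_IN constant are made up for this example.

```python
import re

FIXED_IN = (4, 11)  # release named in this thread as containing the fix


def parse_strace_version(text):
    """Extract a version tuple such as (4, 8) from `strace -V` style output."""
    m = re.search(r'version (\d+(?:\.\d+)*)', text)
    if not m:
        return None
    return tuple(int(part) for part in m.group(1).split('.'))


def may_emit_spurious_resumed(text):
    """True if the version predates FIXED_IN or cannot be determined."""
    version = parse_strace_version(text)
    return version is None or version < FIXED_IN


print(may_emit_spurious_resumed("strace -- version 4.8"))
print(may_emit_spurious_resumed("strace -- version 4.11"))
```

Versions are compared as integer tuples because a plain string comparison would sort "4.8" after "4.11". The text to check could come from something like subprocess.check_output(['strace', '-V']).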