SwiftShader hangs on fence_timeout
tyler-utah opened this issue · comments
Hi,
I've been running suites of amber tests on various devices. We recently started running on SwiftShader and we found some behavior that might not be intended.
We're running tests with various spinning and blocking behaviors, so the SET ENGINE_DATA fence_timeout_ms feature is very important to us.
We've found some cases where the test times out but control is not returned to the shell. As a result, when I run large suites of tests through bash, the run hangs on these tests rather than simply recording the timeout and moving on.
I've attached an amber test, the vulkaninfo output, and CPU info (from /proc/cpuinfo).
I can work around this using a "timeout" command, which kills the process, but this feels redundant to use with the fence_timeout feature.
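For anyone scripting the same workaround outside bash, here is a minimal sketch of an external watchdog in Python. It assumes the amber binary is on PATH under the name "amber" and takes the test file as a positional argument; the helper names run_with_timeout and run_suite and the 30-second default are hypothetical, not part of amber or SwiftShader.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, returncode).

    status is "ok", "fail", or "timeout". On timeout, subprocess.run
    kills the child process before raising, so control always returns.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    return ("ok" if proc.returncode == 0 else "fail", proc.returncode)


def run_suite(tests, amber="amber", timeout_s=30.0):
    """Run each test file through amber (assumed binary name) and
    record the outcome instead of hanging on a stuck test."""
    results = {}
    for test in tests:
        results[test] = run_with_timeout([amber, test], timeout_s)
    return results


if __name__ == "__main__":
    # Usage (hypothetical): python run_suite.py test1.amber test2.amber
    for name, (status, code) in run_suite(sys.argv[1:]).items():
        print(f"{name}: {status} (exit code {code})")
```

The external timeout should be set comfortably above fence_timeout_ms so that a clean fence timeout reported by amber is still recorded as such, and only a genuinely stuck process is killed.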
Hi @tyler-utah,
For the tests that are timing out, does the shader logic allow the invocations to complete, or are they permanently stuck in a loop or blocking state? The shader in your test.txt is complex enough that I'm unsure whether it can finish.
SwiftShader is currently lacking any way to raise VK_ERROR_DEVICE_LOST events for shader invocations that become permanently blocked, but this is something that will be needed relatively soon.
I'll try to reproduce your sample test.txt next week.
Cheers,
Ben
Hi @ben-clayton,
Thanks for the quick reply!
We're actually unsure whether the shader logic allows the invocations to complete; in fact, the purpose of the research project is to probe different devices for conditions under which they can hang. So far, we've been able to rely on fence_timeout to cleanly recover from hanging tests, but it sounds like this isn't guaranteed.
The shader code is a little nasty because it's auto-generated from a specification that uses goto statements, which aren't supported in GLSL.
It sounds like our best bet is to fall back to the timeout command as an extra level of defense against this (at least for now).
Thanks again!
Tyler
Hi Tyler,
For SwiftShader-specific bugs or feature requests, feel free to report them at https://g.co/swiftshaderbugs
At https://gitlab.khronos.org/vulkan/vulkan/issues/2030 the Khronos members discussed whether the Vulkan spec requires implementations to time out when running shaders that don't finish. The conclusion was that this is not a core requirement, but it could be part of an extension. The equivalent extension for OpenGL is under consideration as KHR_robust_gpu_timeout. Like any timing-related feature, testing it is a challenge.
SwiftShader could potentially help with that, but we have several other priorities at the moment, and want to make sure this fits well with our future direction. I'm not entirely convinced that we handle Vulkan's existing timeout requirements yet, so that also needs to be looked into first. If you notice something specific that's supposed to be supported, please let us know!
Cheers,
Nicolas - SwiftShader lead