Byte-Lab / JCoz

JCoz -- A Java causal profiler

Home Page: http://decave.github.io/JCoz/

JVM hangs at safepoint synchronization when profiling some large applications

AlexVanGogen opened this issue

Sometimes I get the following thread states:

  • the profiler thread blocks at a safepoint while executing jvmti->GetLineNumberTable and holding frame_lock;
  • one user thread, inside the signal handler, holds in_scope_lock and spins trying to acquire frame_lock;
  • several other user threads, also inside the signal handler, spin trying to acquire in_scope_lock.

According to gdb, the user threads were interrupted by the signal while they were already blocked at a safepoint, and these threads are unable to block once more. Running the application with -XX:+SafepointTimeout and other related flags agrees with gdb: it reports that the same user threads that spin in the signal handler cannot reach the safepoint.

Tightening the critical section in the profiler code, so that it is entered only to work with static_call_frames, fixes this problem, though I guess not entirely: it just sharply reduces the chance of it occurring.

It's odd that we're stuck at jvmti->GetLineNumberTable, but I'm pretty sure (as you said on gitter) that we don't need to hold frame_lock past the for loop at https://github.com/Decave/JCoz/blob/master/src/native/profiler.cc#L385. frame_lock really just needs to protect static_call_frames, which is used to collect call frames in the user threads when an experiment isn't running.
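
For illustration, here is a minimal sketch of the narrowing discussed above: the lock is held only while the shared frame buffer is copied and reset, and the slow lookup runs afterwards without the lock. This is not the actual JCoz code. The CallFrame struct, kMaxFrames, frame_count, record_frame, drain_frames, resolve_line and process_samples are all hypothetical names, and resolve_line is only a stand-in for a call such as jvmti->GetLineNumberTable that may block at a safepoint.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a sampled call frame; not JCoz's real type.
struct CallFrame {
    long method_id;
    long location;
};

constexpr std::size_t kMaxFrames = 64;  // assumed buffer capacity

// Spin lock guarding only the shared buffer, in the spirit of frame_lock.
std::atomic_flag frame_lock = ATOMIC_FLAG_INIT;
CallFrame static_call_frames[kMaxFrames];
std::size_t frame_count = 0;

// Sampling side (a signal handler in a real profiler): append one frame.
bool record_frame(CallFrame f) {
    while (frame_lock.test_and_set(std::memory_order_acquire)) {
        // spin until the lock is free
    }
    bool stored = frame_count < kMaxFrames;
    if (stored) {
        static_call_frames[frame_count++] = f;
    }
    frame_lock.clear(std::memory_order_release);
    return stored;
}

// Profiler side: hold the lock only long enough to copy and reset the buffer.
std::vector<CallFrame> drain_frames() {
    std::vector<CallFrame> local;
    local.reserve(kMaxFrames);  // allocate before taking the lock
    while (frame_lock.test_and_set(std::memory_order_acquire)) {
        // spin until the lock is free
    }
    assert(frame_count <= kMaxFrames);
    local.assign(static_call_frames, static_call_frames + frame_count);
    frame_count = 0;
    frame_lock.clear(std::memory_order_release);
    return local;
}

// Stand-in for a slow call that may block at a safepoint
// (assumption: the real code would query JVMTI here).
int resolve_line(const CallFrame& f) {
    return static_cast<int>(f.location % 1000);
}

// The lock is already released by the time any slow call runs.
std::vector<int> process_samples() {
    std::vector<CallFrame> frames = drain_frames();
    std::vector<int> lines;
    lines.reserve(frames.size());
    for (const CallFrame& f : frames) {
        lines.push_back(resolve_line(f));
    }
    return lines;
}
```

With this shape, a sampling thread spinning on the lock waits only for a short copy, never for a call that itself depends on all threads reaching a safepoint. As noted in the issue, this shrinks the window rather than proving that the hang cannot happen.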