How to debug deadlocks or fatal errors
A few weeks back I was working on parallel execution in a gem called Dynflow and ran into a deadlock. As you may know, a deadlock raises an exception of class fatal. This special exception is not rescuable; in fact, no rescue block is evaluated when fatal is raised. This makes debugging it pretty hard.
Let's start with a simple example that generates a deadlock.
require 'thread'
# cannot join on current main thread, it would wait forever
Thread.current.join
Running it produces:
./fatal.rb:3:in `join': deadlock detected (fatal)
from ./fatal.rb:3:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
Fortunately, even though rescue blocks are not evaluated, ensure blocks are. Unfortunately, I am using RubyMine and, for some reason, a standard debugger breakpoint on the p $! if $! line in the ensure block of the following example is never hit.
require 'thread'
begin
Thread.current.join
ensure
p $! if $! # ensure still runs; $! holds the fatal exception
end
Running it prints the exception from the ensure block, but the debugger never stops there:
#<fatal: deadlock detected>
./fatal.rb:4:in `join': deadlock detected (fatal)
from ./fatal.rb:4:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
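Because Thread.current.join raises a plain ThreadError from Ruby 2.0 on (see the note at the end), reproducing the ensure behavior on a newer Ruby needs a different deadlock. The sketch below assumes Ruby 2.1 or later and pops an empty Queue while no other thread exists to push to it, which the interpreter's deadlock detector reports as fatal. It runs that script in a child process via Open3 so the fatal error cannot end the current session; CHILD_SCRIPT and run_deadlock_demo are illustrative names, not part of any library.

```ruby
require 'open3'
require 'rbconfig'

# Illustrative child script (assumes Ruby 2.1+, where Queue needs no require).
# Popping an empty queue while no other thread can ever push to it makes the
# interpreter's deadlock detector raise fatal in the main thread.
CHILD_SCRIPT = <<-'RUBY'
  begin
    Queue.new.pop
  ensure
    # The fatal error is not handled here, yet ensure still runs and $! is set.
    $stdout.puts "ensure saw: #{$!.class}" if $!
    $stdout.flush
  end
RUBY

# Runs the script in a separate Ruby process so its fatal error cannot end
# the current one. Returns [stdout, stderr, Process::Status].
def run_deadlock_demo(script = CHILD_SCRIPT)
  Open3.capture3(RbConfig.ruby, '-e', script)
end

out, err, status = run_deadlock_demo
puts out             # line printed from the child's ensure block
puts err.lines.first # first line of the interpreter's deadlock report
puts "child exit status: #{status.exitstatus}"
```

The exact wording of the deadlock message differs between Ruby versions, so only its presence on stderr is worth relying on.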
I could not find any solution by googling, but there is a nice trick: Pry can be used in the ensure block.
require 'thread'
require 'pry' # provides binding.pry
begin
Thread.current.join
ensure
binding.pry if $! # open a Pry session only if an exception is propagating
end
It starts a Pry session right after the deadlock is raised, giving you an opportunity to inspect the still-running Ruby process and find out what is wrong. It is also very useful to combine Pry with the pry-stack_explorer gem, which lets you inspect the current call stack as you would in a debugger.
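When no interactive terminal is available (a CI job or a daemon, say), a non-interactive variant of the same idea is to print every thread's status and backtrace from the ensure block. A minimal sketch, assuming Ruby 2.0 or later for Thread#backtrace; thread_report is a hypothetical helper name, and the demo at the bottom parks a worker thread on an empty queue only so there is something to report.

```ruby
# Hypothetical helper (not part of Ruby or Pry): builds a plain-text report of
# every live thread, its status ("run", "sleep", ...) and its current backtrace.
def thread_report
  lines = Thread.list.map do |t|
    header = "#{t.inspect} status=#{t.status.inspect}"
    frames = (t.backtrace || []).map { |frame| "    #{frame}" }
    [header, *frames].join("\n")
  end
  lines.join("\n")
end

# Demo: a worker blocks on an empty queue while the main thread keeps running.
queue = Queue.new
worker = Thread.new { queue.pop } # sleeps until something is pushed
Thread.pass until worker.stop?    # wait until the worker is actually asleep
puts thread_report
queue.push(:done)                 # release the worker so it can finish
worker.join
```

In real code the call would sit where binding.pry sits in the example above, for instance $stderr.puts thread_report if $! inside the ensure block.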
In the end, this gave me enough information to find the problem. Hopefully it will save you some time if you run into a similar issue.
Note: These examples use Ruby 1.9.3. In Ruby 2.0.0, Thread.current.join raises a nice ThreadError, which is a subclass of StandardError and can therefore be debugged and inspected by the usual means. Nevertheless, similar deadlock-like problems can arise in other situations in Ruby 2.0.0 too.
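To illustrate the note, a small sketch for Ruby 2.0 or later: because ThreadError descends from StandardError, a bare rescue is enough to catch the error raised by joining the current thread. try_join_self is a hypothetical name, and the exact error message may differ between Ruby versions.

```ruby
# Joining the current thread raises ThreadError on Ruby 2.0+. Since ThreadError
# inherits from StandardError, the bare `rescue => e` below catches it.
def try_join_self
  Thread.current.join
  :joined # not reached: join raises before returning
rescue => e
  "#{e.class}: #{e.message}"
end

puts ThreadError.ancestors.include?(StandardError) # => true
puts try_join_self
```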