How to debug deadlocks or fatal errors

A few weeks back I was working on parallel execution in a gem called Dynflow and I ran into a deadlock. As you may know, a deadlock is reported as an exception of class fatal. This special exception is not rescuable; in fact, none of the rescue blocks is evaluated when fatal is raised. This makes debugging it pretty hard.

Let's have a simple example that generates a deadlock.

require 'thread'
# cannot join on current main thread, it would wait forever
Thread.current.join

This produces:

./fatal.rb:3:in `join': deadlock detected (fatal)
    from ./fatal.rb:3:in `<top (required)>'
    from -e:1:in `load'
    from -e:1:in `<main>'

Fortunately, even though rescue blocks are not evaluated, ensure blocks are. I am using RubyMine and for some reason a standard debugger breakpoint does not work in the ensure block on line 6 of the following example.

require 'thread'

begin
  Thread.current.join
ensure
  p $! if $!
end

It produces the following output without stopping on line 6.

#<fatal: deadlock detected>
./fatal.rb:4:in `join': deadlock detected (fatal)
    from ./fatal.rb:4:in `<top (required)>'
    from -e:1:in `load'
    from -e:1:in `<main>'
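The mechanism the example relies on (ensure runs while an exception is still propagating, and $! holds that exception) can also be seen with an ordinary exception. This is a minimal sketch, not the original Dynflow code; the helper name inspect_on_failure and the log array are made up for illustration.

```ruby
# Hypothetical helper: runs the block and, if an exception is
# propagating when the ensure clause runs, records its class name.
def inspect_on_failure(log)
  yield
ensure
  # $! is the exception currently in flight, or nil if there is none
  log << $!.class.name if $!
end

log = []

begin
  inspect_on_failure(log) { raise ArgumentError, 'boom' }
rescue ArgumentError
  # swallowed here only so the script keeps running
end

# No exception this time, so nothing is recorded
inspect_on_failure(log) { :ok }

p log
```

Only the failing call leaves an entry in log, which is why guarding the ensure body with `if $!` keeps it quiet on the normal path.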

I could not find any solution by googling, but there is a nice trick: Pry can be used in the ensure block.

require 'thread'
require 'pry' # provides binding.pry

begin
  Thread.current.join
ensure
  binding.pry if $!
end

It will start a Pry session right after the deadlock is raised, giving you an opportunity to inspect the still-running Ruby process and find out what is wrong. It's also very useful to combine Pry with a gem called pry-stack_explorer to be able to inspect the current stack like in a debugger.

In the end it gave me enough information to find the problem. Hopefully it will save you some time if you run into a similar issue.

Note: These examples are for Ruby 1.9.3. In Ruby 2.0.0, Thread.current.join raises a nice ThreadError, which is a subclass of StandardError and can be debugged or inspected using the usual means. Nevertheless, similar deadlock-like problems can arise in other situations in Ruby 2.0.0 too.
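The difference on newer Rubies is easy to check. This is a minimal sketch, assuming a Ruby version (2.0.0 or later) where joining the current thread raises ThreadError as described in the note; the variable name caught is only for illustration.

```ruby
caught = nil

begin
  # On Ruby 2.0.0+ this is expected to raise ThreadError instead of fatal
  Thread.current.join
rescue => e
  # A bare rescue handles StandardError and its subclasses only
  caught = e
end

puts caught.class
puts ThreadError.ancestors.include?(StandardError)
```

Because a bare rescue is enough here, regular debugger breakpoints and rescue-based logging work again for this particular case.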