As I mentioned in my previous post, one of the most valuable tools I recently discovered for debugging multithreaded applications is model checking; more specifically the Promela language and the SPIN model checker.
Promela isn't a programming language, but a modelling language that you use to describe the algorithm you'd like to model. It's quite a simple language, without many of the features that we're used to in a proper programming language such as functions, for loops, or even proper scoping of variables. At first those limitations bothered me, but what I soon learned is that those limitations are what make Promela feasible..
You see, when you write a Promela model of your algorithm, you should only focus on the high level details of the algorithm, forget about any nasty implementation details. One of the things that you should specify are correctness assertions, or the algorithm's invariants, which if violated would indicate that there's a bug in the algorithm.
After the model is ready, spin converts it into a C program, which you compile and run. Running this program constructs the state space of the system and tries to find a state that violates some of the invariants or assertions. Since the basic type of search performs an exhaustive search of literally every possible state in the system, if it doesn't find an error - then it means that your algorithm is correct! ...... ... That is assuming of course that the model and the assumptions within it are correct as well.
The thing I like the most about it is that when it finds an error, spin is capable of producing the shortest possible path to that error! In my case, while debugging my STM, this path is usually around 100 steps. That's it! So debugging a complex multithreaded algorithm is reduced to just reading these 100 or so steps and figuring out what went wrong! As easy as debugging a single-threaded application!
That said, model checking is not a silver bullet. First of all, you need to learn a new language and a new tool. Learning how to use SPIN and Promela is actually quite easy, but still that's one more thing to do before actually debugging. Also I found that the resources available online and a bit lacking. Fortunately the book The SPIN Model Checker at the library and I think it's an excellent resource.
The other thing to keep in mind is that even if the model has no errors in it, that doesn't mean that the algorithm is correct; the model itself could be wrong, the invariants or assertions specified might not be correct either. Moreover, the problem is that the more complicated the algorithm is, the bigger the state space and more time and memory are required to verify the model. This applies to increasing the number of processes in the system as well. In my case, I was only able to do exhaustive searches for three processes. Adding one more process make it completely infeasible - it requires an estimates 4 terabytes of memory to do an exhaustive search for 4 processes in my algorithm.
You could go back to theory and prove that if the model is correct for 3 processes then it's also correct for N, but until you do that you cannot be sure that it is.
Moreover, Promela only models the algorithm. Implementation issues are hidden. So memory ordering, cache coherence and performance all aren't reflected in the Promela model. You could try to account for some of these in the model, but that would just increase the state space significantly and it's still possible that you forgot something.
I did find that to be a plus though, the fact that the implementation details aren't there means that the algorithm is clearer when written in Promela than in C for example, which makes it easier to share the core algorithm with others. I guess that's why Ananian used Promela code in his PhD thesis.