Thursday, July 10, 2008

A Scalable Binary Search Tree Based Structure in Erlang

In my previous post I talked about writing a scalable binary search tree, and the difficulties I was having.

I believe I finally managed to create the structure I wanted to. The main problem that I was having is that I was still thinking in C (or sequential languages), so what I wanted to do was build a single binary search tree (bst) that scales well. What I realized though, is that rather than doing that, I should create N bsts where N is proportional to the number of processors available in the system.

So what I'm doing now is spawn a number of threads (servers), each thread is responsible for a bst that covers a range of keys, and whenever an operation is performed, the client sends it to the particular server that's responsible for the tree in that range. If this doesn't make sense, then maybe my code will make it clearer:-

http://www.cs.auckland.ac.nz/~fuad/dbst.erl (code needs better error handling, general tidying up)


The main problem with this implementation though is that if the distribution of the keys isn't uniform within the expected range, it won't perform well. But I guess that is a problem with non-balanced bsts in general.

That said, this solution turned out to be simpler than other solutions I've attempted, performs better in the single-threaded case, and scales pretty well.

Here's a link to a graph that shows how well this scales on an 8 core machine:-


Compare with my previous attempt:-


Can't really ask for better than that.

What I need to do now is figure out a way to balance the load somehow when it's not uniform.

2 comments:

Elhoim said...

What about hashing the keys?

Fuad said...

I think I see what you're saying here. Rather than have each bucket responsible for a range of keys, have the structure be a hash table each bucket in the table consisting of a binary tree?

I think that's definitely a good idea, it's simple and it would cover many of the deficiencies in this implementation.

I should point out that the goal eventually isn't just to store keys, but values associated with those keys as well (i.e. a dictionary). Just did the key part here as a proof of concept.

Thanks for the hashing idea! :)