I've had a pet Python project I've been working on a little bit at a time for a few years now. It grew out of a desire for a more efficient "list" type--one that wouldn't take O(n) time for operations like .insert(). Naturally, I want everything, so I also wanted it to have similar performance to Python "list" when the list only contains a few objects. My project is now in the final stages, where I have a (nearly) complete implementation that looks, acts, and quacks like a regular Python "list", but has better asymptotic performance for many operations (while maintaining similar performance on small lists). Right now I'm calling my data structure the "BList" since it's based on B+Trees. I'm searching for a better name.

I've made the source code for the C extension module available via the following URL:

You will need the Python 2.5 header files and shared library to build it ("apt-get install python2.5-dev"). If you have those installed, just run "make" and cross your fingers. Once the module is built, usage is very simple. For example:

>>> from blist import *
>>> x = BList([1,2,3,4])
>>> x *= 2**24
>>> x.append(5)
>>> y = x[4:-234234]
>>> del x[3:1024]

Yes, line 3 creates a list with 67 million elements, and, yes, it's fast. So are the getslice and del operations.

If you're feeling adventurous, you can even do:
>>> from blist import *
>>> __builtins__.list = BList

.sort() and .reversed() are currently unimplemented. All other methods of Python's list should work. Additionally, there's a .debug() method if you'd like to see what the internal representation of the BList looks like.

I've done moderately extensive performance tests to compare the speed of BList with Python's list. These are located the following URL:

Each .html file in that directory shows the code that was timed and two graphs. The graph on the left shows the absolute time to execute the code as a function of list size, in log-log scale. The graph on the right shows the time to perform the operation on a BList, normalized by the time for a Python list, in a log-linear scale.

The only operation where BList is consistently and noticeably slower (around 20%) is something like x[5]. The Python bytecode interpreter has a special case for that kind of construct on a Python list, and my external module can't compete with code that's special cased deep inside the bytecode interpreter. BList and Python's list have around the same performance for x.__getitem__(5), however.

Feedback greatly desired!


10 comments to BList