I've had a pet Python project I've been working on a little bit at a time for a few years now. It grew out of a desire for a more efficient "list" type--one that wouldn't take O(n) time for operations like .insert(). Naturally, I want everything, so I also wanted it to have similar performance to Python "list" when the list only contains a few objects. My project is now in the final stages, where I have a (nearly) complete implementation that looks, acts, and quacks like a regular Python "list", but has better asymptotic performance for many operations (while maintaining similar performance on small lists). Right now I'm calling my data structure the "BList" since it's based on B+Trees. I'm searching for a better name.

I've made the source code for the C extension module available via the following URL:

You will need the Python 2.5 header files and shared library to build it ("apt-get install python2.5-dev"). If you have those installed, just run "make" and cross your fingers. Once the module is built, usage is very simple. For example:

>>> from blist import *
>>> x = BList([1,2,3,4])
>>> x *= 2**24
>>> x.append(5)
>>> y = x[4:-234234]
>>> del x[3:1024]

Yes, line 3 creates a list with 67 million elements, and, yes, it's fast. So are the getslice and del operations.

If you're feeling adventurous, you can even do:
>>> from blist import *
>>> __builtins__.list = BList

.sort() and .reversed() are currently unimplemented. All other methods of Python's list should work. Additionally, there's a .debug() method if you'd like to see what the internal representation of the BList looks like.

I've done moderately extensive performance tests to compare the speed of BList with Python's list. These are located the following URL:

Each .html file in that directory shows the code that was timed and two graphs. The graph on the left shows the absolute time to execute the code as a function of list size, in log-log scale. The graph on the right shows the time to perform the operation on a BList, normalized by the time for a Python list, in a log-linear scale.

The only operation where BList is consistently and noticeably slower (around 20%) is something like x[5]. The Python bytecode interpreter has a special case for that kind of construct on a Python list, and my external module can't compete with code that's special cased deep inside the bytecode interpreter. BList and Python's list have around the same performance for x.__getitem__(5), however.

Feedback greatly desired!


10 comments to BList

  • Any idea why a few of the operations are slower specifically around the n=100 area? (I’m looking at the mul10 and add graphs…) It almost looks like the BList has some sort of periodic size threshold where it slows down every time it gets near it, since a few of the graphs spike up just before n=100 and 10000 but then drop back down again…

    • Each node in the internal tree structure can contain up to 128 elements. When an operation causes the size to exceed 128 elements, there’s some extra cost for allocating more nodes and copying elements around.

      For mul10, a similar cost is incurred any time the operation causes the size to cross a threshold of 128^k (because the tree depth grows by 1).

      The 128 is a compile-time parameter and by adjusting the spike in cost moves around in a corresponding way.

  • Asymptotic complexity of finding a good name?

  • Have you considered doing an amortized flattening of the data structure, so that if someone does enough lookups without any inserts or deletes the time to do a lookup will eventually fall from O(log(n)) to O(1) ?

    • I’ve considered the opposite problem, to address the use case where the list is used as a LIFO stack or used exclusively for lookups.

      Here is my plan: Start with a flat data structure and keep it flat as long as the only operations are lookups, appends, or deleting the last element. Switch to the B-Tree style structure on the first operation that would have cost O(n). The cost of initially building a list of size n is O(n), so we can amortize the one-time conversion cost as part of the list construction.

      Are there common use cases where a list is modified frequently, then becomes fixed and used only for lookups?

      The most recent work I did was to try to integrate the BList into the Python interpreter (replacing the regular list), but that effort has stalled. I ran up against some problems that (apparently) require a deep knowledge of the Python interpreter to debug and I didn’t have the time to pursue it.

      • I think there’s one in your example. If you say x = [1, 2, 3, 4] * (2 ** 24) then it will have O(log(n)) lookup time. If you then proceed to use the resulting list with lots of lookups and modifies but no inserts and deletes, then if you do enough manipulation it will be much slower than if it had simply made a list up front. That problem could be fixed with amortized flattening. The flattening would make the data structure non-sparse in this example though. Whether this issue is worth the extra work is unclear, but that which is worth doing is worth doing to excess.

        You should also probably modify insort so that it uses the b-tree structure to still run in log(n) instead of log(n) ** 2, which it will do if it has to treat the underlying blist as a black box.

        • You should also probably modify insort so that it uses the b-tree structure to still run in log(n) instead of log(n) ** 2, which it will do if it has to treat the underlying blist as a black box.

          It makes use of the BList structure, so it’s worst-case O(log(n)) now (see left figure). The regular list does some extra tricks to get O(n) performance on mostly-sorted data, which I have not attempted to reproduce yet.

          • I wasn’t talking about sort(), I was talking about the insort() function in the bisect module (and the other functions in the bisect module).

          • Oh! Oops, didn’t catch that. Yes, insort and family are currently O((log n)**2) Do you see a clever way to make it O(log n) by using the structure?

            Even if it keeps track of where it is in the tree (rather than going back to the root every time), I think it still works out to O((log n)**2), albeit with a smaller constant factor.

            There’s no value information inside the tree, so it has to go all the way down to a leaf to make each comparison.

          • Hrm, you’re right. Worst case appears to be log(n) ** 2, with the whitebox speedups only getting a constant factor improvement.

            You could start putting value information inside the tree. It would result in weird results when the tree isn’t sorted, but insort() doesn’t make any particular guarantees on unsorted lists.

            The use case of inserting and removing from a sorted list is certainly one worth optimizing for, since it involves a lot of inserts and deletes at random locations, and is relatively common.

Leave a Reply