Building a faster list-like type for Python, part 2

I needed a break from my deep mental immersion in poker software, so over the past few days I spent some time working on my faster list-like type for Python, the "blist". Some of you with good memories may remember it from my last post about it... over a year ago. Previously, it featured O(log n) inserts and deletes without giving up any performance for small lists when compared to Python's current array-based lists. However, random access reads and writes with no change in length were also O(log n), compared to an array's O(1).

With the new modifications, the blist's read and write operations are now also O(1). :-) (with a minor caveat below)

The blist is based loosely on B+trees, meaning it has a short, wide tree structure. Each node in the tree always contains between 64 and 128 children. To enable O(1) get and set operations, the root node now has an extra array with n/log n entries that acts as a cache of pointers to leaf nodes (each leaf BList node contains pointers to 64 to 128 user objects). Additionally, the root node maintains a binary tree data structure describing which entries of the cache are valid. When all the entries in the cache are valid, get and set operations can skip the tree structure and do their work in O(1) time. When the cache cannot be used, the operation takes O(log n) time. Every get/set operation updates at least one cache entry. An insert or delete operation invalidates all or part of the cache, and takes O(log n) time.
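To make the caching idea concrete, here is a minimal Python sketch of a cached get. The class and method names are mine, and the tree descent is stubbed out; the real blist is a C extension with a proper B+tree underneath, and it tracks cache validity with a binary tree rather than per-slot `None` markers.

```python
# Sketch of the leaf-pointer cache (hypothetical names, not the real
# C implementation).  Each leaf holds up to LEAF_SIZE user objects;
# the root keeps one cache slot per leaf so a get() that hits a valid
# slot never walks the tree.

LEAF_SIZE = 64

class Leaf:
    def __init__(self, items):
        self.items = list(items)  # up to LEAF_SIZE user objects

class CachedList:
    def __init__(self, values):
        # Build the leaves; a real blist hangs these off a B+tree.
        values = list(values)
        self.leaves = [Leaf(values[i:i + LEAF_SIZE])
                       for i in range(0, len(values), LEAF_SIZE)]
        # One cache slot per leaf; None means "invalid, walk the tree".
        self.cache = [None] * len(self.leaves)

    def _tree_lookup(self, slot):
        # Stand-in for the O(log n) descent through the tree.
        return self.leaves[slot]

    def __getitem__(self, index):
        slot, offset = divmod(index, LEAF_SIZE)
        leaf = self.cache[slot]
        if leaf is None:                  # cache miss: O(log n) path
            leaf = self._tree_lookup(slot)
            self.cache[slot] = leaf       # every get/set refreshes a slot
        return leaf.items[offset]         # cache hit: O(1)

    def invalidate(self):
        # An insert or delete shifts offsets, so all or part of the
        # cache must be thrown away.
        self.cache = [None] * len(self.leaves)
```

The key point is the fast path: once a slot is populated, `__getitem__` is a divmod plus two array indexes, with no tree traversal at all.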

Since there are O(n / log n) entries and updating an entry requires O(log n) work, updating all the entries requires O(n) operations.
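A quick back-of-the-envelope check of that claim, with a concrete n of my choosing:

```python
import math

# With n / log(n) cache entries, each costing O(log n) work to
# refresh, refreshing the whole cache is O(n) total work.
n = 2 ** 20
log_n = math.log2(n)             # 20.0 for n = 2**20
entries = n / log_n              # ~52,429 cache entries
total_work = entries * log_n     # refresh every entry once
assert math.isclose(total_work, n)
```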

A sequence of k>0 insert/delete operations followed by m get/set operations therefore takes the following time: O(k*log n + min(n, m*log n) + m).
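Plugging rough numbers into this formula shows how each term can dominate (the cost function and constants here are purely illustrative):

```python
import math

def total_cost(k, m, n):
    """Illustrative cost model from the formula above: k insert/delete
    operations followed by m get/set operations on a list of length n."""
    log_n = math.log2(n)
    return k * log_n + min(n, m * log_n) + m

n = 2 ** 20  # log2(n) = 20

# Insert/delete-heavy: cost per operation is O(log n).
k, m = 10_000, 0
assert total_cost(k, m, n) / k == 20.0

# Get/set-heavy (m >> n): amortized cost per get/set approaches O(1).
k, m = 1, 10 ** 9
per_op = (total_cost(k, m, n) - k * 20) / m
assert per_op < 2  # min(n, m*log n) caps out at n

# Middle regime (k < m < n): each get/set still pays about O(log n).
k, m = 10, 1_000
per_op = total_cost(k, m, n) / m
assert 20 <= per_op < 25  # close to log2(n)
```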

If the k*log n term dominates, then we have O(log n) per insert/delete, which is good.

If the m term dominates, then we have O(1) per get/set, which is good.

The min(n, m*log n) term dominates only if k < m < n. In that case, we have O(log n) per get/set, which is not ideal. However, this is still better than an array implementation, which must pay O(n) for each insert/delete, and no worse than my earlier BList revision. I am also cheating a bit in my implementation by making the cache contain n/64 entries instead of n/log n, under the rather safe assumption that log_2 n <= 64.

With this change, my BList type is asymptotically more efficient than an array-based list for two more operations: get and set. Awesome. All that's left now is appending to and deleting from the end of the list, which is O(1) for an array but still O(log n) for the BList. After that, I can focus on improving constant factors and special cases (like sorting an already-sorted list).

I also completed work on a patch to Python 2.5 that completely replaces the internal list implementation with the BList. It passes all the Python unit tests, aside from behaving differently on some overly specific tests that are really just intended to make sure the interpreter doesn't crash (evil things like modifying a list during a sort or iteration).
