Skip to content Skip to sidebar Skip to footer

Cython Implementation No Faster Than Pure Python

For an exercise I've written a XOR doubly-linked list %%cython from cpython.object cimport PyObject from cpython.ref cimport Py_XINCREF, Py_XDECREF from libc.stdint cimport uintpt

Solution 1:

The three culprits from your profiling look to be Node's __init__ (which is unavoidable here), and __get__ and __set__ for the prev_xor_next property. My view is that you don't want the prev_xor_next property (or if you do it should be read-only) since it makes what should be a Cython internal accessible in Python.

Whether you delete the property or not, you are working in Cython here so you can write directly to the underlying C attribute _prev_xor_next. You may need to set cdef Node cur_last at the start of append (and maybe in other functions) to ensure that Cython knows the type of cur_last - I think it should be able to work it out but if you get AttributeErrors at runtime then this is what you need to do.

This change gives me a 30% speed increase (i.e. it's still slower than a regular list, but it's a noticeable improvement).


I'll outline a more drastic change that I possibly should have suggested on your first question about this problem. This really is a vague outline so no effort has been made to get it to work...

  • Node is entirely internal to your XORList class: it should not be used from Python and the lifetime of all the Nodes in an XORList is tied directly to the list. Therefore they should be destructed on the destruction of their owning XORList (or if the list shrinks, etc) and so do not need to be reference counted. Therefore Node should be a C struct rather than a Python object:

    cdef structNode:
        uintptr_t prev_xor_next
        PyObject* val
    
    # with associated constructor- and destructor-like functions:cdef Node* make_node(object val, uintptr_t prev_xor_next):
        cdef Node* n = <Node*>malloc(sizeof(Node))
        n.val = <PyObject*>val
        Py_XINCREF(n.val)
        n.prev_xor_next = prev_xor_next
        return n
    
    cdef voiddestroy_node(Node* n):
        Py_XDECREF(n.val)
        free(n)
    
  • XORList needs a __dealloc__ function that loops through the list calling destroy_node on each Node (it needs a __dealloc__ function anyway in your version too!)

  • CurrentNode needs to remain a Cython class, since this is your "iterator" interface. It can obviously no longer inherit from Node. I'd change it to:

    cdefclass XORListIterator:
        cdefNode* current_node
        cdefXORList our_list
    

    the point of the attribute our_list is to ensure that the XORList is kept alive at least as long as the CurrentNode - if you end up with an iterator for an XORList that no longer exists that the current_node attribute will be invalid. current_node is not owned by XORListIterator so no need for a destructor.

The danger with this scheme I think is making sure that if any changes to the XORList don't completely invalidate any existing XORListIterators to the point where you get crashes. I suspect this would also be an issue with your current version.


I suspect the built-in list will still remain competitive, since it is a well-written, efficient structure. Remember that list.append is usually a simple Py_INCREF, with an occasional array reallocation and copy. Yours always involves creation of a new Python object (the Node) as well as some associated reference counting.

My alternative scheme avoids a lot of reference counting (both in terms of computational time and "you having to think about it" time), so I'd expect it to be much closer. It retain the disadvantage of a small memory allocation each append, which is unavoidable for a linked-list structure.


Addendum: to address the comment about "the convenience of a Cython class". In my view the two advantages of using a Cython class vs a struct are:

  1. You get something fairly close to a struct, but don't have to worry about C pointers and the reference counting is taken care of. It's pretty clear that for this problem you're doing odd things to pointers and having to handle reference counting explicitly, so I don't think this is applies to you.
  2. You can use it from Python - you aren't just restricted to Cython. In this case I think it's entirely an implementation detail of the XORList that shouldn't be exposed to Python users.

Therefore I think the main reasons to use Cython classes specifically don't apply to your problem. (For a lot of code the advantages do apply, of course!)

It's probably also worth adding that constructing Cython classes is probably one of their slower features - to support possible inheritance the construction process is quite "indirect". You've managed to create a benchmark that turns out to be almost all constructing - I'd guess it's a slightly skewed benchmark and the real case might not be that bad.

Post a Comment for "Cython Implementation No Faster Than Pure Python"