Cython Implementation No Faster Than Pure Python
Solution 1:
The three culprits from your profiling look to be Node's __init__ (which is unavoidable here), and __get__ and __set__ for the prev_xor_next property. My view is that you don't want the prev_xor_next property (or if you do it should be read-only) since it makes what should be a Cython internal accessible in Python.
Whether you delete the property or not, you are working in Cython here so you can write directly to the underlying C attribute _prev_xor_next. You may need to set cdef Node cur_last at the start of append (and maybe in other functions) to ensure that Cython knows the type of cur_last - I think it should be able to work it out but if you get AttributeErrors at runtime then this is what you need to do.
This change gives me a 30% speed increase (i.e. it's still slower than a regular list, but it's a noticeable improvement).
I'll outline a more drastic change that I possibly should have suggested on your first question about this problem. This really is a vague outline so no effort has been made to get it to work...
Nodeis entirely internal to yourXORListclass: it should not be used from Python and the lifetime of all theNodesin anXORListis tied directly to the list. Therefore they should be destructed on the destruction of their owningXORList(or if the list shrinks, etc) and so do not need to be reference counted. ThereforeNodeshould be a C struct rather than a Python object:cdef structNode: uintptr_t prev_xor_next PyObject* val # with associated constructor- and destructor-like functions:cdef Node* make_node(object val, uintptr_t prev_xor_next): cdef Node* n = <Node*>malloc(sizeof(Node)) n.val = <PyObject*>val Py_XINCREF(n.val) n.prev_xor_next = prev_xor_next return n cdef voiddestroy_node(Node* n): Py_XDECREF(n.val) free(n)XORListneeds a__dealloc__function that loops through the list callingdestroy_nodeon eachNode(it needs a__dealloc__function anyway in your version too!)CurrentNodeneeds to remain a Cython class, since this is your "iterator" interface. It can obviously no longer inherit fromNode. I'd change it to:cdefclass XORListIterator: cdefNode* current_node cdefXORList our_listthe point of the attribute
our_listis to ensure that theXORListis kept alive at least as long as theCurrentNode- if you end up with an iterator for anXORListthat no longer exists that thecurrent_nodeattribute will be invalid.current_nodeis not owned byXORListIteratorso no need for a destructor.
The danger with this scheme I think is making sure that if any changes to the XORList don't completely invalidate any existing XORListIterators to the point where you get crashes. I suspect this would also be an issue with your current version.
I suspect the built-in list will still remain competitive, since it is a well-written, efficient structure. Remember that list.append is usually a simple Py_INCREF, with an occasional array reallocation and copy. Yours always involves creation of a new Python object (the Node) as well as some associated reference counting.
My alternative scheme avoids a lot of reference counting (both in terms of computational time and "you having to think about it" time), so I'd expect it to be much closer. It retain the disadvantage of a small memory allocation each append, which is unavoidable for a linked-list structure.
Addendum: to address the comment about "the convenience of a Cython class". In my view the two advantages of using a Cython class vs a struct are:
- You get something fairly close to a struct, but don't have to worry about C pointers and the reference counting is taken care of. It's pretty clear that for this problem you're doing odd things to pointers and having to handle reference counting explicitly, so I don't think this is applies to you.
- You can use it from Python - you aren't just restricted to Cython. In this case I think it's entirely an implementation detail of the
XORListthat shouldn't be exposed to Python users.
Therefore I think the main reasons to use Cython classes specifically don't apply to your problem. (For a lot of code the advantages do apply, of course!)
It's probably also worth adding that constructing Cython classes is probably one of their slower features - to support possible inheritance the construction process is quite "indirect". You've managed to create a benchmark that turns out to be almost all constructing - I'd guess it's a slightly skewed benchmark and the real case might not be that bad.
Post a Comment for "Cython Implementation No Faster Than Pure Python"