Cython Implementation No Faster Than Pure Python
Solution 1:
The three culprits from your profiling look to be Node's __init__
(which is unavoidable here), and __get__
and __set__
for the prev_xor_next
property. My view is that you don't want the prev_xor_next
property (or if you do it should be read-only) since it makes what should be a Cython internal accessible in Python.
Whether you delete the property or not, you are working in Cython here so you can write directly to the underlying C attribute _prev_xor_next
. You may need to set cdef Node cur_last
at the start of append
(and maybe in other functions) to ensure that Cython knows the type of cur_last
- I think it should be able to work it out but if you get AttributeErrors
at runtime then this is what you need to do.
This change gives me a 30% speed increase (i.e. it's still slower than a regular list, but it's a noticeable improvement).
I'll outline a more drastic change that I possibly should have suggested on your first question about this problem. This really is a vague outline so no effort has been made to get it to work...
Node
is entirely internal to yourXORList
class: it should not be used from Python and the lifetime of all theNodes
in anXORList
is tied directly to the list. Therefore they should be destructed on the destruction of their owningXORList
(or if the list shrinks, etc) and so do not need to be reference counted. ThereforeNode
should be a C struct rather than a Python object:cdef structNode: uintptr_t prev_xor_next PyObject* val # with associated constructor- and destructor-like functions:cdef Node* make_node(object val, uintptr_t prev_xor_next): cdef Node* n = <Node*>malloc(sizeof(Node)) n.val = <PyObject*>val Py_XINCREF(n.val) n.prev_xor_next = prev_xor_next return n cdef voiddestroy_node(Node* n): Py_XDECREF(n.val) free(n)
XORList
needs a__dealloc__
function that loops through the list callingdestroy_node
on eachNode
(it needs a__dealloc__
function anyway in your version too!)CurrentNode
needs to remain a Cython class, since this is your "iterator" interface. It can obviously no longer inherit fromNode
. I'd change it to:cdefclass XORListIterator: cdefNode* current_node cdefXORList our_list
the point of the attribute
our_list
is to ensure that theXORList
is kept alive at least as long as theCurrentNode
- if you end up with an iterator for anXORList
that no longer exists that thecurrent_node
attribute will be invalid.current_node
is not owned byXORListIterator
so no need for a destructor.
The danger with this scheme I think is making sure that if any changes to the XORList
don't completely invalidate any existing XORListIterators
to the point where you get crashes. I suspect this would also be an issue with your current version.
I suspect the built-in list
will still remain competitive, since it is a well-written, efficient structure. Remember that list.append
is usually a simple Py_INCREF
, with an occasional array reallocation and copy. Yours always involves creation of a new Python object (the Node
) as well as some associated reference counting.
My alternative scheme avoids a lot of reference counting (both in terms of computational time and "you having to think about it" time), so I'd expect it to be much closer. It retain the disadvantage of a small memory allocation each append
, which is unavoidable for a linked-list structure.
Addendum: to address the comment about "the convenience of a Cython class". In my view the two advantages of using a Cython class vs a struct are:
- You get something fairly close to a struct, but don't have to worry about C pointers and the reference counting is taken care of. It's pretty clear that for this problem you're doing odd things to pointers and having to handle reference counting explicitly, so I don't think this is applies to you.
- You can use it from Python - you aren't just restricted to Cython. In this case I think it's entirely an implementation detail of the
XORList
that shouldn't be exposed to Python users.
Therefore I think the main reasons to use Cython classes specifically don't apply to your problem. (For a lot of code the advantages do apply, of course!)
It's probably also worth adding that constructing Cython classes is probably one of their slower features - to support possible inheritance the construction process is quite "indirect". You've managed to create a benchmark that turns out to be almost all constructing - I'd guess it's a slightly skewed benchmark and the real case might not be that bad.
Post a Comment for "Cython Implementation No Faster Than Pure Python"