
Cython String Concatenation Is Super Slow; What Else Does It Do Poorly?

I have a large Python code base which we recently started compiling with Cython. Without making any changes to the code, I expected performance to stay about the same, but we plan

Solution 1:

Worth reading: PEP 8 > Programming Recommendations:

Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.

Reference: https://www.python.org/dev/peps/pep-0008/#programming-recommendations
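As a rough illustration of the recommendation quoted above, here is a minimal pure-Python sketch that builds the same string both ways and times each. The function names and the size n are made up for the example, and the timings will differ by interpreter and version, so treat the numbers as indicative only.

```python
import timeit


def concat_plus(n):
    """Build a string with repeated +=; may degrade to quadratic time
    on implementations without the in-place optimization."""
    s = ''
    for _ in range(n):
        s += 'a'
    return s


def concat_join(n):
    """Collect the pieces in a list and join once at the end."""
    parts = []
    for _ in range(n):
        parts.append('a')
    return ''.join(parts)


if __name__ == '__main__':
    n = 100000  # hypothetical size, chosen only for the demonstration
    assert concat_plus(n) == concat_join(n)  # same result either way
    for fn in (concat_plus, concat_join):
        seconds = timeit.timeit(lambda: fn(n), number=10)
        print(fn.__name__, round(seconds, 4))
```

Both functions return the same string; only the way the intermediate data is accumulated differs.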


Solution 2:

Repeated string concatenation of that form is usually frowned upon; some interpreters optimize for it anyway (secretly overallocating and allowing mutation of technically immutable data types in cases where it's known to be safe), but Cython is trying to hard code some things, which makes that harder.
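To make the "known to be safe" condition a bit more concrete: in CPython the in-place trick is only possible when nothing else holds a reference to the string being extended. The sketch below (function names are hypothetical, and the behavior is a CPython implementation detail, not a language guarantee) keeps a second reference alive in one of the loops, which should force a fresh copy on every iteration.

```python
import timeit


def concat_unaliased(n):
    s = ''
    for _ in range(n):
        s += 'a'   # s is the only reference, so CPython may resize in place
    return s


def concat_aliased(n):
    s = ''
    for _ in range(n):
        alias = s  # a second reference: the old string must stay intact
        s += 'a'   # so a new string is allocated and the contents copied
    return s


if __name__ == '__main__':
    n = 50000  # hypothetical size, chosen only for the demonstration
    assert concat_unaliased(n) == concat_aliased(n)
    for fn in (concat_unaliased, concat_aliased):
        seconds = timeit.timeit(lambda: fn(n), number=5)
        print(fn.__name__, round(seconds, 4))
```

On CPython the aliased version would be expected to slow down disproportionately as n grows; other implementations may show no difference between the two.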

The real answer is "Don't concatenate immutable types over and over" (it's wrong everywhere, just worse in Cython). A perfectly reasonable approach that Cython would likely handle fine is to build a list of the individual str pieces, then call ''.join(listofstr) at the end to create the final str in one step.

In any event, you're not giving Cython any typing information to work with, so the speedups aren't going to be very impressive. Try to help it out with the easy stuff, and the speedups there may more than make up for losses elsewhere. For example, declaring your loop variable with cdef and using ''.join might help here:

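cpdef str2():
    cdef int i  # typed loop counter so Cython can emit a plain C loop
    parts = []  # collect the pieces instead of concatenating each time
    for i in xrange(100000):  # Maybe range; Cython docs aren't clear if xrange optimized
        parts.append('a')
    return ''.join(parts)  # build the final str once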
