Cython String Concatenation Is Super Slow; What Else Does It Do Poorly?
Solution 1:
Worth reading: PEP 8 > Programming Recommendations:
Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).
For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
Reference: https://www.python.org/dev/peps/pep-0008/#programming-recommendations
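To make the "linear time" point concrete, here is a rough cost model in plain Python. It assumes an implementation with no in-place optimization, so every a = a + b writes out a complete new string. The function names are made up for illustration, and the model counts only characters written, ignoring allocation overhead; it is a sketch of the arithmetic, not a measurement.

```python
def chars_copied_by_repeated_concat(n_pieces: int, piece_len: int = 1) -> int:
    """Characters written if every `a = a + b` builds a brand-new string.

    Assumption: no in-place optimization. Step k produces a string of
    k * piece_len characters, all of which have to be written.
    """
    return sum(k * piece_len for k in range(1, n_pieces + 1))


def chars_copied_by_join(n_pieces: int, piece_len: int = 1) -> int:
    """Characters written by one final ''.join(): each piece is copied once."""
    return n_pieces * piece_len


if __name__ == "__main__":
    n = 100_000
    print(chars_copied_by_repeated_concat(n))  # 5000050000, i.e. n * (n + 1) / 2
    print(chars_copied_by_join(n))             # 100000
```

Under that assumption, 100000 one-character appends write about five billion characters, while the join form writes 100000.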
Solution 2:
Repeated string concatenation of that form is usually frowned upon. Some interpreters optimize for it anyway (secretly overallocating and allowing mutation of technically immutable data types in cases where it's known to be safe), but Cython tries to hard-code some things, which makes that optimization harder to apply.
The real answer is "Don't concatenate immutable types over and over" (it's wrong everywhere, just worse in Cython). A perfectly reasonable approach that Cython would likely handle fine is to make a list of the individual str pieces, and then call ''.join(listofstr) at the end to make the final str all at once.
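A minimal plain-Python sketch of that pattern, next to the concatenation form it replaces. The function names are hypothetical, and the timing loop is only a rough harness; on CPython the two may come out close, because of the in-place optimization mentioned above.

```python
def build_with_concat(pieces):
    """Repeated concatenation: each += may copy everything built so far."""
    result = ""
    for piece in pieces:
        result += piece
    return result


def build_with_join(pieces):
    """Collect the pieces in a list, then build the final str once."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)


if __name__ == "__main__":
    import timeit

    pieces = ["a"] * 100_000
    # Both forms must produce the same string; only the cost differs.
    assert build_with_concat(pieces) == build_with_join(pieces)
    for fn in (build_with_concat, build_with_join):
        seconds = timeit.timeit(lambda: fn(pieces), number=10)
        print(f"{fn.__name__}: {seconds:.3f}s for 10 runs")
```

The list holds only references to the pieces, so appending is cheap, and the characters are copied once in the final join.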
In any event, you're not giving Cython any typing information to work with, so the speedups aren't going to be very impressive. Try to help it out with the easy stuff, and the speedups there may more than make up for losses elsewhere. For example, declaring your loop variable with cdef and using ''.join might help here:
cpdef str2():
    cdef int i
    val = []
    for i in xrange(100000):  # Maybe range; Cython docs aren't clear if xrange optimized
        val.append('a')
    val = ''.join(val)