Avoid Deepcopy Due To Performance
Solution 1:
deepcopy seems to have some overhead for checking all those different cases it is able to handle. If your lists are always lists of lists (one level of nesting), then you can try to use just s = [list(x) for x in s] or s = list(map(list, s)). Both seem to be quite a bit faster:
In [8]: import copy, random
In [9]: s = [[random.randint(1, 1000) for _ in range(100)] for _ in range(100)]
In [10]: %timeit copy.deepcopy(s)
10 loops, best of 3: 22.7 ms per loop
In [11]: %timeit [list(x) for x in s]
10000 loops, best of 3: 123 µs per loop
In [18]: %timeit list(map(list, s))
10000 loops, best of 3: 111 µs per loop
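As a quick sanity check (not part of the timings above): with one level of nesting, the comprehension gives independent inner lists just like deepcopy, so mutating the copy leaves the original untouched. The snippet below is a hypothetical illustration, not code from the original answer:

# sanity check -- assumes the lists are nested exactly one level deep
import copy

s = [[1, 2, 3], [4, 5, 6]]
cheap = [list(x) for x in s]      # per-row list() copy
deep = copy.deepcopy(s)           # reference behaviour

cheap[0][0] = 99                  # mutate the cheap copy
print(s[0][0])                    # 1 -- the original is untouched
print(deep[0][0])                 # 1 -- same guarantee deepcopy gives here

Note that this only holds for one level of nesting: if the inner lists themselves contain mutable objects, those are still shared between s and the copy.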
Alternatively, depending on your application, it might be better not to copy and store the (modified) lists themselves, but to store only the modification, either in the form of the modified cells or as a command stack.
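A rough sketch of the command-stack idea (the function names here are invented for illustration, not taken from any library): record the old value of every modified cell and roll the changes back later, so the grid is never copied at all.

# command-stack sketch -- hypothetical helper names
def set_cell(grid, undo_stack, row, col, value):
    # remember the old value so this change can be undone
    undo_stack.append((row, col, grid[row][col]))
    grid[row][col] = value

def undo_all(grid, undo_stack):
    # roll back in reverse order; no copy of the grid is ever made
    while undo_stack:
        row, col, old = undo_stack.pop()
        grid[row][col] = old

grid = [[0] * 3 for _ in range(3)]
undo_stack = []
set_cell(grid, undo_stack, 1, 1, 42)   # grid[1][1] is now 42
undo_all(grid, undo_stack)             # grid is back to all zeros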
Solution 2:
Okay, so copy.deepcopy is implemented in Python. I wrote a small test script for your question:
# deepcopy.py
import copy
import hotshot, hotshot.stats
import line_profiler

def prof():
    all = []
    for _ in range(180000):
        all.append([
            [1, 2, 3],
            [1, 2, 3],
            [1, 2, 3],
        ])

    prof = hotshot.Profile("deepcopy.py.prof")
    prof.start()
    copy.deepcopy(all)
    prof.stop()
    prof.close()

    stats = hotshot.stats.load('deepcopy.py.prof')
    stats.strip_dirs()
    stats.sort_stats('time', 'calls')
    stats.print_stats(20)

def lineprof():
    all = []
    for _ in range(180000):
        all.append([
            [1, 2, 3],
            [1, 2, 3],
            [1, 2, 3],
        ])

    prof = line_profiler.LineProfiler()
    prof.add_function(copy.deepcopy)
    prof.enable()
    copy.deepcopy(all)
    prof.disable()
    prof.print_stats()

prof()
lineprof()
First I used hotshot to get a sense of what was going on. There doesn't seem to be much more than deepcopy itself, so I added line_profiler; here is the result:
3780009 function calls (720009 primitive calls) in 3.156 seconds

   Ordered by: internal time, call count

   ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
 720001/1     1.483    0.000    3.156    3.156   copy.py:226(_deepcopy_list)
2340001/1     1.425    0.000    3.156    3.156   copy.py:145(deepcopy)
   720004     0.248    0.000    0.248    0.000   copy.py:267(_keep_alive)
        3     0.000    0.000    0.000    0.000   copy.py:198(_deepcopy_atomic)
        0     0.000             0.000            profile:0(profiler)
Timer unit: 1e-06 s
Total time: 13.6727 s
File: /usr/lib64/python2.7/copy.py
Function: deepcopy at line 145
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   145                                           def deepcopy(x, memo=None, _nil=[]):
   146                                               """Deep copy operation on arbitrary Python objects.
   147
   148                                               See the module's __doc__ string for more info.
   149                                               """
   150
   151   2340001      1614687      0.7     11.8      if memo is None:
   152         1            1      1.0      0.0          memo = {}
   153
   154   2340001      1725759      0.7     12.6      d = id(x)
   155   2340001      2123792      0.9     15.5      y = memo.get(d, _nil)
   156   2340001      1550940      0.7     11.3      if y is not _nil:
   157   1619997      1047121      0.6      7.7          return y
   158
   159    720004       587838      0.8      4.3      cls = type(x)
   160
   161    720004       577271      0.8      4.2      copier = _deepcopy_dispatch.get(cls)
   162    720004       476286      0.7      3.5      if copier:
   163    720004      1813706      2.5     13.3          y = copier(x, memo)
   164                                               else:
   165                                                   try:
   166                                                       issc = issubclass(cls, type)
   167                                                   except TypeError: # cls is not a class (old Boost; see SF #502085)
   168                                                       issc = 0
   169                                                   if issc:
   170                                                       y = _deepcopy_atomic(x, memo)
   171                                                   else:
   172                                                       copier = getattr(x, "__deepcopy__", None)
   173                                                       if copier:
   174                                                           y = copier(memo)
   175                                                       else:
   176                                                           reductor = dispatch_table.get(cls)
   177                                                           if reductor:
   178                                                               rv = reductor(x)
   179                                                           else:
   180                                                               reductor = getattr(x, "__reduce_ex__", None)
   181                                                               if reductor:
   182                                                                   rv = reductor(2)
   183                                                               else:
   184                                                                   reductor = getattr(x, "__reduce__", None)
   185                                                                   if reductor:
   186                                                                       rv = reductor()
   187                                                                   else:
   188                                                                       raise Error(
   189                                                                           "un(deep)copyable object of type %s" % cls)
   190                                                               y = _reconstruct(x, rv, 1, memo)
   191
   192    720004       579181      0.8      4.2      memo[d] = y
   193    720004      1115258      1.5      8.2      _keep_alive(x, memo) # Make sure x lives at least as long as d
   194    720004       460834      0.6      3.4      return y
So the problem seems to be that the memoization is not helping; deepcopy is wasting far more time on bookkeeping than on the copying itself. My suggestion is to exploit the fact that the lists are well defined and do the copy yourself:
# manualcopy.py
import hotshot, hotshot.stats
import line_profiler

def manualcopy(original_list):
    copy = []
    for item in original_list:
        copy.append(
            [
                list(item[0]),
                list(item[1]),
                list(item[2]),
            ]
        )
    return copy

def prof():
    all = []
    for _ in range(180000):
        all.append([
            [1, 2, 3],
            [1, 2, 3],
            [1, 2, 3],
        ])

    prof = hotshot.Profile("deepcopy.py.prof")
    prof.start()
    manualcopy(all)
    prof.stop()
    prof.close()

    stats = hotshot.stats.load('deepcopy.py.prof')
    stats.strip_dirs()
    stats.sort_stats('time', 'calls')
    stats.print_stats(20)

def lineprof():
    all = []
    for _ in range(180000):
        all.append([
            [1, 2, 3],
            [1, 2, 3],
            [1, 2, 3],
        ])

    prof = line_profiler.LineProfiler()
    prof.add_function(manualcopy)
    prof.enable()
    manualcopy(all)
    prof.disable()
    prof.print_stats()

prof()
lineprof()
That will reduce the time from 3.156 seconds to 0.446 seconds:
1 function calls in 0.446 seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
        1    0.446    0.446    0.446    0.446   manualcopy.py:5(manualcopy)
        0    0.000             0.000            profile:0(profiler)
Timer unit: 1e-06 s
Total time: 0.762817 s
File: manualcopy.py
Function: manualcopy at line 5
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           def manualcopy(original_list):
     6         1            3      3.0      0.0      copy = []
     7    180001        62622      0.3      8.2      for item in original_list:
     8    180000        66805      0.4      8.8          copy.append(
     9                                                       [
    10    180000       106683      0.6     14.0                  list(item[0]),
    11    180000       137308      0.8     18.0                  list(item[1]),
    12    180000       389396      2.2     51.0                  list(item[2]),
    13                                                       ]
    14                                                   )
    15         1            0      0.0      0.0      return copy
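If the rows do not always contain exactly three sub-lists, the same manual copy can be written generically with nested comprehensions. This is a hypothetical generalisation in the spirit of Solution 1, not part of the profiled code above, and it keeps the property that makes manualcopy fast: plain list() calls with no per-element dispatch or memoization.

# generic version of manualcopy -- hypothetical sketch, assumes exactly two levels of nesting
def generic_manualcopy(original_list):
    # copy every inner list explicitly instead of calling copy.deepcopy
    return [[list(inner) for inner in item] for item in original_list]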