Skip to content Skip to sidebar Skip to footer

Avoid Deepcopy Due To Performance

I have a list with another 3 lists in it. I need to do some operations on the lists, as i have like 180.000 of those the step of deepcopying already takes 2,5s to copy the lists on

Solution 1:

deepcopy seems to have some overhead for checking all those different cases it is able to handle. If your lists are always lists of lists (one level of nesting), then you can try to use just s = [list(x) for x in s] or s = list(map(list, s)). Both seem to be quite a bit faster:

In [9]: s = [[random.randint(1, 1000) for _ in range(100)] for _ in range(100)]
In [10]: %timeit copy.deepcopy(s)
10 loops, best of 3: 22.7 ms per loop
In [11]: %timeit [list(x) for x in s]
10000 loops, best of 3: 123 µs per loop
In [18]: %timeit list(map(list, s))
10000 loops, best of 3: 111 µs per loop

Alternatively, depending on your application, it might be better not to copy and store the (modified) lists themselves, but just the modification, either in the form of the modified cells, or as a command stack.

Solution 2:

Okay, so copy.deepcopy is implemented in python, I did a small test script for you question:

# deepcopy.py
import copy
import hotshot, hotshot.stats
import line_profiler


def prof():
    all = []
    for _ in range(180000):
        all.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])

    prof = hotshot.Profile("deepcopy.py.prof")
    prof.start()
    copy.deepcopy(all)
    prof.stop()
    prof.close()
    stats = hotshot.stats.load('deepcopy.py.prof')
    stats.strip_dirs()
    stats.sort_stats('time', 'calls')
    stats.print_stats(20)

def lineprof():
    all = []
    for _ in range(180000):
        all.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])

    prof = line_profiler.LineProfiler()
    prof.add_function(copy.deepcopy)
    prof.enable()
    copy.deepcopy(all)
    prof.disable()
    prof.print_stats()

prof()
lineprof()

First I used hotshot to get a sense of what was going on, there doesn't seem much more than deepcopy itself, so I added line profile, here is the result:

3780009 function calls (720009 primitive calls) in3.156 seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 720001/11.4830.0003.1563.156 copy.py:226(_deepcopy_list)
2340001/11.4250.0003.1563.156 copy.py:145(deepcopy)
   7200040.2480.0000.2480.000 copy.py:267(_keep_alive)
        30.0000.0000.0000.000 copy.py:198(_deepcopy_atomic)
        00.0000.000          profile:0(profiler)


Timer unit: 1e-06 s

Total time: 13.6727 s
File: /usr/lib64/python2.7/copy.py
Function: deepcopy at line 145

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   145defdeepcopy(x, memo=None, _nil=[]):
   146"""Deep copy operation on arbitrary Python objects.
   147                                           
   148                                               See the module's __doc__ string for more info.
   149                                               """150151234000116146870.711.8if memo isNone:
   152111.00.0          memo = {}
   153154234000117257590.712.6      d = id(x)
   155234000121237920.915.5      y = memo.get(d, _nil)
   156234000115509400.711.3if y isnot _nil:
   157161999710471210.67.7return y
   1581597200045878380.84.3      cls = type(x)
   1601617200045772710.84.2      copier = _deepcopy_dispatch.get(cls)
   1627200044762860.73.5if copier:
   16372000418137062.513.3          y = copier(x, memo)
   164else:
   165try:
   166                                                       issc = issubclass(cls, type)
   167except TypeError: # cls is not a class (old Boost; see SF #502085)168                                                       issc = 0169if issc:
   170                                                       y = _deepcopy_atomic(x, memo)
   171else:
   172                                                       copier = getattr(x, "__deepcopy__", None)
   173if copier:
   174                                                           y = copier(memo)
   175else:
   176                                                           reductor = dispatch_table.get(cls)
   177if reductor:
   178                                                               rv = reductor(x)
   179else:
   180                                                               reductor = getattr(x, "__reduce_ex__", None)
   181if reductor:
   182                                                                   rv = reductor(2)
   183else:
   184                                                                   reductor = getattr(x, "__reduce__", None)
   185if reductor:
   186                                                                       rv = reductor()
   187else:
   188raise Error(
   189"un(deep)copyable object of type %s" % cls)
   190                                                           y = _reconstruct(x, rv, 1, memo)
   1911927200045791810.84.2      memo[d] = y
   19372000411152581.58.2      _keep_alive(x, memo) # Make sure x lives at least as long as d1947200044608340.63.4return y

So the problem seems that the memoization is not helping and it is wasting way more time in bookkeeping instead of coping, so my suggestion is to exploit the fact that the lists are well defined and do the copy yourself:

import hotshot, hotshot.stats
import line_profiler


def manualcopy(original_list):
    copy = []
    for item in original_list:
        copy.append(
            [
                list(item[0]),
                list(item[1]),
                list(item[2]),
            ]
        )
    return copy


def prof():
    all = []
    for _ in range(180000):
        all.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])

    prof = hotshot.Profile("deepcopy.py.prof")
    prof.start()
    manualcopy(all)
    prof.stop()
    prof.close()
    stats = hotshot.stats.load('deepcopy.py.prof')
    stats.strip_dirs()
    stats.sort_stats('time', 'calls')
    stats.print_stats(20)

def lineprof():
    all = []
    for _ in range(180000):
        all.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])

    prof = line_profiler.LineProfiler()
    prof.add_function(manualcopy)
    prof.enable()
    manualcopy(all)
    prof.disable()
    prof.print_stats()

prof()
lineprof()

That will reduce from 3.156 seconds to 0.446 seconds

1function calls in0.446 seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        10.4460.4460.4460.446 manualcopy.py:5(manualcopy)
        00.0000.000          profile:0(profiler)


Timer unit: 1e-06 s

Total time: 0.762817 s
File: manualcopy.py
Function: manualcopy at line 5

Line #      Hits         TimePer Hit   %Time  Line Contents
==============================================================5                                           def manualcopy(original_list):
     6133.00.0copy= []
     7180001626220.38.2for item in original_list:
     8180000668050.48.8          copy.append(
     9                                                       [
    101800001066830.614.0                  list(item[0]),
    111800001373080.818.0                  list(item[1]),
    121800003893962.251.0                  list(item[2]),
    13                                                       ]
    14                                                   )
    15100.00.0returncopy

Post a Comment for "Avoid Deepcopy Due To Performance"