Skip to content Skip to sidebar Skip to footer

Most Suitable Data Structure To Support Delete From Left And Delete From Arbitrary Index Where Delete From Left Is Twice As Much Frequent

I am writing a function that takes a large list of strings and classifies them. Each string classified is deleted from the list one after another. Sometimes the function needs to d

Solution 1:

Try an OrderedDict with your strings as keys.

Benchmark with just 30,000 strings, 20,000 deletions from left and 10,000 "arbitrary" removals:

1476 ms1490 ms1491 msusing_list10ms11ms11msusing_OrderedDict

Instead of

strings_list.pop(0)
strings_list.remove(value)

use

strings_dict.popitem(False)
strings_dict.pop(value)

This assumes you have no duplicate strings, which seems likely. If you do, then use the dictionary values as the frequencies.

Benchmark code (Try it online!):

from timeit import timeit
from collections import OrderedDict
from random import randrange

def using_list(strings, removes):
    for string in removes:
        strings.pop(0)
        strings.pop(0)
        strings.remove(string)

def using_OrderedDict(strings, removes):
    strings = OrderedDict.fromkeys(strings)
    for string in removes:
        strings.popitem(False)
        strings.popitem(False)
        strings.pop(string)

# Build testcase with 30,000 elements
strings = list(map(str, range(10_000, 40_000)))
copy = strings.copy()
removes = []
for _ in range(10_000):
    copy.pop(0)
    copy.pop(0)
    i = randrange(len(copy))
    removes.append(copy.pop(i))

# Benchmark
for _ in range(3):
    for func in using_list, using_OrderedDict:
        times = []
        for _ in range(3):
            copy = strings.copy()
            t = timeit(lambda: func(copy, removes), number=1)
            times.append(t)
        times.sort()
        print(*('%4d ms ' % (t * 1e3) for t in times), func.__name__)
    print()

Solution 2:

If you know where you exactly want to delete, you might be looking for a linked list implementation

Do note that deletion from a linked list is O(1)

However, searching for an element is O(n)

EDIT: If you instead care about values to be deleted, a set is probably what you're looking for. Do note that set is an unordered datastructure and would only allow for values to be deleted

Post a Comment for "Most Suitable Data Structure To Support Delete From Left And Delete From Arbitrary Index Where Delete From Left Is Twice As Much Frequent"