Most Suitable Data Structure To Support Delete From Left And Delete From Arbitrary Index Where Delete From Left Is Twice As Much Frequent
I am writing a function that takes a large list of strings and classifies them. Each string classified is deleted from the list one after another. Sometimes the function needs to d
Solution 1:
Try an OrderedDict
with your strings as keys.
Benchmark with just 30,000 strings, 20,000 deletions from left and 10,000 "arbitrary" removals:
1476 ms1490 ms1491 msusing_list10ms11ms11msusing_OrderedDict
Instead of
strings_list.pop(0)
strings_list.remove(value)
use
strings_dict.popitem(False)
strings_dict.pop(value)
This assumes you have no duplicate strings, which seems likely. If you do, then use the dictionary values as the frequencies.
Benchmark code (Try it online!):
from timeit import timeit
from collections import OrderedDict
from random import randrange
def using_list(strings, removes):
for string in removes:
strings.pop(0)
strings.pop(0)
strings.remove(string)
def using_OrderedDict(strings, removes):
strings = OrderedDict.fromkeys(strings)
for string in removes:
strings.popitem(False)
strings.popitem(False)
strings.pop(string)
# Build testcase with 30,000 elements
strings = list(map(str, range(10_000, 40_000)))
copy = strings.copy()
removes = []
for _ in range(10_000):
copy.pop(0)
copy.pop(0)
i = randrange(len(copy))
removes.append(copy.pop(i))
# Benchmark
for _ in range(3):
for func in using_list, using_OrderedDict:
times = []
for _ in range(3):
copy = strings.copy()
t = timeit(lambda: func(copy, removes), number=1)
times.append(t)
times.sort()
print(*('%4d ms ' % (t * 1e3) for t in times), func.__name__)
print()
Solution 2:
If you know where you exactly want to delete, you might be looking for a linked list implementation
Do note that deletion from a linked list is O(1)
However, searching for an element is O(n)
EDIT: If you instead care about values to be deleted, a set is probably what you're looking for. Do note that set is an unordered datastructure and would only allow for values to be deleted
Post a Comment for "Most Suitable Data Structure To Support Delete From Left And Delete From Arbitrary Index Where Delete From Left Is Twice As Much Frequent"