
What Is The Overhead Of Using A Dictionary Instead Of A List?

I have a situation in one of my projects in which I can use either lists or dictionaries, and I am having a hard time picking which one to use. I am analyzing a large number of items (>4

Solution 1:

In direct answer to your question: dictionaries have significantly more overhead than lists:

  1. Each item consumes memory for both key and value, in contrast to only values for lists.
  2. Adding or removing an item requires consulting a hash table.

Although Python dictionaries are extremely well designed and surprisingly fast, an algorithm that can use direct integer indexing into a list will save both space and time.
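To get a feel for the time side of that claim on your own machine, here is a minimal sketch using the standard timeit module. The container size, the repeat count and the helper names (read_list, read_dict) are my own assumptions for illustration; actual timings depend on the interpreter version and hardware, so treat the numbers as indicative only.

```python
import timeit

N = 400_000  # assumed size, roughly matching the question

as_list = list(range(N))
as_dict = dict(zip(range(N), range(N)))


def read_list():
    # Direct positional access into the list
    return as_list[N // 2]


def read_dict():
    # Same value, but reached by hashing the key first
    return as_dict[N // 2]


# Time many repeated reads of the same element from each container
list_seconds = timeit.timeit(read_list, number=200_000)
dict_seconds = timeit.timeit(read_dict, number=200_000)

print(f"list index:  {list_seconds:.4f}s")
print(f"dict lookup: {dict_seconds:.4f}s")
```

Both operations are constant time on average, so expect the gap to be a small constant factor rather than something that grows with N.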

However, from your question and the subsequent discussion, it sounds like your needs may change over time and you have some uncertainty ("However, I can think of some situations that might arise in future that I will need to keep some items for keys which are not integers").

If this is the case, I suggest creating a hybrid data structure of your own so that as your needs evolve you can address the efficiency of storage in an isolated place while allowing your application to use simple, readable code to store and retrieve objects.

For example, here is a Python 3 class called maybelist that is derived from list but detects non-integer keys, storing those exceptional items in a dictionary while providing mappings for some common list operations:

import itertools

class maybelist(list):

    def __init__(self, *args):
        super().__init__(*args)
        # Items stored under non-integer keys live here
        self._extras = dict()

    def __setitem__(self, index, val):
        try:
            super().__setitem__(index, val)
            return
        except TypeError:
            # Index is not an integer, store in dict
            self._extras[index] = val
            return
        except IndexError:
            pass
        # Measure against the list part only; len(self) also counts the extras
        distance = index - super().__len__()
        if distance > 0:
            # Put 'None' in empty slots if need be
            self.extend((None,) * distance)
        self.append(val)

    def __getitem__(self, index):
        try:
            return super().__getitem__(index)
        except TypeError:
            return self._extras[index]

    def __str__(self):
        return str([item for item in self])

    def __len__(self):
        return super().__len__() + len(self._extras)

    def __iter__(self):
        # List items first, then the keys (not the values) of the extras
        for item in itertools.chain(super().__iter__(), self._extras):
            yield item

So, you could treat it like an array, and have it auto expand:

>>> x = maybelist()
>>> x[0] = 'first'
>>> x[1] = 'second'
>>> x[10] = 'eleventh'
>>> print(x)
['first', 'second', None, None, None, None, None, None, None, None, 'eleventh']
>>> print(x[10])
eleventh

Or you could add items with non-numeric keys if they were present:

>>> x['unexpected'] = 'something else'
>>> print(x['unexpected'])
something else

And yet have the object appear to behave properly if you access it using iterators or other methods of your choosing:

>>> print(x)
['first', 'second', None, None, None, None, None, None, None, None, 'eleventh', 'unexpected']
>>> print(len(x))
12

This is just an example, and you would need to tailor such a class to meet the needs of your application. The resulting object does not strictly behave like a list: x[len(x)-1] is not the last item, for example, and iterating yields the keys of the non-integer entries rather than their values. However, your application may not need such strict adherence, and if you are careful and plan properly, you can create an object that provides highly optimized storage while leaving room for evolving data structure needs in the future.

Solution 2:

A dict uses a lot more memory than a list. Probably not enough to be a concern if the computer isn't very busy. There are exceptions of course: if it's a web server handling 100 connections per second, you may want to consider saving memory at the expense of readability.

>>> import sys
>>> L = range(400000)  # Python 2: range() returns a list; on Python 3 use list(range(400000))
>>> sys.getsizeof(L)
3200072   # ~3 Megabytes
>>> D = dict(zip(range(400000), range(400000)))
>>> sys.getsizeof(D)
25166104  # ~25 Megabytes
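Note that sys.getsizeof only reports the size of the container itself, not the objects it refers to. Here is a rough Python 3 sketch that also adds up the integer objects. The variable names are mine, and the "deep" totals are an approximation: CPython shares small integers between containers, so they are slightly overcounted. Exact byte counts vary between Python versions and platforms.

```python
import sys

N = 400_000

values = list(range(N))
mapping = dict(zip(range(N), range(N)))

# Shallow sizes: just the list's pointer array or the dict's hash table
list_shallow = sys.getsizeof(values)
dict_shallow = sys.getsizeof(mapping)

# Approximate deep sizes: container plus every int object it refers to
list_deep = list_shallow + sum(sys.getsizeof(v) for v in values)
dict_deep = dict_shallow + sum(
    sys.getsizeof(k) + sys.getsizeof(v) for k, v in mapping.items()
)

print(f"list: {list_shallow:>12,} bytes shallow, {list_deep:>12,} bytes deep")
print(f"dict: {dict_shallow:>12,} bytes shallow, {dict_deep:>12,} bytes deep")
print(f"shallow ratio dict/list: {dict_shallow / list_shallow:.1f}x")
```

The dict pays twice: once for the hash table and once more for the key objects that a list gets for free as positions.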

Solution 3:

A list is what it sounds like: a sequence of values. A dictionary works like a real dictionary: you have an 'index' of words, and for each of them a definition.

Both are containers, but a dict behaves differently from a list because it maps keys to values. That means you use a dictionary when:

  • You have to retrieve things based on some identifier, like names, addresses, or anything that can be a key.
  • You don't need to access things by position. Dictionaries historically had no notion of order (since Python 3.7 they preserve insertion order, but still cannot be indexed by position), so use a list when order matters.
  • You are going to be adding and removing elements and their keys.
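As a small illustration of the points above, here is a sketch with made-up data (the names readings and ages and their contents are purely hypothetical):

```python
# Ordered measurements, read by position: a list fits.
readings = [20.5, 21.0, 19.8]
second_reading = readings[1]

# Records looked up by an identifier: a dict fits.
ages = {"alice": 31, "bob": 27}
ages["carol"] = 45   # add an element together with its key
del ages["bob"]      # remove an element together with its key
alice_age = ages["alice"]

print(second_reading, alice_age, sorted(ages))
```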

Efficiency constraints are discussed in the Stack Overflow posts Link1 & Link2.

Go for a dictionary, since you have doubts regarding future values and there are no memory constraints to worry about.


Solution 4:

This is not exactly a spot-on answer to your not-so-clear question, but here are my thoughts:

You said

I am analyzing large number of items (>400k)

In that case, I'd advise you to use generators and/or process your data in chunks.
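A minimal sketch of what chunked processing with a generator could look like. The helper name iter_chunks and the chunk size are my own choices for illustration, and range(400_000) merely stands in for your real data:

```python
from itertools import islice


def iter_chunks(items, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Only one chunk is held in memory at a time
total = 0
for chunk in iter_chunks(range(400_000), 50_000):
    total += sum(chunk)

print(total)
```

Because iter_chunks accepts any iterable, the same loop works whether the items come from a file, a database cursor, or another generator.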

A better option would be to put your data, which are key-value pairs, in Redis and take out chunks of it at a time. Redis can handle your volume of data very easily.

You could write a script that processes one chunk at a time and, using the concurrent.futures module, parallelize the chunk processing.

Something like this:

from concurrent import futures

def chunk_processor(data):
    """
    Process your list data here
    """
    pass

def parallelizer(map_func, your_data_list, n_workers=3):
    with futures.ThreadPoolExecutor(max_workers=n_workers) as executor:
        for result in executor.map(map_func, your_data_list):
            # Do whatever with your result
            pass

# Take a chunk of your data out of Redis here
# (get_next_chunk_from_redis is a placeholder you need to implement)
chunk_of_list = get_next_chunk_from_redis()

# Your processing starts here
parallelizer(chunk_processor, chunk_of_list)

Again, something better could be done, but I'm presenting you one of the ways to go about it.
