Skip to content Skip to sidebar Skip to footer

Python: Are `hash` Values For Built-in Numeric Types, Strings Standardised?

I came to this question while pondering about the ordering of set, frozenset and dict. Python doesn't guarantee any ordering, and any ordering is coupled to the hash value at some

Solution 1:

The hash values for strings and integers are absolutely not standardized. They could change with any new implementation of Python, including between 2.6.1 and 2.6.2, or between a Mac and a PC implementation of the same version, etc.

More importantly, though, stable hash values doesn't imply repeatable iteration order. You cannot depend on the ordering of values in a set, ever. Even within one process, two sets can be equal and not return their values in the same order. This can happen if one set has had many additions and deletions, but the other has not:

>>>a = set()>>>for i inrange(1000000): a.add(str(i))...>>>for i inrange(6, 1000000): a.remove(str(i))...>>>b = set()>>>for i inrange(6): b.add(str(i))...>>>a == b
True
>>>list(a)
['1', '5', '2', '0', '3', '4']
>>>list(b)
['1', '0', '3', '2', '5', '4']

Solution 2:

As proof that ordering is NOT preserved, consider the example by DKGasser. When run in CPython, this is the result:

>>> test = ['cat', 'dog', 'mouse', 'rat', 6126, 516]
>>> temp = []
>>> for x inset(test):
        temp.append(x)  
>>> temp
[516, 'dog', 6126, 'cat', 'rat', 'mouse']

When run in Jython, this is the result:

>>> test = ['cat', 'dog', 'mouse', 'rat', 6126, 516]
>>> temp = []
>>> for x inset(test):
        temp.append(x)  
>>> temp
[6126, 'dog', 'cat', 'rat', 516, 'mouse']

Q.E.D.

It is entirely dependent upon the interpreter's implementation, and not at all guaranteed by the language itself.

EDIT

Apologies for beating this into the ground, but the OP seems to want definitive "straight from the horse's mouth" proof that ordering cannot be guaranteed. I finally found it:

http://docs.python.org/library/stdtypes.html#dict

CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.

So there you have it. Please let's be done with this now.

Solution 3:

Speaking from the general idea of a hash set, you can't rely on the order. Even if the implementation you are using happens to preserve order, it's a bad idea to rely on that unless the documentation specifically says that you can.

The fact that the hash values for all objects being placed into the set are guaranteed to always be the same is irrelevant to whether or not the set implementation preserves order.

For a simple hash implementation, a common simple way to go about it is to create an array of size ORIGINAL_SIZE. When an item is inserted, it's hash value is generated and then mapped (via mod for simplicity) to a value range the size of the array, and then the object is placed at that spot in the array. If there's already an item at that spot (ie the array is smaller than the number of possible items), then some collision algorithm is used.

When the number of items in the set implementation changes, the underlying implementation may change the size of the array storing the data (ex, to ORIGINAL_SIZE * 1.5). When this happens, the order of items under iteration will very likely change. This generally only happens for inserts, but can happen for deletes, or even if the implementation spreads out such activities over other operations.

There are a number of set implementations in various languages that guarantee ordering, and some that guarantee that it will be the same order the items are inserted in and what happens to the order when you insert the same item twice (ie, does it move to the end, etc). However, unless the implementation you're looking at specifically says it guarantees that, you cannot rely on it.

As a specific case imagine that, on the next release of Python, it is determined that the underlying code for sets is inefficient. Somebody decides that they will rewrite it to make it much faster. Even if the old implementation happened to preserve order... if the documentation doesn't say it does then the new implementation is free to not have that property.

Solution 4:

AFAIK, the result of __hash__() should always the unique for that object. In the case of integers, the hash is the value itself.

According to the documentation:

object.hash(self)

Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. hash() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.

So the order of your objects will always depend on the particular implementation of the hash method for that object and whether it returns something that "makes sense" for comparison is completely determined by you, on custom objects.

TL;DR - Yes, the hash will determine the order of your objects. The order will of course depend on the results given by the hashes or those objects.

Solution 5:

The hash() function of python does a predefined set of operations to come up with its value. What those operations are is further explained here: A given object (string, integer, whatever) will always yield the same hash value.

When you put items into a set (or similar structure), they are rehashed whenever the size of the set reaches a certain threshold. Thus, while you may be unable to predict what order a certain set of items would be in, the samen items will always be in the same order in a set.

Thus, effectively yes... a,b,c,d,e,f,g, where each is a specific string or integer, would always appear in the same order when iterated through in a set. (though, not necessarily the order I just listed them).

EDIT: Edited for clarity based on comments.

EDIT: Console Proof

Ran under python 2.5 on Debian 32bit, python 3 on 64bit and 2.7 on Windows XP 32bit.. comes out the same in all of them, and I've used the fact in programs before with no problems.

Thanks to Chris for the additional platforms to confirm test.

>>> test = ['cat', 'dog', 'mouse', 'rat', 6126, 516]
>>> temp = []
>>> for x inset(test):
    temp.append(x)  
>>> temp
[516, 'dog', 6126, 'cat', 'rat', 'mouse']
>>> temp = []
>>> for x inset(test):
        temp.append(x)
>>> temp
[516, 'dog', 6126, 'cat', 'rat', 'mouse']
>>> 

Post a Comment for "Python: Are `hash` Values For Built-in Numeric Types, Strings Standardised?"