This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Save the hash value of tuples
Type: Stage:
Components: None Versions:
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: georg.brandl, josiahcarlson, noamr, tim.peters
Priority: normal Keywords: patch

Created on 2006-04-01 20:59 by noamr, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
savetuplehash.diff noamr, 2006-04-01 21:00
Messages (7)
msg49935 - (view) Author: Noam Raphael (noamr) * Date: 2006-04-01 20:59
(Copied from a post to python-dev):

I've found out that the hash value of tuples isn't
saved after it's calculated. With strings it's
different: the hash value of a string is calculated
only on the first call to hash(string), and saved in
the structure for future use. Saving the value makes
dict lookup of tuples an operation with an amortized
cost of O(1).

Saving the hash value means that if an item of the
tuple changes its hash value, the hash value of the
tuple won't be changed. I think it's ok, since:
 1. Hash value of things shouldn't change.
 2. Dicts assume that too.

I tried the change, and it turned out that I had to
change cPickle a tiny bit: it uses a 2-tuple which is
allocated when the module initializes to lookup tuples
in a dict. I changed it to properly use PyTuple_New and
Py_DECREF, and now the complete test suite passes. I
run test_cpickle before the change and after it, and it
took the same time (0.89 seconds on my computer).
msg49936 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-04-01 23:00
Logged In: YES 
user_id=31435

I don't want this:  it increases the size of all tuple
objects without obvious (let alone compelling) benefit.  All
strings are hashable, but not all tuples are.  Python
internals arrange to force string interning in many
speed-sensitive cases, but only the empty tuple is interned
and hashing the empty tuple goes fast anyway; that makes it
less likely that a cached tuple hash will ever do any good.
 Finally, len(tuple) is typically small so the time needed
for hash(tuple) is also typically small (for tuples and
strings, the hash time is proportional to the number of
elements, and strings are typically "bigger" this way).
msg49937 - (view) Author: Noam Raphael (noamr) * Date: 2006-04-01 23:20
Logged In: YES 
user_id=679426

What about changing cPickle to avoid its dependency on
tuples not saving their hash value?
msg49938 - (view) Author: Josiah Carlson (josiahcarlson) * (Python triager) Date: 2006-04-02 04:15
Logged In: YES 
user_id=341410

"Saving the value makes dict lookup of tuples an operation
with an amortized cost of O(1)."

This is an incorrect statement in regards to caching.  The
initial hashing costs whatever it takes to hash the tuple (
Omega(len(tuple)) time at least), but subsequent hashes cost
O(1).  Ammortized over 2 hashes is still Omega(len(tuple))
per hash, not O(1) as you stated.  In order for
ammortization to help the bounds, you would need to hash the
tuple Omega(len(tuple)) times (assuming the contents of the
tuple are hasable in O(1) time without ammortization), at
which point you would get O(1) ammortization.
msg49939 - (view) Author: Josiah Carlson (josiahcarlson) * (Python triager) Date: 2006-04-02 04:17
Logged In: YES 
user_id=341410

I should also mention that the point of ammortized analysis
is to properly bound time based on an expected number of
operations.  You are not doing this, which is why your
ammortization is incorrect.  You are right that it would be
constant after you cache the hash, but it is not constant
ammortized in general.
msg49940 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-04-02 20:48
Logged In: YES 
user_id=849994

As per Guido's -1, rejecting this patch.
msg49941 - (view) Author: Noam Raphael (noamr) * Date: 2006-04-02 22:00
Logged In: YES 
user_id=679426

I just wanted to say that I believe that my analysis is
correct: you can count the O(n) of the firsh hash with the
creation of the tuple, which is obviously O(n). So if you
count calculating the hash value as O(1), you'll end up with
the correct O() for the complete program, even though the
first hash isn't really O(1).
History
Date User Action Args
2022-04-11 14:56:16adminsetgithub: 43144
2006-04-01 20:59:25noamrcreate