Issue 1462796: Save the hash value of tuples

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/43144

classification

Title:	Save the hash value of tuples
Type:		Stage:
Components:	None	Versions:

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	georg.brandl, josiahcarlson, noamr, tim.peters
Priority:	normal	Keywords:	patch

Created on 2006-04-01 20:59 by noamr, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
savetuplehash.diff	noamr, 2006-04-01 21:00

Messages (7)
msg49935 - (view)	Author: Noam Raphael (noamr) *	Date: 2006-04-01 20:59
(Copied from a post to python-dev): I've found out that the hash value of tuples isn't saved after it's calculated. With strings it's different: the hash value of a string is calculated only on the first call to hash(string), and saved in the structure for future use. Saving the value makes dict lookup of tuples an operation with an amortized cost of O(1). Saving the hash value means that if an item of the tuple changes its hash value, the hash value of the tuple won't be changed. I think it's ok, since: 1. Hash value of things shouldn't change. 2. Dicts assume that too. I tried the change, and it turned out that I had to change cPickle a tiny bit: it uses a 2-tuple which is allocated when the module initializes to lookup tuples in a dict. I changed it to properly use PyTuple_New and Py_DECREF, and now the complete test suite passes. I run test_cpickle before the change and after it, and it took the same time (0.89 seconds on my computer).
msg49936 - (view)	Author: Tim Peters (tim.peters) *	Date: 2006-04-01 23:00
Logged In: YES user_id=31435 I don't want this: it increases the size of all tuple objects without obvious (let alone compelling) benefit. All strings are hashable, but not all tuples are. Python internals arrange to force string interning in many speed-sensitive cases, but only the empty tuple is interned and hashing the empty tuple goes fast anyway; that makes it less likely that a cached tuple hash will ever do any good. Finally, len(tuple) is typically small so the time needed for hash(tuple) is also typically small (for tuples and strings, the hash time is proportional to the number of elements, and strings are typically "bigger" this way).
msg49937 - (view)	Author: Noam Raphael (noamr) *	Date: 2006-04-01 23:20
Logged In: YES user_id=679426 What about changing cPickle to avoid its dependency on tuples not saving their hash value?
msg49938 - (view)	Author: Josiah Carlson (josiahcarlson) *	Date: 2006-04-02 04:15
Logged In: YES user_id=341410 "Saving the value makes dict lookup of tuples an operation with an amortized cost of O(1)." This is an incorrect statement in regards to caching. The initial hashing costs whatever it takes to hash the tuple ( Omega(len(tuple)) time at least), but subsequent hashes cost O(1). Ammortized over 2 hashes is still Omega(len(tuple)) per hash, not O(1) as you stated. In order for ammortization to help the bounds, you would need to hash the tuple Omega(len(tuple)) times (assuming the contents of the tuple are hasable in O(1) time without ammortization), at which point you would get O(1) ammortization.
msg49939 - (view)	Author: Josiah Carlson (josiahcarlson) *	Date: 2006-04-02 04:17
Logged In: YES user_id=341410 I should also mention that the point of ammortized analysis is to properly bound time based on an expected number of operations. You are not doing this, which is why your ammortization is incorrect. You are right that it would be constant after you cache the hash, but it is not constant ammortized in general.
msg49940 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2006-04-02 20:48
Logged In: YES user_id=849994 As per Guido's -1, rejecting this patch.
msg49941 - (view)	Author: Noam Raphael (noamr) *	Date: 2006-04-02 22:00
Logged In: YES user_id=679426 I just wanted to say that I believe that my analysis is correct: you can count the O(n) of the firsh hash with the creation of the tuple, which is obviously O(n). So if you count calculating the hash value as O(1), you'll end up with the correct O() for the complete program, even though the first hash isn't really O(1).

History
Date	User	Action	Args
2022-04-11 14:56:16	admin	set	github: 43144
2006-04-01 20:59:25	noamr	create