Revisiting Oren’s fastnames patch, I think a simpler
approach is warranted. Using a setitem macro skips
a function call but doesn’t get to the root of the problem.
Likewise, a pointer search is not helpful because
lookupstring() already attempts a pointer
comparison first and a hash comparison as a
fallback (also, a pointer search fails to find
non-interned strings). There is more promise in
optimizing the search/fail/fallback sequence of
looking for locals, globals, and builtins, but the
approach of using custom look-ups and having
negative entries is too invasive.
The real issue is that the dictionaries are too full.
Whenever a dictionary becomes 2/3 full, a resize is
triggered which doubles the size of the dictionary,
still leaving it relatively full and relatively close
to the next (expensive) resize operation.
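For concreteness, the growth pattern under the current doubling policy can be sketched like this (illustrative Python, not the actual C code in dictobject.c):

```python
# Illustrative sketch, not CPython's dictobject.c: track the load factor
# of a table that doubles whenever it reaches 2/3 full.
size, used = 8, 0
loads = []
for _ in range(100):
    used += 1
    if used * 3 >= size * 2:      # the 2/3-full trigger
        size *= 2                 # current policy: double
    loads.append(used / size)

# The load factor oscillates between roughly 1/3 and 2/3, so on average
# the table sits about half full and is never far from the next resize.
print(max(loads))
```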
This patch makes the setitem resize operation
quadruple the size of the dictionary. This leads to
fewer total resize operations and leaves the dictionary
more sparsely populated (not only leading to fewer
collisions, but allowing immediate detection of NOT IN
without doing *any* comparisons of pointers or
hashcodes).
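A rough simulation of the effect (assumed helper names; this is not the one-line C patch itself) shows both benefits at once, fewer total resizes and a sparser final table:

```python
# Rough simulation, not the actual patch: count resize operations and
# the final load factor when filling a table that grows at 2/3 full.
def fill(n_items, growth):
    size, used, resizes = 8, 0, 0
    for _ in range(n_items):
        used += 1
        if used * 3 >= size * 2:   # the 2/3-full trigger
            size *= growth
            resizes += 1
    return resizes, used / size

print(fill(10000, 2))   # doubling
print(fill(10000, 4))   # quadrupling: fewer resizes, sparser table
```

A sparser table also means a miss is more likely to land on an empty slot on the very first probe, which is what allows "not in" to be detected without any pointer or hash comparisons.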
The patch is minimally invasive. It is only one line!
However, it does change the internal ordering of
dictionaries, so doctest-style test code will sense
the change immediately.
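As a hedge against such ordering changes, doctests can avoid depending on a dict's repr order (illustrative snippet, not part of the patch):

```python
# Illustrative: a doctest expecting a dict's repr bakes the hash
# table's internal ordering into the expected output, so a
# resize-policy change can break it.  Comparing against sorted items
# keeps the expected output stable under any dict implementation.
d = {'spam': 1, 'eggs': 2, 'ham': 3}

# Fragile: repr order depends on the table's internals.
fragile_expectation = repr(d)

# Robust: sorted output is deterministic.
robust = sorted(d.items())
print(robust)   # [('eggs', 2), ('ham', 3), ('spam', 1)]
```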
Four timing suites are attached. Two directly
measure speed-ups for accessing globals and
__builtins__. The other two are real-world apps
which tap into all facets of Python, including
subclassing, lambdas and functionals, large and
small dicts, a wide array of data structures, and
heavy processing of strings, tuples, and numbers. Each of
the suites shows improvements ranging from 5% to
25%. PyStone also shows a marginal improvement,
which is surprising because it is dominated by fast
locals.