This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Optimize dictionary resizing
Type: Stage:
Components: Interpreter Core Versions: Python 2.3
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: nnorwitz Nosy List: nnorwitz, rhettinger
Priority: normal Keywords: patch

Created on 2003-01-20 23:08 by rhettinger, last changed 2022-04-10 16:06 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
dict.diff rhettinger, 2003-01-20 23:09 One line patch!
timemat.py rhettinger, 2003-01-20 23:10 Time the test of a matrix package
timepuzzle.py rhettinger, 2003-01-20 23:11 Time a puzzle solving graph traverser
timebltin.py rhettinger, 2003-01-20 23:13 Time access to builtins
Messages (3)
msg42502 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-01-20 23:08
Revisiting Oren’s fastnames patch, I think a simpler 
approach is warranted.  Using a setitem macro skips 
a function call but doesn’t get to the root of problem.  
Likewise, a pointer search is not helpful because 
lookupstring() already attempts a pointer 
comparison first and a hash comparison as a 
fallback (also, a pointer search fails to find non-
interned comparisons). There is more promise in 
optimizing the search/fail/fallback sequence of 
looking for locals, globals, and builtins, but the 
approach of using custom look-ups and having 
negative entries is too invasive.

The real issue is that the dictionaries are too full.  
Whenever, it hits 2/3’s full, a resize is triggered which 
doubles the size of the dictionary, still leaving it 
relatively full and relatively close to the next 
(expensive) resize operation.

This patch makes the setitem resize operation 
quadruple the size of the dictionary.  This leads to 
fewer total resize operations and leaves the dictionary 
more sparsely populated (not only leading to fewer 
collisions, but allowing immediate detection of NOT IN 
without doing *any* comparisons of pointers or 
hashcodes).

The patch is minimally invasive.  It is only one line!  
However, is does change the order of dictionaries, so 
doctest style test code will sense the change 
immediately.

Four timing suites are attached.  Two, directly 
measure speed-ups for accessing globals and 
__builtins__.  The other two are real world apps 
which tap into all facets of python including 
subclassing, lambdas and functionals, large and 
small dicts, a wide array of data structures, heavy 
processing of strings, tuples, and numbers.   Each of 
the suites shows improvements ranging from 5% to 
25%.  PyStone also shows a marginal improvement 
which is surprising because it is dominated by fast 
locals.
msg42503 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-02-12 01:59
Logged In: YES 
user_id=80475

I timed many different dictionary tune-ups and found that 
the optimal combination varies depending on the 
benchmark and needs of the application (uniquifying is 
store intensive and membership testing is retrieve 
intensive).  The results also vary from processor to 
processor, and they depend on the amount/type of cache 
memory available.

The tuning options include varying the dict resize trigger 
ratio from 2/3, altering the size of smalldict, quadrupling 
space upon resize, and special casing smaller dictionaries.

At some point, I'll post the best of these and everyone can 
try it with their own apps and machine.  All of the above 
parameters can be set arbitrarily and there is no reason to 
think that the current settings are the optimum for most 
time critical applications.

The attached patch is simple and gives some 
improvement to many applications.  I hesitate to push it 
forward with more independent timings in different 
environments.
msg42504 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-05-05 22:30
Logged In: YES 
user_id=80475

Discussed on python-dev.
Added a limit at 50k entries.
Applied as Objects/dictobject.c 2.145.

Documented remaining research
Objects/dictnotes.txt

History
Date User Action Args
2022-04-10 16:06:09adminsetgithub: 37808
2003-01-20 23:08:55rhettingercreate