This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Dictionary tuning
Type: Stage:
Components: Interpreter Core Versions: Python 2.3
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: gvanrossum, rhettinger, skip.montanaro
Priority: normal Keywords: patch

Created on 2003-04-29 08:03 by rhettinger, last changed 2022-04-10 16:08 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
dictobject.diff rhettinger, 2003-04-29 08:03 One line patch to dictobject.c
dict2.diff gvanrossum, 2003-04-29 10:49 Five-line patch
Messages (10)
msg43535 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-04-29 08:03
The idea is to make dictionaries more sparse on 
average (decreasing the density between 0 and 50%). 
This results in fewer collisions, faster collision 
resolution, fewer memory accesses, and better cache 
performance.  A small side-benefit is halving the 
number of expensive resize operations as the 
dictionary grows and reducing the cost of the resize 
operations.

The patch helps both smaller sized dictionaries and 
large ones.  It entails a one line change to dictobject.c 
resulting in a new schedule of dictionary sizes for a 
given number of entries:

Number of           Current size        Proposed size
Filled Entries      of dictionary       of dictionary
--------------      -------------       -------------
0 to 5                    8                   8
6 to 10                  16                  32
11 to 21                 32                  32
22 to 42                 64                 128
43 to 85                128                 128
86 to 170               256                 512
171 to 341              512                 512
  and so on until 100,000 entries where the old 
schedule takes over.

The 100,000 entry limit was added because of 
python-dev feedback about memory consumption in 
the presence of many large dictionaries.  At 12 bytes 
per entry on a 32-bit box, the switchback occurs at 
1.2 Mb for a single dict.  This ought to allow hundreds 
of large dictionaries on common memory sizes.

Performance improvements vary from application to 
application and across various hardware setups.  So 
far, typical results are from 0 to 5%.
msg43536 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-04-29 10:49
Logged In: YES 
user_id=6380

My preferred version, with code reordering suggested by Tim
and further refinement by me (dict2.diff)
msg43537 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-04-29 10:56
Logged In: YES 
user_id=6380

However note that I cannot reproduce the speedup on Zope 3
startup any more. It may have been something else in my
system. :-(
msg43538 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2003-05-05 20:21
Logged In: YES 
user_id=44345

Using Guido's version I saw a consistent improvement in pystone numbers.
I ran it five times each with and without the patch.  Every "after" run was
faster than every "before" run and comparing the fastest run of each showed
a 5.4% improvement (13587 vs. 14326.6 pystones).  I also ran the pystone
benchmark using python -i and then used ps to examine the memory usage
after completion.  VM and RSS both increased modestly (VM by 0.2%, from
27936 to 27984 and 2.2%, RSS from 3520 to 3600).  Not a big deal in my
book, but I'm one of those people w/ 1GB of RAM Tim referred to. ;-)
msg43539 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-05-05 20:29
Logged In: YES 
user_id=6380

I say, check it in, it can't hurt. A comment in the code
should explain the considerations though.
msg43540 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-05-05 22:23
Logged In: YES 
user_id=80475

Applied to Objects/dictobject.c 2.145
Closing patch.

Thanks to all who contributed, timed, or encouraged.
msg43541 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-05-06 00:16
Logged In: YES 
user_id=6380

Should the other calls to dictresize() also be changed?
msg43542 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-05-06 00:36
Logged In: YES 
user_id=80475

Most likely, yes, but I would like a day to think about the 
use cases and implications before saying for certain.
msg43543 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-05-06 23:46
Logged In: YES 
user_id=80475

Yes, in both places, the resize argument of mp-
>ma_used*3/2 should be changed to mp->ma_used*2.

msg43544 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-05-07 00:39
Logged In: YES 
user_id=6380

Ok, do it.
History
Date User Action Args
2022-04-10 16:08:26adminsetgithub: 38397
2003-04-29 08:03:45rhettingercreate