Revisiting Oren’s fastnames patch, I think a simpler
approach is warranted. Using a setitem macro skips
a function call but doesn’t get to the root of the problem.
Likewise, a pointer search is not helpful because
lookupstring() already attempts a pointer
comparison first and a hash comparison as a
fallback (also, a pointer search fails to find
non-interned strings). There is more promise in
optimizing the search/fail/fallback sequence of
looking for locals, globals, and builtins, but the
approach of using custom look-ups and having
negative entries is too invasive.
The real issue is that the dictionaries are too full.
Whenever a dictionary becomes 2/3 full, a resize is
triggered which doubles the size of the dictionary,
still leaving it relatively full and relatively close
to the next (expensive) resize operation.
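For concreteness, the growth pattern under the current doubling policy can be sketched like this (illustrative Python, not the actual C code in dictobject.c):

```python
# Illustrative sketch, not CPython's dictobject.c: track the load factor
# of a table that doubles whenever it reaches 2/3 full.
size, used = 8, 0
loads = []
for _ in range(100):
    used += 1
    if used * 3 >= size * 2:      # the 2/3-full trigger
        size *= 2                 # current policy: double
    loads.append(used / size)

# The load factor oscillates between roughly 1/3 and 2/3, so on average
# the table sits about half full and is never far from the next resize.
print(max(loads))
```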
This patch makes the setitem resize operation
quadruple the size of the dictionary. This leads to
fewer total resize operations and leaves the dictionary
more sparsely populated (not only leading to fewer
collisions, but allowing immediate detection of NOT IN
without doing *any* comparisons of pointers or
hashcodes).
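A rough simulation of the effect (assumed helper names; this is not the one-line C patch itself) shows both benefits at once, fewer total resizes and a sparser final table:

```python
# Rough simulation, not the actual patch: count resize operations and
# the final load factor when filling a table that grows at 2/3 full.
def fill(n_items, growth):
    size, used, resizes = 8, 0, 0
    for _ in range(n_items):
        used += 1
        if used * 3 >= size * 2:   # the 2/3-full trigger
            size *= growth
            resizes += 1
    return resizes, used / size

print(fill(10000, 2))   # doubling
print(fill(10000, 4))   # quadrupling: fewer resizes, sparser table
```

A sparser table also means a miss is more likely to land on an empty slot on the very first probe, which is what allows "not in" to be detected without any pointer or hash comparisons.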
The patch is minimally invasive. It is only one line!
However, it does change the internal ordering of
dictionaries, so doctest-style test code will sense
the change immediately.
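As a hedge against such ordering changes, doctests can avoid depending on a dict's repr order (illustrative snippet, not part of the patch):

```python
# Illustrative: a doctest expecting a dict's repr bakes the hash
# table's internal ordering into the expected output, so a
# resize-policy change can break it.  Comparing against sorted items
# keeps the expected output stable under any dict implementation.
d = {'spam': 1, 'eggs': 2, 'ham': 3}

# Fragile: repr order depends on the table's internals.
fragile_expectation = repr(d)

# Robust: sorted output is deterministic.
robust = sorted(d.items())
print(robust)   # [('eggs', 2), ('ham', 3), ('spam', 1)]
```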
Four timing suites are attached. Two directly
measure speed-ups for accessing globals and
__builtins__. The other two are real-world apps
which tap into all facets of Python, including
subclassing, lambdas and functionals, large and
small dicts, a wide array of data structures, and
heavy processing of strings, tuples, and numbers. Each of
the suites shows improvements ranging from 5% to
25%. PyStone also shows a marginal improvement,
which is surprising because it is dominated by fast
locals.