classification
Title: optimize attribute lookups
Type:
Stage:
Components: Interpreter Core
Versions: Python 2.3
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: jhylton, nascheme, zooko
Priority: normal
Keywords: patch

Created on 2002-01-11 18:07 by zooko, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Files
File name     Uploaded                 Description
python.patch  zooko, 2002-01-11 18:07
Messages (8)
msg38690 - Author: Zooko O'Whielacronx (zooko) Date: 2002-01-11 18:07
This patch optimizes the string comparisons in
class_getattr(), class_setattr(), instance_getattr1(),
and instance_setattr().

I pulled out the relevant section of class_setattr()
and measured its performance, yielding the following
results:

 * in the case that the argument does *not* begin with
"__", then the new version is 1.03 times as fast as the
old.  (This is a mystery to me, as the path through the
code looks the same, in C.  I examined the assembly
that GCC v3.0.3 generated in -O3 mode, and it is true
that the assembly for the new version is
smaller/faster, although I don't really understand why.)

 * in the case that the argument is a string of random
length between 1 and 19 inclusive, and it begins with
"__" and ends with "X_" (where X is a random alphabetic
character), then the new version is 1.12 times as fast as
the old.

 * in the case that the argument is a string of random
length between 1 and 19 inclusive, and it begins with
"__" and does *not* end with "_", then the new version
is 1.16 times as fast as the old.

 * in the case that the argument is (randomly) one of
the six special names, then the new version is 2.7
times as fast as the old.

 * in the case that the argument is a string of random
length between 1 and 19 inclusive, and it begins with
"__" and ends with "__" (but is not one of the six
special names), then the new version is 3.7 times as
fast as the old.
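
Editorial sketch: the five cases above can be made concrete with a small Python classifier. It is not part of the patch. The six names in SPECIAL_NAMES are an assumption based on a reading of class_setattr() in the classobject.c of that era, since the message does not list them, and name_shape() is a hypothetical helper.

```python
# Assumption: these are the six special names that class_setattr()
# compared against; the message above does not enumerate them.
SPECIAL_NAMES = frozenset([
    "__dict__", "__bases__", "__name__",
    "__getattr__", "__setattr__", "__delattr__",
])


def name_shape(name):
    """Return which of the five benchmark cases an attribute name falls into."""
    if not name.startswith("__"):
        return "no leading __"
    if name in SPECIAL_NAMES:
        return "special name"
    if name.endswith("__"):
        return "leading and trailing __, not special"
    if name.endswith("_"):
        return "leading __, ends with X_"
    return "leading __, no trailing _"


if __name__ == "__main__":
    for example in ["spam", "__spam", "__spama_", "__dict__", "__spam__"]:
        print(example, "->", name_shape(example))
```

The order of the checks mirrors why the later cases gain the most: a name only reaches the comparison against the special names after passing both the prefix and the suffix test.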

msg38691 - Author: Jeremy Hylton (jhylton) (Python triager) Date: 2002-01-17 18:29
This seems to add a lot of complexity for a few special
cases.  How important are these particular attributes?  Do
you have any benchmark applications that show real
improvement?  It seems like microbenchmarks overstate the
benefit, since we don't know how often these attributes are
looked up by most applications.

It would also be interesting to see how much of the benefit
for non __ names is the result of the PyString_AS_STRING()
macro.  Maybe that's all the change we really need :-).
msg38692 - Author: Zooko O'Whielacronx (zooko) Date: 2002-01-17 20:33
Yeah, the optimized version is less readable than the original.

I'll try to come up with a benchmark application.  Any
ideas?  Maybe some unit tests from Zope that use attribute
lookups heavily?

My guess is that the actual results in an application will
be "marginal", like maybe between 0.5% and 3% improvement.

msg38693 - Author: Zooko O'Whielacronx (zooko) Date: 2002-01-18 00:22
Okay I've done some "mini benchmarks".  The earlier reported
micro-benchmarks were the result of running the inner loop
itself, in C.  These mini benchmarks are the result of
running this Python script:

class A:
    def __init__(self):
        self.a = 0

a = A()
for i in xrange(2**20):
    a.a = i

print a.a

and then using different attribute names in place of `a'.
The results are as expected: the optimized version is faster
than the current one, depending on the shape of the
attribute name, and dampened by the fact that there is now
other work being done.  The case that shows the smallest
difference is when the attribute name neither begins nor
ends with an '_'.  In that case the above script runs about
2% faster with the optimizations.  The case that shows the
biggest difference is when the attribute begins and ends
with '__', as in `__a__'.  Then the above script runs about
15% faster.

This still isn't a *real* application benchmark.  I'm
looking for one that is a reasonable case for real Python
users but that also uses attribute lookups heavily.
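
Editorial sketch: the "mini benchmark" method above (same loop, different attribute names) can be reproduced on a current interpreter with the standard timeit module. This is only a sketch of the method. The function name time_attribute_store and the chosen attribute names are hypothetical, and because Python 3 no longer has the classic classes that classobject.c implemented, the numbers it prints say nothing about the patch itself.

```python
import timeit


def time_attribute_store(attr_name, loops=2**16, repeat=3):
    """Time `loops` assignments to obj.<attr_name>; return the best run in seconds."""
    # The statement is built as source text so that the attribute name is
    # compiled into a plain attribute store, as in the original script.
    setup = (
        "class A:\n"
        "    pass\n"
        "obj = A()\n"
        "obj.{0} = 0\n".format(attr_name)
    )
    stmt = "obj.{0} = 1".format(attr_name)
    return min(timeit.repeat(stmt, setup=setup, number=loops, repeat=repeat))


if __name__ == "__main__":
    # One name per shape discussed in the thread.
    for name in ["a", "__a", "__a__"]:
        print(name, time_attribute_store(name))
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes, which matters on a machine with a variable-speed CPU like the one described later in the thread.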
msg38694 - Author: Zooko O'Whielacronx (zooko) Date: 2002-03-14 16:24
update:

I did a real app benchmark of this patch by running one of
the unit tests from PyXML-0.6.6.  (Which one?  The one that
I guessed would favor my optimization the most.
Unfortunately I've lost my notes and I don't remember which
one.)

I also separated out the "unroll strcmp" optimization from
the "use macros" optimization on request.

I have lost my notes, but I recall that my results showed
what I expected: between 0.5 and 3 percent app-level
speed-up for the unroll strcmp optimization.

Interesting detail: a quirk in GCC 3 makes the unroll strcmp
version slightly faster than the current strcmp version
*even* in the (common) case that the first two characters of
the attribute name are *not* '__'.

What should happen next:

1.  Someone who has the authority to approve or reject this
patch should tell me what kind of benchmark would be
persuasive to you.  I mean: what specific program I can run
with and without my patch for a useful comparison.  (If you
require more than a 5% app-level speed-up, then let's give
up on this patch now!)

2.  Someone should volunteer to test this patch with the MSFT
compiler, as I don't have one right now.  Some people are
still using the Windows platform, I've noticed [1], so it is
worth benchmarking.  Actually, someone should volunteer to
benchmark GCC+Linux-or-MacOSX, too, as my computer is a
laptop with variable-speed CPU and is really crummy for
benchmarking.

By the way, PEP 266 is a better solution to the problem, but
until it's implemented, this patch is the better patch.  ;-)

Note: this is one of those patches that looks uglier in
"diff -u" format than in actual source code.  Please browse
the actual source side-by-side [2] to see how ugly it really
is.

Regards

Zooko

[1] http://www.google.com/press/zeitgeist/jan02-pie.gif
[2] search for "class_getattr" in:
    http://zooko.com/classobject.c
    http://zooko.com/classobject-strcmpunroll.c

---
                 zooko.com
Security and Distributed Systems Engineering
---
msg38695 - Author: Neil Schemenauer (nascheme) (Python committer) Date: 2002-03-24 01:57
Based on the complexity added by the patch I would say
at least a 5% speedup would be needed to offset the
maintenance cost.  -1 on the current patch.
msg38696 - Author: Zooko O'Whielacronx (zooko) Date: 2002-03-24 15:12
Okay, I just want to double-check these two points:

1.  You did look at the actual resulting source code and not
just the patch, right?  Here's a side-by-side:
http://zooko.com/temp.html

2.  You realize that my estimate of an actual speedup of <
5% applies to a realistic application-level benchmark.  For
microbenchmarks, the speed-up varies but is generally much
higher than 5%, as described in this patch tracker entry.

Given these two facts, then please reject this patch and
spend your time on the new cached attribute lookups
architecture instead.  ;-)

Regards,

Zooko
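
Editorial sketch: the gap between the microbenchmark figures (up to 3.7 times as fast) and the application-level estimate (under 5%) is what Amdahl's law predicts when the optimized code is a small share of total run time. The 3% share used below is a made-up illustration, not a measurement from this thread, and overall_speedup is a hypothetical helper.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when `fraction` of the run
    time becomes `local_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    # Hypothetical: 3% of run time is spent in the name checks, and the
    # patch makes that part 3.7 times as fast (the best microbenchmark case).
    gain = overall_speedup(0.03, 3.7)
    print("overall speedup: %.2f%%" % ((gain - 1.0) * 100))  # roughly 2%
```

Under that assumption even the best-case microbenchmark result yields an application-level gain of only about 2%, which is consistent with the 0.5 to 3 percent range reported earlier.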
msg38697 - Author: Neil Schemenauer (nascheme) (Python committer) Date: 2002-03-24 18:25
I've played with your patch for about 2 hours today.  I
benchmarked it and tried to clean it up using macros or
inlined functions.  I also tried a variation that exploited
the fact that most names were interned strings.  It's not
worth it.  Spend time on rattlesnake, psyco, or the
namespace optimizations.
History
Date User Action Args
2022-04-10 16:04:52 admin set github: 35907
2002-01-11 18:07:41 zooko create