This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: fast subclasses of builtin types
Type: Stage:
Components: Interpreter Core Versions: Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: gvanrossum, loewis, nnorwitz
Priority: normal Keywords: patch

Created on 2006-12-29 06:01 by nnorwitz, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
fast-subclass.diff nnorwitz, 2006-12-29 06:01 v1
a.c loewis, 2007-01-06 14:54
Messages (6)
msg51628 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-12-29 06:01
This is similar to a patch posted on python-dev a few months ago (or more).  I modified it to also handle subclassing exceptions which should speed up exception handling a bit.  (This was proposed by Guido based on the original patch.)  I also dropped an extra bit that was going to indicate if it was a builtin type or a subclass of a builtin type.
msg51629 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-12-29 06:04
I forgot to mention this patch works by using unused bits in tp_flags.  This saves a function call when checking for a subclass of a builtin type.

There's one funky thing about this patch, the change to Objects/exceptions.c.  I didn't investigate why this was necessary, or more likely I did why when I added it and forgot.  I know that without adding BASE_EXC_SUBCLASS to tp_flags, test_exceptions fails.

msg51630 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-01-04 03:59
This looks fine, but I have some questions about alternative implementations:

- Why does the typical PyFoo_Check() macro first call PyFoo_CheckExact() before calling the fast bit checking macro?  Did you measure that this is in fact faster?  True, it means always a pointer deref, so maybe it is -- but OTOH it is more instructions.

- Why not have a separate bit for each type?  Then you could make the fast macro test for (flags & mask) != 0 instead of testing for (flag & mask) == value.  It would use up all the remaining bits, but I suspect there are some unused (or reusable) bits in lower positions: 1L<<2 is unused (was GC), and 1L<<11 also seems unused.  And bits 18 through 23!  And I'm guessing that INPLACEOPS (1L<<3) isn't all that interesting any more they were introduced in 2.0...  So it really looks like you have plenty of bits.  Of course I don't know if it matters; would be worth it perhaps to look at the machine code.

- Oops, it looks like your comment is off.  You claim to be using bits 24-27, leaving 28-31 free, but in fact you're using bits 28-31!

BTW You're inroducing quite a few lines over 80 chars.  Perhaps cut back a bit?
msg51631 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-01-06 14:24
I made a couple of assembler experiments (see attached a.c), with gcc 4.1 on x86.

A "bit mask enumeration" test (f) compiles into four instructions:

        movl    8(%eax), %eax
        andl    $-268435456, %eax
        cmpl    $1879048192, %eax
        je      .L18
(fall-through being the else case)

A single bit test of a flag (g) compiles to two instructions:

        testl   $-1073741824, 8(%eax)
        je      .L9
(fall-through being the if case)

Adding an identity test (comparison with the address of a global),
followed by a bit mask test (h), compiles into six instructions:

        cmpl    $int_type, %eax
        je      .L2
        movl    8(%eax), %eax
        andl    $-268435456, %eax
        cmpl    $1879048192, %eax
        je      .L2
(fall-through being the else case)

In the common case, only two of these instructions are executed.

So all-in-all, I would agree with Guido that adding bit flags is more
efficient. However, existing bits cannot be recycled: in existing
binary extension modules, these flags are set, so if the modules don't
get recompiled, the type check would believe that the types are 
subtypes.

msg51632 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-01-06 14:54
File Added: a.c
msg51633 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2007-02-25 19:50
Committed rev 53911.

Hopefully the checkin comment explains most of what's going on.  I simplified the patch as much as possible.  I like to start with less code.  If we can improve the speed, that can be optimized later.  I didn't measure the little variaions.  I had measured that it made a real diff in speed for using an int subclass a long time ago.  

This should help a fair amount for exceptions too.
History
Date User Action Args
2022-04-11 14:56:21adminsetgithub: 44385
2006-12-29 06:01:06nnorwitzcreate