Issue1624059
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2006-12-29 06:01 by nnorwitz, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
fast-subclass.diff | nnorwitz, 2006-12-29 06:01 | v1 | ||
a.c | loewis, 2007-01-06 14:54 |
Messages (6) | |||
---|---|---|---|
msg51628 - (view) | Author: Neal Norwitz (nnorwitz) * | Date: 2006-12-29 06:01 | |
This is similar to a patch posted on python-dev a few months ago (or more). I modified it to also handle subclassing exceptions which should speed up exception handling a bit. (This was proposed by Guido based on the original patch.) I also dropped an extra bit that was going to indicate if it was a builtin type or a subclass of a builtin type. |
|||
msg51629 - (view) | Author: Neal Norwitz (nnorwitz) * | Date: 2006-12-29 06:04 | |
I forgot to mention this patch works by using unused bits in tp_flags. This saves a function call when checking for a subclass of a builtin type. There's one funky thing about this patch, the change to Objects/exceptions.c. I didn't investigate why this was necessary, or more likely I did why when I added it and forgot. I know that without adding BASE_EXC_SUBCLASS to tp_flags, test_exceptions fails. |
|||
msg51630 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2007-01-04 03:59 | |
This looks fine, but I have some questions about alternative implementations: - Why does the typical PyFoo_Check() macro first call PyFoo_CheckExact() before calling the fast bit checking macro? Did you measure that this is in fact faster? True, it means always a pointer deref, so maybe it is -- but OTOH it is more instructions. - Why not have a separate bit for each type? Then you could make the fast macro test for (flags & mask) != 0 instead of testing for (flag & mask) == value. It would use up all the remaining bits, but I suspect there are some unused (or reusable) bits in lower positions: 1L<<2 is unused (was GC), and 1L<<11 also seems unused. And bits 18 through 23! And I'm guessing that INPLACEOPS (1L<<3) isn't all that interesting any more they were introduced in 2.0... So it really looks like you have plenty of bits. Of course I don't know if it matters; would be worth it perhaps to look at the machine code. - Oops, it looks like your comment is off. You claim to be using bits 24-27, leaving 28-31 free, but in fact you're using bits 28-31! BTW You're inroducing quite a few lines over 80 chars. Perhaps cut back a bit? |
|||
msg51631 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2007-01-06 14:24 | |
I made a couple of assembler experiments (see attached a.c), with gcc 4.1 on x86. A "bit mask enumeration" test (f) compiles into four instructions: movl 8(%eax), %eax andl $-268435456, %eax cmpl $1879048192, %eax je .L18 (fall-through being the else case) A single bit test of a flag (g) compiles to two instructions: testl $-1073741824, 8(%eax) je .L9 (fall-through being the if case) Adding an identity test (comparison with the address of a global), followed by a bit mask test (h), compiles into six instructions: cmpl $int_type, %eax je .L2 movl 8(%eax), %eax andl $-268435456, %eax cmpl $1879048192, %eax je .L2 (fall-through being the else case) In the common case, only two of these instructions are executed. So all-in-all, I would agree with Guido that adding bit flags is more efficient. However, existing bits cannot be recycled: in existing binary extension modules, these flags are set, so if the modules don't get recompiled, the type check would believe that the types are subtypes. |
|||
msg51632 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2007-01-06 14:54 | |
File Added: a.c |
|||
msg51633 - (view) | Author: Neal Norwitz (nnorwitz) * | Date: 2007-02-25 19:50 | |
Committed rev 53911. Hopefully the checkin comment explains most of what's going on. I simplified the patch as much as possible. I like to start with less code. If we can improve the speed, that can be optimized later. I didn't measure the little variaions. I had measured that it made a real diff in speed for using an int subclass a long time ago. This should help a fair amount for exceptions too. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:21 | admin | set | github: 44385 |
2006-12-29 06:01:06 | nnorwitz | create |