This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: PyXxx_Check(x) trusts x->ob_type->tp_mro
Type: Stage:
Components: Interpreter Core Versions:
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: arigo Nosy List: arigo, jepler, mwh
Priority: normal Keywords:

Created on 2005-02-27 21:55 by arigo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
bug.py arigo, 2005-02-27 21:55 Segfaulting example
bug2.py mwh, 2005-03-03 09:59 more nastiness
user-mro-argh.diff mwh, 2005-03-06 18:47 mwh's patch #1
python-subtyping.pdf mwh, 2005-03-15 13:22 mwh's sillyness #1
Messages (13)
msg24386 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-02-27 21:55
The functions PyInt_Check(), PyString_Check(),
PyList_Check() etc. are used all over the core to check
which typecasts are safe, from PyObject* to the various
PyXxxObject*.

But the macros themselves are implemented by
inspecting the "tp_mro" tuple of the incoming object's
type.  As the latter can be completely controlled by the
user, an object can pretend to inherit from anything and
pass the PyXxx_Check() checks of its choice, even if
its memory layout is actually completely wrong.

See attached example.
msg24387 - (view) Author: Jeff Epler (jepler) Date: 2005-03-01 19:15
Logged In: YES 
user_id=2772

Not sure if this is relevant, but the example given didn't
crash 2.2:
$ python2.2 bug.py
Traceback (most recent call last):
  File "bug.py", line 9, in ?
    x = X()
TypeError: dict.__new__(X) is not safe, use object.__new__()
msg24388 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2005-03-01 20:18
Logged In: YES 
user_id=6656

I wonder if Guido owes Armin a beer (read typeobject.c around line 5230).  
Well, not really, but it's that code that is making the difference.

However, reversing the order of dict and object is sufficient to make 2.2 
crash (I presume, it crashes CVS HEAD with the __new__ short-circuiting 
code chopped out which gives the error jelper observes on the original).

I wonder what to do about this.  Removing the ability to customize the mro 
from user code is one obvious approach -- I don't know how much code 
uses this feature though.
msg24389 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-03-02 17:03
Logged In: YES 
user_id=4771

To solve the problem, as hinted in the title of this tracker, I 
think that PyXxx_Check() should simply not trust any mro.  
What PyInt_Check() really means at the C level is to check if 
an object memory layout is compatible with PyIntObject.  This 
is easy to figure out by walking the "solid base" chain, 
tp_base.

As a side note, PyPy gives the same error as Python 2.2.  
However, both in PyPy and in Python 2.2, you can still build 
an instance of the strange class X as follows:

>>> x = object.__new__(X)

Still, all the methods of x are resolved via the dict type.  In 
PyPy we get a clean TypeError because the methods thus 
found don't apply to non-dict objects.  In Python 2.2 you can 
crash the interpreter just as in more recent Python releases, 
e.g. with x[5]=6.
msg24390 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2005-03-02 20:11
Logged In: YES 
user_id=6656

Hmm, yeah, that works.  It wasn't totally clear to me that the user couldn't 
maliciously influence tp_base, but I don't think they can...

A helper function is presumably in order, unless PyType_IsSubtype should 
be changed.
msg24391 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-03-03 09:34
Logged In: YES 
user_id=4771

The PyXxx_Check() macros are #defined with
PyObject_TypeCheck().  Should we change all these #defines to
use a new PyObject_LayoutCheck() or something, and document
to C extension writers that they should also switch to the
new function?  More generally, should we review *all* usages
of PyObject_TypeCheck() and decide if they really meant that
or PyObject_LayoutCheck()?  Do we care about types that are
solid subclasses of 'int' but don't have it in their MRO? 
Should we just forget the whole idea and sanity-check the
user-defined mro, which after all would just add another
strange undocumented condition to typeobject.c? :-)
msg24392 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2005-03-03 09:50
Logged In: YES 
user_id=6656

Certainly some more checking of what the user-defined MRO contains 
would be good -- check the attached, which dumps core at class definition 
time...
msg24393 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2005-03-06 18:47
Logged In: YES 
user_id=6656

I think the attached fixes all this.

What it does: check the return value of PySequence_Tuple (duh).  Check 
that the returned sequence contains only types or old-style classes.  
Checks that the solid_base of each type in the returned sequence is a 
supertype of the solid_base of type being built.  Adds a few tests.

The error messages are a bit inscrutable.  I find it hard to care.

Assigning to Armin for the first look.  Might want to get Guido to look at it 
too.

When Armin and I talked about this on IRC we found a few oddities in the 
general area, but I don't know if it's worth opening a new bug report for 
them...
msg24394 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-03-14 11:25
Logged In: YES 
user_id=4771

I tried to convince myself that this check always accepts the 
default mro, but there are more implicit assumptions around 
than I can handle in a quick review...

Instead, I modified the patch so that a debug build of Python 
always does the check in mro_internal().  This results in a 
shallow problem: the basestring type has tp_basicsize==0, 
which makes the computation of its solid_base trigger an 
assertion in extra_ivars().  If I hack around this problem, then 
I cannot quickly find an example of built-in mro that triggers a 
TypeError, but it doesn't mean there isn't one...
msg24395 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2005-03-14 11:41
Logged In: YES 
user_id=6656

Well, emprically that can't happen because the patch passes test_descr, 
and far sillier things happen in that file than in real life.

More theoretically... I'll think about it :)
msg24396 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2005-03-15 13:22
Logged In: YES 
user_id=6656

Please see the attached PDF for a proof of ... well, something in the area.

I think this is what typeobject.c should be doing and what it's *trying* to 
do.  I'm less confident that this is what it's actually doing (particularly wrt 
the subclassing two subclasses of str thing).

msg24397 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-12-14 14:41
Logged In: YES 
user_id=4771

I'm getting confused.

First note that the problem I mentioned about extra_ivars() triggering an assert() can only occur at start-up.  Indeed, PyType_Ready() copies the tp_base->tp_basicsize if the subtype's tp_basicsize is 0.  So we can probably ignore the problem -- no extra_ivars() call would trigger the assert after start-up.

The check that your patch adds is that solid_base(A) must be a superclass of solid_base(T) for each A in the mro of T.  For built-in mros, the A's are exactly the superclasses of T.  So we have to prove [*]: if A is a superclass of T, then solid_base(A) is a superclass of solid_base(T).

Irrespective of what extra_ivars() does, solid_base(T) is either exactly T, or solid_base(T->tp_base).  Moreover, when a type T is created with immediate bases B1...Bn, we check that all solid_bases of B1...Bn form a linear order (i.e. are subclasses of each other), and we set T->tp_base to the first Bi such that solid_base(Bi) is a subclass of all the other solid_base(Bj).

Let's approach [*] above by induction on the set of types in which A and T are chosen.  This set of types can be ordered by creation time: T1, T2, T3, ... where classes with a higher index can only inherit from classes with lower indices.  Assume that [*] holds if A and T are chosen among T1...Tn-1.  Let's prove it if A and/or T can also be Tn.  The only non-trivial case is if A is a strict superclass of T=Tn.

As above, let write B1...Bn for the immediate bases of T, with T->tp_base=Bi.

Pick j such that A is a superclass of Bj.  By induction solid_base(A) is a superclass of solid_base(Bj) which is a superclass of solid_base(Bi=T->tp_base).  Now solid_base(T) is either T or solid_base(Bi).  In both cases, solid_base(T) is a subclass of solid_base(A).  This concludes the proof -- or I am missing something :-)

In other words, I don't think the patch can break anything, so it should definitely go in.
msg24398 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-12-29 17:08
Logged In: YES 
user_id=4771

Checked in as r41845.
History
Date User Action Args
2022-04-11 14:56:09adminsetgithub: 41630
2005-02-27 21:55:55arigocreate