
classification
Title: Segfault provoked by generators and exceptions
Type: Stage:
Components: Interpreter Core Versions: Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: awaters, eric_noyau, klaas, loewis, mwh, nnorwitz, tim.peters
Priority: critical Keywords:

Created on 2006-10-18 02:23 by klaas, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
gen.diff loewis, 2006-10-18 21:05
hope.py tim.peters, 2006-10-19 00:38 quick-failing (in Windows debug build)
tstate.diff klaas, 2006-11-27 18:42 Quick & dirty fix
tr.diff loewis, 2007-01-22 07:51 eliminate usage of f_tstate in PyTraceBack_Here
Messages (19)
msg30272 - (view) Author: Mike Klaas (klaas) Date: 2006-10-18 02:23
A reproducible segfault when using heavily-nested
generators and exceptions.

Unfortunately, I haven't yet been able to provoke this
behaviour with a standalone python2.5 script.  There
are, however, no third-party c extensions running in
the process so I'm fairly confident that it is a
problem in the core.

The gist of the code is a series of nested generators
which leave scope when an exception is raised.  This
exception is caught and re-raised in an outer loop. 
The old exception was holding on to the frame which was
keeping the generators alive, and the sequence of
generator destruction and new finalization caused the
segfault.   
msg30273 - (view) Author: Mike Klaas (klaas) Date: 2006-10-18 02:23
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208400192 (LWP 26235)]
0x080e4296 in PyTraceBack_Here (frame=0x9c2d7b4) at
Python/traceback.c:94
94              if ((next != NULL &&
!PyTraceBack_Check(next)) ||
(gdb) bt
#0  0x080e4296 in PyTraceBack_Here (frame=0x9c2d7b4) at
Python/traceback.c:94
#1  0x080b9ab7 in PyEval_EvalFrameEx (f=0x9c2d7b4,
throwflag=1) at Python/ceval.c:2459
#2  0x08101a40 in gen_send_ex (gen=0xb64f880c,
arg=0x81333e0, exc=1) at Objects/genobject.c:82
#3  0x08101c0f in gen_close (gen=0xb64f880c, args=0x0) at
Objects/genobject.c:128
#4  0x08101cde in gen_del (self=0xb64f880c) at
Objects/genobject.c:163
#5  0x0810195b in gen_dealloc (gen=0xb64f880c) at
Objects/genobject.c:31
#6  0x080b9912 in PyEval_EvalFrameEx (f=0x9c2802c,
throwflag=1) at Python/ceval.c:2491
#7  0x08101a40 in gen_send_ex (gen=0xb64f362c,
arg=0x81333e0, exc=1) at Objects/genobject.c:82
#8  0x08101c0f in gen_close (gen=0xb64f362c, args=0x0) at
Objects/genobject.c:128
#9  0x08101cde in gen_del (self=0xb64f362c) at
Objects/genobject.c:163
#10 0x0810195b in gen_dealloc (gen=0xb64f362c) at
Objects/genobject.c:31
#11 0x080815b9 in dict_dealloc (mp=0xb64f4a44) at
Objects/dictobject.c:801
#12 0x080927b2 in subtype_dealloc (self=0xb64f340c) at
Objects/typeobject.c:686
#13 0x0806028d in instancemethod_dealloc (im=0xb796a0cc) at
Objects/classobject.c:2285
#14 0x080815b9 in dict_dealloc (mp=0xb64f78ac) at
Objects/dictobject.c:801
#15 0x080927b2 in subtype_dealloc (self=0xb64f810c) at
Objects/typeobject.c:686
#16 0x081028c5 in frame_dealloc (f=0x9c272bc) at
Objects/frameobject.c:416
#17 0x080e41b1 in tb_dealloc (tb=0xb799166c) at
Python/traceback.c:34
#18 0x080e41c2 in tb_dealloc (tb=0xb4071284) at
Python/traceback.c:33
#19 0x080e41c2 in tb_dealloc (tb=0xb7991824) at
Python/traceback.c:33
#20 0x08080dca in insertdict (mp=0xb7f56824, key=0xb3fb9930,
hash=1492466088, value=0xb3fb9914)
    at Objects/dictobject.c:394
#21 0x080811a4 in PyDict_SetItem (op=0xb7f56824,
key=0xb3fb9930, value=0xb3fb9914) at Objects/dictobject.c:619
#22 0x08082dc6 in PyDict_SetItemString (v=0xb7f56824,
key=0x8129284 "exc_traceback", item=0xb3fb9914)
    at Objects/dictobject.c:2103
#23 0x080e2837 in PySys_SetObject (name=0x8129284
"exc_traceback", v=0xb3fb9914) at Python/sysmodule.c:82
#24 0x080bc9e5 in PyEval_EvalFrameEx (f=0x9c10e7c,
throwflag=0) at Python/ceval.c:2954
#25 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7bbc890,
globals=0xb7bbe57c, locals=0x0, args=0x9b8e2ac, argcount=1,
    kws=0x9b8e2b0, kwcount=0, defs=0xb7b7aed8, defcount=1,
closure=0x0) at Python/ceval.c:2833
#26 0x080bd62a in PyEval_EvalFrameEx (f=0x9b8e16c,
throwflag=0) at Python/ceval.c:3662
#27 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7bbc848,
globals=0xb7bbe57c, locals=0x0, args=0xb7af9d58, argcount=1,
    kws=0x9b7a818, kwcount=0, defs=0x0, defcount=0,
closure=0x0) at Python/ceval.c:2833
#28 0x08104083 in function_call (func=0xb7b79c34,
arg=0xb7af9d4c, kw=0xb7962c64) at Objects/funcobject.c:517
#29 0x0805a660 in PyObject_Call (func=0xb7b79c34,
arg=0xb7af9d4c, kw=0xb7962c64) at Objects/abstract.c:1860
#30 0x080bcb4b in PyEval_EvalFrameEx (f=0x9b82c0c,
throwflag=0) at Python/ceval.c:3846
#31 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7cd6608,
globals=0xb7cd4934, locals=0x0, args=0x9b7765c, argcount=2,
    kws=0x9b77664, kwcount=0, defs=0x0, defcount=0,
closure=0xb7cfe874) at Python/ceval.c:2833
#32 0x080bd62a in PyEval_EvalFrameEx (f=0x9b7751c,
throwflag=0) at Python/ceval.c:3662
#33 0x080bdf70 in PyEval_EvalFrameEx (f=0x9a9646c,
throwflag=0) at Python/ceval.c:3652
#34 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7f39728,
globals=0xb7f6ca44, locals=0x0, args=0x9b7a00c, argcount=0,
    kws=0x9b7a00c, kwcount=0, defs=0x0, defcount=0,
closure=0xb796410c) at Python/ceval.c:2833
#35 0x080bd62a in PyEval_EvalFrameEx (f=0x9b79ebc,
throwflag=0) at Python/ceval.c:3662
#36 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7f39770,
globals=0xb7f6ca44, locals=0x0, args=0x99086c0, argcount=0,
    kws=0x99086c0, kwcount=0, defs=0x0, defcount=0,
closure=0x0) at Python/ceval.c:2833
#37 0x080bd62a in PyEval_EvalFrameEx (f=0x9908584,
throwflag=0) at Python/ceval.c:3662
#38 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7f397b8,
globals=0xb7f6ca44, locals=0xb7f6ca44, args=0x0, argcount=0,
    kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
at Python/ceval.c:2833
#39 0x080bff32 in PyEval_EvalCode (co=0xb7f397b8,
globals=0xb7f6ca44, locals=0xb7f6ca44) at Python/ceval.c:494
#40 0x080ddff1 in PyRun_FileExFlags (fp=0x98a4008,
filename=0xbfffd4a3 "scoreserver.py", start=257,
    globals=0xb7f6ca44, locals=0xb7f6ca44, closeit=1,
flags=0xbfffd298) at Python/pythonrun.c:1264
#41 0x080de321 in PyRun_SimpleFileExFlags (fp=Variable "fp"
is not available.
) at Python/pythonrun.c:870
#42 0x08056ac4 in Py_Main (argc=1, argv=0xbfffd334) at
Modules/main.c:496
#43 0x00a69d5f in __libc_start_main () from /lib/libc.so.6
#44 0x08056051 in _start ()

msg30274 - (view) Author: Mike Klaas (klaas) Date: 2006-10-18 19:37
I've produced a simplified traceback with a single
generator.  Note that the frame being used in the traceback
(#0) is the same frame being dealloc'd (#11).

The relevant call in traceback.c is:
PyTraceBack_Here(PyFrameObject *frame)
{
        PyThreadState *tstate = frame->f_tstate;
        PyTracebackObject *oldtb =
                (PyTracebackObject *) tstate->curexc_traceback;
        PyTracebackObject *tb = newtracebackobject(oldtb, frame);

and I can verify that oldtb contains garbage:
(gdb) print frame
$1 = (PyFrameObject *) 0x8964d94
(gdb) print frame->f_tstate
$2 = (PyThreadState *) 0x895b178
(gdb) print $2->curexc_traceback
$3 = (PyObject *) 0x66



#0  0x080e4296 in PyTraceBack_Here (frame=0x8964d94) at
Python/traceback.c:94
#1  0x080b9ab7 in PyEval_EvalFrameEx (f=0x8964d94,
throwflag=1) at Python/ceval.c:2459
#2  0x08101a40 in gen_send_ex (gen=0xb7cca4ac,
arg=0x81333e0, exc=1) at Objects/genobject.c:82
#3  0x08101c0f in gen_close (gen=0xb7cca4ac, args=0x0) at
Objects/genobject.c:128
#4  0x08101cde in gen_del (self=0xb7cca4ac) at
Objects/genobject.c:163
#5  0x0810195b in gen_dealloc (gen=0xb7cca4ac) at
Objects/genobject.c:31
#6  0x080815b9 in dict_dealloc (mp=0xb7cc913c) at
Objects/dictobject.c:801
#7  0x080927b2 in subtype_dealloc (self=0xb7cca76c) at
Objects/typeobject.c:686
#8  0x0806028d in instancemethod_dealloc (im=0xb7d07f04) at
Objects/classobject.c:2285
#9  0x080815b9 in dict_dealloc (mp=0xb7cc90b4) at
Objects/dictobject.c:801
#10 0x080927b2 in subtype_dealloc (self=0xb7cca86c) at
Objects/typeobject.c:686
#11 0x081028c5 in frame_dealloc (f=0x8964a94) at
Objects/frameobject.c:416
#12 0x080e41b1 in tb_dealloc (tb=0xb7cc1fcc) at
Python/traceback.c:34
#13 0x080e41c2 in tb_dealloc (tb=0xb7cc1f7c) at
Python/traceback.c:33
#14 0x08080dca in insertdict (mp=0xb7f99824, key=0xb7ccd020,
hash=1492466088, value=0xb7ccd054)
    at Objects/dictobject.c:394
#15 0x080811a4 in PyDict_SetItem (op=0xb7f99824,
key=0xb7ccd020, value=0xb7ccd054)
    at Objects/dictobject.c:619
#16 0x08082dc6 in PyDict_SetItemString (v=0xb7f99824,
key=0x8129284 "exc_traceback", 
    item=0xb7ccd054) at Objects/dictobject.c:2103
#17 0x080e2837 in PySys_SetObject (name=0x8129284
"exc_traceback", v=0xb7ccd054)
    at Python/sysmodule.c:82
#18 0x080bc9e5 in PyEval_EvalFrameEx (f=0x895f934,
throwflag=0) at Python/ceval.c:2954
#19 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7f6ade8,
globals=0xb7fafa44, locals=0x0, 
    args=0xb7cc5ff8, argcount=1, kws=0x0, kwcount=0,
defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:2833
#20 0x08104083 in function_call (func=0xb7cc7294,
arg=0xb7cc5fec, kw=0x0)
    at Objects/funcobject.c:517
#21 0x0805a660 in PyObject_Call (func=0xb7cc7294,
arg=0xb7cc5fec, kw=0x0)
    at Objects/abstract.c:1860
msg30275 - (view) Author: Mike Klaas (klaas) Date: 2006-10-18 19:47
I cannot yet produce a pure-Python script which reproduces
the problem, but I can give an overview.  There is a
generator running in one thread, an exception being raised
in another thread, and as a consequence, the generator in the
first thread is garbage-collected (triggering an exception
due to the new generator cleanup).  The problem is extremely
sensitive to timing: often the insertion/removal of print
statements, or reordering the code, causes the problem to
vanish, which is confounding my ability to create a simple
test script.

import threading
import time

def getdocs():
    def f():
        pass  # <some somewhat time-consuming operation>
    while True:
        f()
        yield None

# -----------------------------------------------------------------------------

class B(object):
    def __init__(self,):
        pass
    def doit(self):
        # must be an instance var to trigger segfault
        self.docIter = getdocs()
        # this is the generator referred to in the traceback
        print self.docIter
        for i, item in enumerate(self.docIter):            
            if i > 9:
                break            
        print 'exiting generator'


class A(object):
    """ Process entry point / main thread """
    def __init__(self):
  
        while True:
            try:
                self.func()
            except Exception, e:
                print 'right after raise'

  
    def func(self):        
        b = B()
        thread = threading.Thread(target=b.doit)
        thread.start()
        start_t = time.time()
        while True:
            try:
                if time.time() - start_t > 1:
                    raise Exception
            except Exception:
                print 'right before raise'
                # SIGSEGV here.  If this is changed to
                # 'break', no segfault occurs
                raise


if __name__ == '__main__':
    A()
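
The "new generator cleanup" mentioned above is the Python 2.5 behaviour where destroying a suspended generator closes it, raising GeneratorExit at the paused yield so that finally blocks run. That is the gen_dealloc -> gen_del -> gen_close -> gen_send_ex(exc=1) chain visible in the backtraces. The following is only a minimal sketch of that mechanism, not the failing application code: the names numbers and log are made up, it is written for current Python (in 2.5 next(g) would be spelled g.next()), and the immediate cleanup on del assumes CPython's reference counting.

```python
log = []


def numbers():
    """Endless generator whose finally block records that cleanup ran."""
    try:
        n = 0
        while True:
            yield n
            n += 1
    finally:
        # Runs when the suspended generator is closed, including the
        # implicit close that happens when the generator is destroyed.
        log.append("cleanup")


g = numbers()
first = next(g)   # advance to the first yield; the generator is now suspended
del g             # drop the last reference; CPython finalizes the generator here
```

The point for this report is that the finally block executes inside whichever piece of code happens to drop the last reference, which need not be the code that created the generator.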
msg30276 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-10-18 21:05
Can you please review/try attached patch? Can anybody tell
why gi_frame *isn't* incref'ed when the generator is created?
msg30277 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-10-18 21:57
> Can anybody tell why gi_frame *isn't* incref'ed when
> the generator is created?

As documented (in concrete.tex), PyGen_New(f) steals a
reference to the frame passed to it.  Its only call site
(well, in the core) is in ceval.c, which returns immediately
after PyGen_New takes over ownership of the frame the caller
created:

"""
/* Create a new generator that owns the ready to run frame
 * and return that as the value. */
return PyGen_New(f);
"""

In short, that PyGen_New() doesn't incref the frame passed
to it is intentional.

It's possible that the intent is flawed ;-), but offhand I
don't see how.
msg30278 - (view) Author: Mike Klaas (klaas) Date: 2006-10-19 00:12
Despite Tim's reassurance, I'm afraid that Martin's patch
does in fact prevent the segfault.  Sounds like it also
introduces a memleak.
msg30279 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-10-19 00:38
I've attached a much simplified pure-Python script (hope.py)
that reproduces a problem very quickly, on Windows, in a
/debug/ build of current trunk.  It typically prints:

exiting generator
joined thread

at most twice before crapping out.  At the time, the `next`
argument to newtracebackobject() is 0xdddddddd, and tracing
back a level shows that, in PyTraceBack_Here(),
frame->f_tstate is entirely filled with 0xdd bytes.

Note that this is not a debug-build obmalloc gimmick!  This
is Microsoft's similar debug-build gimmick for their malloc,
and for some reason Python uses the system malloc directly
to obtain memory for thread states.  The Microsoft debug
free() fills newly-freed memory with 0xdd, which has the
same meaning as the debug-build obmalloc's DEADBYTE (0xdb).
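
When reading a crash dump, the two fill patterns named above can be told apart mechanically. The helper below is a hypothetical debugging aid, not part of CPython or gdb; the names FILL_BYTES and describe_fill are invented here, and the table contains only the two byte values mentioned in this message.

```python
# Fill bytes mentioned in this report and what they indicate.
FILL_BYTES = {
    0xDD: "freed by the Microsoft debug-build free()",
    0xDB: "freed by debug-build obmalloc (DEADBYTE)",
}


def describe_fill(value, width=4):
    """Describe `value` if all `width` bytes are one known fill byte.

    Returns None when the bytes differ or the byte is not in FILL_BYTES.
    """
    data = [(value >> (8 * i)) & 0xFF for i in range(width)]
    first = data[0]
    if any(b != first for b in data):
        return None
    return FILL_BYTES.get(first)
```

For example, describe_fill(0xdddddddd) reports the Microsoft pattern, while the 0x66 seen in the earlier gdb session is not a recognised fill value and yields None.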

So somebody is accessing a thread state here after it's been
freed.  Best guess is that the generator is getting "cleaned
up" after the thread that created it has gone away, so the
generator's frame's f_tstate is trash.

Note that a PyThreadState (a frame's f_tstate) is /not/ a
Python object -- it's just a raw C struct, and its lifetime
isn't controlled by refcounts.
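
The scenario guessed at above can be sketched in a few lines. This is not the attached hope.py (its contents are not shown in this thread); it is an illustrative stand-in, assuming CPython reference counting, with made-up names (docs, Holder, events). On a fixed interpreter it runs cleanly and merely shows that the generator's cleanup executes in a different thread from the one that created it.

```python
import threading

events = []


def docs():
    """Generator whose cleanup records the thread it runs in."""
    try:
        while True:
            yield None
    finally:
        events.append(threading.current_thread().name)


class Holder(object):
    """Plain object that keeps the generator alive after the thread exits."""


def worker(holder):
    holder.it = docs()   # generator (and its frame) created in the worker thread
    next(holder.it)      # suspend it at the first yield


holder = Holder()
t = threading.Thread(target=worker, args=(holder,), name="worker")
t.start()
t.join()                 # the worker thread, and its thread state, are gone now

del holder.it            # last reference dropped: cleanup runs in the main thread
```

After this runs, events holds the main thread's name rather than "worker". In unpatched 2.5 the generator's frame still pointed at the worker's freed thread state at that moment.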
msg30280 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2006-10-19 07:58
> and for some reason Python uses the system malloc directly
> to obtain memory for thread states.

This bit is fairly easy: they are allocated without the GIL being held, which 
breaks an assumption of PyMalloc.

No idea about the real problem, sadly.
msg30281 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-10-28 04:56
Mike, what platform are you having the problem on?

I tried Tim's hope.py on Linux x86_64 and Mac OS X 10.4 with
debug builds and neither one crashed.  Tim's guess looks
pretty damn good too.  Here's the result of valgrind:

Invalid read of size 8
   at 0x4CEBFE: PyTraceBack_Here (traceback.c:117)
   by 0x49C1F1: PyEval_EvalFrameEx (ceval.c:2515)
   by 0x4F615D: gen_send_ex (genobject.c:82)
   by 0x4F6326: gen_close (genobject.c:128)
   by 0x4F645E: gen_del (genobject.c:163)
   by 0x4F5F00: gen_dealloc (genobject.c:31)
   by 0x44D207: _Py_Dealloc (object.c:1928)
   by 0x44534E: dict_dealloc (dictobject.c:801)
   by 0x44D207: _Py_Dealloc (object.c:1928)
   by 0x4664FF: subtype_dealloc (typeobject.c:686)
   by 0x44D207: _Py_Dealloc (object.c:1928)
   by 0x42325D: instancemethod_dealloc (classobject.c:2287)
 Address 0x56550C0 is 88 bytes inside a block of size 152 free'd
   at 0x4A1A828: free (vg_replace_malloc.c:233)
   by 0x4C3899: tstate_delete_common (pystate.c:256)
   by 0x4C3926: PyThreadState_DeleteCurrent (pystate.c:282)
   by 0x4D4043: t_bootstrap (threadmodule.c:448)
   by 0x4B24C48: pthread_start_thread (in /lib/libpthread-0.10.so)

The only way I can think of to fix this is to keep a set of
active generators in the PyThreadState and call
gen_send_ex(exc=1) for all the active generators before
killing the tstate in t_bootstrap.
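
The idea can be modelled at the Python level to see what it would change. This is only a rough analogue under stated assumptions: a threading.local weak set stands in for a per-PyThreadState set, gen.close() stands in for gen_send_ex(exc=1), and track, close_active_generators, docs and Holder are invented names, not existing APIs.

```python
import threading
import weakref

_per_thread = threading.local()   # stand-in for per-thread-state storage
events = []


def track(gen):
    """Remember gen in the calling thread's set of active generators."""
    if not hasattr(_per_thread, "active"):
        _per_thread.active = weakref.WeakSet()
    _per_thread.active.add(gen)
    return gen


def close_active_generators():
    """Close every generator this thread still tracks (run at thread exit)."""
    for gen in list(getattr(_per_thread, "active", ())):
        gen.close()


def docs():
    try:
        while True:
            yield None
    finally:
        events.append(threading.current_thread().name)


class Holder(object):
    pass


def worker(holder):
    try:
        holder.it = track(docs())
        next(holder.it)
    finally:
        close_active_generators()   # analogue of the proposed t_bootstrap step


holder = Holder()
t = threading.Thread(target=worker, args=(holder,), name="worker")
t.start()
t.join()
del holder.it   # already closed, so no cleanup code runs in the main thread
```

With the generators closed before the thread ends, the cleanup is recorded under "worker", so nothing would later run against a dead thread state. The cost is that a generator shared with another thread is closed early.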
msg30282 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-10-28 05:18
> I tried Tim's hope.py on Linux x86_64 and
> Mac OS X 10.4 with debug builds and neither
> one crashed.  Tim's guess looks pretty damn
> good too.

Neal, note that it's the /Windows/ malloc that fills freed
memory with "dangerous bytes" in a debug build -- this
really has nothing to do with that it's a debug build of
/Python/ apart from that on Windows a debug build of Python
also links in the debug version of Microsoft's malloc.

The valgrind report is pointing at the same thing.  Whether
this leads to a crash is purely an accident of when and how
the system malloc happens to reuse the freed memory.
msg30283 - (view) Author: Eric Noyau (eric_noyau) Date: 2006-11-27 18:07
We are experiencing the same segfault in our application, reliably. Running our unit test suite segfaults every time on both Linux and Mac OS X. Applying Martin's patch fixes the segfault and makes everything fine and dandy, at the cost of some memory leaks, if I understand properly.

This particular bug prevents us from upgrading to Python 2.5 in production.
msg30284 - (view) Author: Mike Klaas (klaas) Date: 2006-11-27 18:41
The following patch resets the thread state of the generator when it is resumed, which prevents the segfault for me:

Index: Objects/genobject.c
===================================================================
--- Objects/genobject.c (revision 52849)
+++ Objects/genobject.c (working copy)
@@ -77,6 +77,7 @@
        Py_XINCREF(tstate->frame);
        assert(f->f_back == NULL);
        f->f_back = tstate->frame;
+        f->f_tstate = tstate;
 
        gen->gi_running = 1;
        result = PyEval_EvalFrameEx(f, exc);
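
What the one-line change does can be shown with a toy model. Nothing below is CPython code: ThreadState, Frame and resume are hypothetical Python stand-ins for PyThreadState, the generator's frame and gen_send_ex, and the freed flag simulates a thread state whose memory has been released.

```python
class ThreadState(object):
    """Toy stand-in for PyThreadState."""

    def __init__(self, name):
        self.name = name
        self.freed = False


class Frame(object):
    """Toy stand-in for a generator frame."""

    def __init__(self, tstate):
        self.f_tstate = tstate   # bound once, when the frame is created


def resume(frame, current, rebind):
    """Model of resuming the generator: optionally rebind, then use f_tstate."""
    if rebind:
        frame.f_tstate = current   # the line added by the patch
    if frame.f_tstate.freed:
        raise RuntimeError("stale thread state: " + frame.f_tstate.name)
    return frame.f_tstate.name


worker = ThreadState("worker")
main = ThreadState("main")
frame = Frame(worker)
worker.freed = True                # the creating thread has exited

name = resume(frame, main, rebind=True)   # uses the live "main" state
```

Calling resume with rebind=False on a frame still bound to worker raises the RuntimeError, which is the model's equivalent of the invalid read reported by valgrind.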
msg30285 - (view) Author: Andrew Waters (awaters) Date: 2007-01-04 09:35
This fixes the segfault problem that I was able to reliably reproduce on Linux.

We need to get this applied (assuming it is the correct fix) to the source to make Python 2.5 usable for me in production code.
msg30286 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-01-04 10:42
Why do frame objects have a thread state in the first place? In particular, why does PyTraceBack_Here get the thread state from the frame, instead of using the current thread?

Introduction of f_tstate goes back to r7882, but it is not clear why it was done that way.
msg30287 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2007-01-17 07:01
Bumping priority to see if this should go into 2.5.1.
msg30288 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-01-22 07:51
I don't like mklaas' patch, since I think it is conceptually wrong to have PyTraceBack_Here() use the frame's thread state (mklaas describes it as dirty, and I agree). I'm proposing an alternative patch (tr.diff); please test this as well.
File Added: tr.diff
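
The contents of tr.diff are not reproduced in this thread, so the following is only a sketch of the stated idea: the traceback bookkeeping consults the thread that is currently running instead of a thread state remembered by the frame. A threading.local object stands in for "the current thread state", and traceback_here and seen are made-up names.

```python
import threading

_current = threading.local()   # stands in for "the current thread state"


def traceback_here(frame_name):
    """Append frame_name to the calling thread's own traceback chain."""
    chain = getattr(_current, "traceback", None)
    if chain is None:
        chain = _current.traceback = []
    chain.append(frame_name)
    return chain


seen = {}


def worker():
    seen["worker"] = list(traceback_here("generator frame"))


t = threading.Thread(target=worker)
t.start()
t.join()

# The same "frame" is now handled from the main thread. Nothing belonging
# to the finished worker thread is touched.
seen["main"] = list(traceback_here("generator frame"))
```

Each thread ends up with its own one-entry chain, so a frame that outlives its creating thread never needs that thread's state.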
msg30289 - (view) Author: Andrew Waters (awaters) Date: 2007-01-22 08:46
A quick test on code that always segfaulted with unpatched Python 2.5 seems to work.
Needs more extensive testing...
msg30290 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-01-23 21:13
This is now fixed in r53531 and r53532. For the trunk, it is likely that f_tstate will get eliminated altogether in the near future. People who had the problem are really encouraged to test 2.5.1c1 when it is released.
History
Date User Action Args
2022-04-11 14:56:20 admin set github: 44139
2006-10-18 02:23:29 klaas create