Issue1055820
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2004-10-28 01:58 by tim.peters, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
temp2a.py | tim.peters, 2004-10-28 01:58 | ZODB 3.4a1 reduced to 60 lines | ||
temp2b.py | tim.peters, 2004-10-28 02:28 | 2 objects and 1 thread | ||
temp2c.py | tim.peters, 2004-10-28 08:06 | This hurts. A lot. | ||
temp2d.py | tim.peters, 2004-10-28 08:56 | __del__ hosage; no weakrefs with callbacks | ||
patch.txt | tim.peters, 2004-10-29 06:54 | First stab at a fix | ||
patch-n1.txt | nascheme, 2004-10-29 17:00 | Another stab | ||
patch2.txt | tim.peters, 2004-10-30 03:23 | Reworked patch.txt; no issues known | ||
patch3.txt | tim.peters, 2004-10-30 07:55 | patch2.txt + comment typo repair | ||
patch3-1.txt | nascheme, 2004-10-30 15:04 | patch3.txt + Neil's fixes | ||
patch4.txt | nascheme, 2004-10-30 15:27 | |||
patch42.txt | tim.peters, 2004-10-30 22:49 | Possibly the end of this -- for this year |
Messages (25) | |||
---|---|---|---|
msg22895 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-28 01:58 | |
Oh oh. It's that time of year again. I'm sure the attached (temp2a.py) can be simplified, perhaps down to two objects and one thread. As is, I *think* it demonstrates that invoking a weakref callback can do fatal damage, not necessarily because of what the callback does, but because simply calling it while gc is running can allow other threads to run during gc too, and resurrect a piece of cyclic trash T that's already been tp_clear()'ed, via invoking a still-living weakref to T. If so, this isn't new in 2.4. It's a real problem since temp2a.py is what's left of ZODB 3.4a1 <wink>. |
|||
msg22896 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-28 02:28 | |
Logged In: YES user_id=31435 Yup, temp2b.py probably shows the same problem, with 2 objects and one thread. This one is definitely more strained, though, since the weakref callback does the damage directly. In temp2a.py, nothing is trying to fool anything, and the only damage done by the wr callbacks there is simply in releasing the GIL (thus allowing other threads to do perfectly ordinary things with weakrefs). |
|||
msg22897 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-28 02:37 | |
Logged In: YES user_id=31435 Noting that temp2b.py fails in current CVS, 2.3.4, and 2.2.3. That's all the Pythons I have handy right now. |
|||
msg22898 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-28 08:06 | |
Logged In: YES user_id=31435 temp2c.py may be as bad as it gets. It shows that the problem can occur on a gc collection that doesn't see *any* object that has a weakref with a callback. |
|||
msg22899 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-28 08:56 | |
Logged In: YES user_id=31435 temp2d.py shows that we don't need any weakrefs with callbacks to get in trouble -- a __del__ method can hose us too. |
|||
msg22900 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-28 20:55 | |
Logged In: YES user_id=31435 A fix is in progress, as sketched on Python-Dev. I expect it to land Friday (or tonight yet, if I'm lucky). |
|||
msg22901 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-29 06:54 | |
Logged In: YES user_id=31435 Neil, can you make some time to eyeball this (patch.txt)? If not, please assign to Fred. The Python -uall suite passes in debug and release builds. I just ran the -all Zope3 test suite in a debug buld, and that passes as well as it does with Python 2.3.4 on Windows, but I see many thousands of lines like: Exception exceptions.KeyError: <weakref at 104EA620; dead> in <function remove at 0x01595560> ignored So there's something still wrong here. I'm not sure who's writing those msgs; I expect it's in the guts of gc, when invoking delayed weakref callbacks, and triggered by the weak dictionary implementations (which haven't changed). |
|||
msg22902 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-29 09:11 | |
Logged In: YES user_id=31435 I strongly suspect that abusing the weakref struct's `hash` member is responsible for the gazillions of KeyErrors. Dang. I need one lousy bit ... |
|||
msg22903 - (view) | Author: Neil Schemenauer (nascheme) * | Date: 2004-10-29 14:56 | |
Logged In: YES user_id=35752 I don't see why you need the extra bit. When you find weakly referenced objects, clear the wr_object pointer of the weakref. Move trash weakref objects with callbacks to the wr_callbacks list as we always did. When the trash goes away then PyObject_ClearWeakRefs() will invoke the callbacks normally (we only cleared the wr_object pointer, the backpointer list is still intact). I'm going to see if I can make this work. In the process I will probably discover what I have been missing. :-) |
|||
msg22904 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-29 15:18 | |
Logged In: YES user_id=31435 Excellent! I think you're right about this. It never occurred to me that just setting wr_object to None would be as effective at disabling a weakref as calling clear_weakref(). If that's really so (& I can't see why not offhand), it would be better in oh-so-many ways. |
|||
msg22905 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-29 16:01 | |
Logged In: YES user_id=31435 Hmm. Just about anything fixes the current rash of bugs. It's keeping the old bugs from coming back that's been the hangup -- don't forget to run test_weakref.py too <wink>. The last batch of bugs was really much subtler than this batch! test_callback_in_cycle_1 is a bitch -- I think we have to stop its callback from ever getting invoked, not just prevent I.wr() from returning J ... |
|||
msg22906 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-29 16:41 | |
Logged In: YES user_id=31435 Bingo. This is a bit delicate <heh>. It's still necessary to call _PyWeakref_ClearRef() on a trash weakref with a callback, to prevent the callback from ever triggering (that was the key to fixing the previous pile of bugs). For all other weakrefs to trash, I think you're right that just setting wr_object to None is conceptually enough. But ... What I pissed away the last half hour discovering is that if you set wr_object to None *before* calling _PyWeakref_ClearRef(), then the latter doesn't do anything, because clear_weakref() doesn't do anything when wr_object is None. So that leaves me a little worried: if we just set wr_object to None on some weakrefs, then PyObject_ClearWeakrefs() will never (and for the same reason) remove such a weakref from its doubly-linked list either. Doesn't look like the weakref code intended that this be possible, and I'm not yet entirely convinced it can't hurt ... |
|||
msg22907 - (view) | Author: Neil Schemenauer (nascheme) * | Date: 2004-10-29 17:00 | |
Logged In: YES user_id=35752 I had to change _PyWeakref_ClearRef() since it was also clearing the weakref list of the trash object. Now it just sets wr_object to Py_None. I also made some serious simplifications to gcmodule by just treating trash weakref objects with callbacks the same as objects with __del__ methods (i.e. move them to the finalizers list and then do the transitive closure of that set). I'm attaching a patch. It's a work in progress. |
|||
msg22908 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-29 17:13 | |
Logged In: YES user_id=31435 > I had to change _PyWeakref_ClearRef() since it was also > clearing the weakref list of the trash object. That was really its *purpose*. If a trash weakref with a callback isn't removed from the referent's list of weakrefs, then the callback will trigger when PyObject_ClearWeakRefs() is invoked on the referent. The purpose of _PyWeakref_ClearRef() was to ensure that the callback never triggers. > Now it just sets wr_object to Py_None. That won't stop the callback from triggering. It also means (see earlier comment) that PyObject_ClearWeakRefs() will never removed the weakref from the list either, although I'm not sure that does real harm. > I also made some serious simplifications to gcmodule by > just treating trash weakref objects with callbacks the same > as objects with __del__ methods (i.e. move them to the > finalizers list and then do the transitive closure of that set). Does that mean they can end up in gc.garbage too? If so, I don't think that's sellable. See the end of gc_weakref.txt: An alternative would have been to treat objects with callbacks like objects with __del__ methods, refusing to collect them, appending them to gc.garbage instead. That would have been much easier. Jim Fulton gave a strong argument against that (on Python-Dev): ... > I'm attaching a patch. It's a work in progress. Aren't we all <wink>. |
|||
msg22909 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-30 03:23 | |
Logged In: YES user_id=31435 patch2.txt stops abusing weakref `hash`, and seems to have no issues w/ Python's or Zope3's -uall/--all tests, in debug or release builds. weakrefs are handled entirely in move_troublemakers() now; collect() doesn't even have its old wr_callbacks list anymore. If there's a more elegant approach coming, great, but if it's not coming soon I'm going to check this in, to get as much developer test coverage as possible before 2.4.b1. |
|||
msg22910 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-30 07:55 | |
Logged In: YES user_id=31435 patch3.txt repairs some typos in comments; semantically, it's the same as patch2.txt. |
|||
msg22911 - (view) | Author: Neil Schemenauer (nascheme) * | Date: 2004-10-30 15:04 | |
Logged In: YES user_id=35752 I think your patch is about as elegant as it can be. The approach looks sound but I believe you forgot to clear the wr_callback pointer. I'm attaching an incremental patch (relative to patch3.txt). Regarding gc_weakref.txt, I think we should add a note at the top explaining the critical rule. I.e. after we can computed the unreachable set (aka trash), it's okay to invoke code that can access non-trash but we cannot allow code to run that can reference trash. There is only one minor issue that I can think of. We decide which callbacks to invoke inside move_troublemakers(). It's possible that unreachable weakrefs become reachable when move_finalizer_reachable() is called (i.e. weakrefs that would be trash except that they got revived because they were referenced by an object with a __del__ method). However, since those weakrefs would have been trash I think it's okay not to invoke their callbacks. |
|||
msg22912 - (view) | Author: Neil Schemenauer (nascheme) * | Date: 2004-10-30 15:27 | |
Logged In: YES user_id=35752 Attaching another patch. This one splits move_troublemakers() into move_finalizers() and handle_weakrefs(). The code is a little easier to understand and arguably more correct. The down side is that it adds another pass over `unreachable`. |
|||
msg22913 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-30 17:20 | |
Logged In: YES user_id=31435 > The approach looks sound but I believe you forgot to clear > the wr_callback pointer. I'm attaching an incremental patch > (relative to patch3.txt). Some of the comments in that incremental patch are good additions, but I don't agree with the code changes. A weakref object owns a (strong) reference to its wr_callback, and the first pass gives move_troublemakers its own new strong reference to each weakref object. Because of the latter, nothing any callback does can cause any of these weakrefs to go away, so nothing can happen to cause their wr_callback slots to become NULL either. The assertion that wr_callback isn't NULL is appropriate. If you still think it's possible for wr_callback to become NULL, please concoct a test showing it happen. The passes after the first give up move_troublemaker's references to the weakref objects. Then they'll go away, when their refcounts hit 0. Sometimes that happens right away, sometimes not. In either case, it's weakref_dealloc's job to decrement the refcount on w->wr_callback when w goes away, and there's no point (that I can see) to duplicating that logic too inside gcmodule. Note that the comments before _PyWeakref_ClearRef() explain that the weakref's tp_dealloc is expected to finish the job (clear the wr_callback slot) -- that's been true since _PyWeakref_ClearRef() was first introduced. |
|||
msg22914 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-30 17:35 | |
Logged In: YES user_id=31435 > Regarding gc_weakref.txt, I think we should add a note at > the top explaining the critical rule. An excellent idea -- I'll do that. |
|||
msg22915 - (view) | Author: Neil Schemenauer (nascheme) * | Date: 2004-10-30 17:51 | |
Logged In: YES user_id=35752 Ah,the detail that I was missing was that wr_callback is not invoked if the weakref list has been cleared. I.e., calling _PyWeakref_ClearRef() is enough ensure that the callback is never implicitly envoked. That's cleaner than explicitly clearing the wr_callback pointer and also seems to a behavior that can be depended on (it's pretty fundamental to how weakrefs are implemented). patch3.txt looks like a winner. |
|||
msg22916 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-30 18:28 | |
Logged In: YES user_id=31435 > Ah,the detail that I was missing was that wr_callback is not > invoked if the weakref list has been cleared. I.e., calling > _PyWeakref_ClearRef() is enough ensure that the callback is > never implicitly envoked Indeed, ensuring that the callback never triggers by magic is a primary *purpose* of _PyWeakref_ClearRef(). I should add something here: this is so delicate in part because I didn't (during the *last* round of bug fixes) want to allocate any heap space during gc -- if malloc couldn't find more heap space when gc needed it, then the only thing I could do is kill the Python process. So the deeper and somewhat ugly truth here is that gc is (re) using the (already-allocated) weakref object as a container *just* to remember its wr_callback pointer. It would have been easier to understand if, instead, gc could have saved away the wr_callback pointers in its own PyList object, and used weakrefobject.c's clear_weakref() instead. Alas, there's no limit on how big such a list may need to become. As things turned out, it's a very good thing I couldn't do that. Fred subsequently made weakrefs subclassable, and now there are actual uses for passing the weakref object to the callback (e.g., weak value dictionaries use a weakref subclass now, and an instance of this subclass has the associated dict key as an attribute), so now a weakref object has to be kept around as long as its callback may still get invoked. |
|||
msg22917 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-30 18:47 | |
Logged In: YES user_id=31435 > There is only one minor issue that I can think of. We > decide which callbacks to invoke inside > move_troublemakers(). It's possible that unreachable > weakrefs become reachable when > move_finalizer_reachable() is called (i.e. weakrefs that > would be trash except that they got revived because they > were referenced by an object with a __del__ method). > > However, since those weakrefs would have been trash I > think it's okay not to invoke their callbacks. I sure agree there are grey areas here! Since I hate __del__ methods, I'm afraid I tend to have a "oh, who gives a shit?" attitude toward them. There are also grey areas here outside of cyclic gc. Take a look at subtype_dealloc: that calls PyObject_ClearWeakRefs () before calling a __del__ method, so invokes weakref callbacks even if the object doesn't actually go away (the __del__ method could still resurrect it). Is that right or wrong? Beats me. I need to think more about this. Offhand I'm inclined to agree that your reworking to crawl over unreachable twice is the more principled approach. I only cringe because it *is* twice, and so probably a little slower. Can you write a test where it makes a visible difference? |
|||
msg22918 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-30 22:49 | |
Logged In: YES user_id=31435 patch42.txt is what I'll check in (and backport to 2.3 maint) provided my remaining tests all pass. This does two passes over `unreachable` as in Neil's patch4.txt, and incorporates various improvements to the comments. |
|||
msg22919 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-10-30 23:13 | |
Logged In: YES user_id=31435 Thanks for the eyeballs and brain cells, Neil! patch42 was checked in: Include/weakrefobject.h 1.6 Lib/test/test_gc.py 1.31 Misc/NEWS 1.1179 Modules/gc_weakref.txt 2.2 Modules/gcmodule.c 2.77 Objects/weakrefobject.c 1.20 I'll backport to 2.3 next, and that's it, so closing this report. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:07 | admin | set | github: 41092 |
2004-10-28 01:58:53 | tim.peters | create |