This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Make itertools.tee participate in GC
Type: Stage:
Components: Extension Modules Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: twouters Nosy List: rhettinger, tim.peters, twouters
Priority: normal Keywords: patch

Created on 2006-03-06 20:33 by twouters, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
itertools.diff twouters, 2006-03-25 16:39 fixed version
Messages (10)
msg49671 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2006-03-06 20:33
A small patch to make itertools.tee objects participate
in GC; solves the memoryleak in test_generators (and
any other use of tee objects in cycles.)
msg49672 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-03-07 00:05
Logged In: YES 
user_id=80475

I will take a look at the patch soon.  IIRC, there was a 
concious decision to not have tee engage in GC because no 
normal use cases created cycles (they seem to arise only 
when intentionally creating a cycle for test code).

msg49673 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2006-03-07 00:12
Logged In: YES 
user_id=34209

Well, I'm fine with removing the scope-enclosed generator
versions of _m235() and fib() from test_generators, if
that's what you're implying. :> However, I think it's
slightly more suitable to just have tee participate in GC.
msg49674 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-03-07 00:52
Logged In: YES 
user_id=31435

If there are likely to be millions of some type of object,
then avoiding gc for that type on grounds of reducing memory
use may be arguable (or, as in the cases of tuples or lists,
may not be arguable regardless).  Else "if it points to a
PyObject, it might be in a cycle, so gc-enable it" generally
rules.

I'll note in passing that the m235 and fib generators
weren't _intended_ to stress Python in any way, but have
been extremely effective at doing so since generators first
went in.  Think of them as a pair of canaries in the coal
mine :-)
msg49675 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-03-19 08:02
Logged In: YES 
user_id=80475

Okay, go ahead and add GC to itertools.tee().  The 
test_generators examples make it clear that there are 
valid use cases for feeding a teeobject back into itself.

Tim, the original rationale did not have to do with 
millions of tee objects; rather, it posited that a tee 
could contain many objects of some other type and that 
traversing it during GC was simply a waste of time.

Thomas, the patch needs work, but I don't currently have 
time to go through it.  Do your best to model after the 
other tools which have gc.  That includes setting the 
tp_flags slot, doing an untrack before dealloc starts, 
replacing PyObject_Del and PyObject_New with their GC 
counterparts.  See section 2.1.3 in Extending and 
Embedding for the details and examples.  Offhand, I think 
it may also need a tp_clear entry but I'm not sure.

If you can work out a good patch, go ahead an apply it for 
the alpha 1 release.  If not, assign back to me and I'll 
get to it when I can (possibly for the second alpha).


 
msg49676 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2006-03-20 21:34
Logged In: YES 
user_id=34209

Hmm. An alternative fix would be to make itertools.tee only
partially participate in GC. That is, not have it traverse
over all the stored items, but just the iterator. It would
prevent the (potentially rather heavy) cost of going over
all cached items, but still leaves a potential for
unfindable cycles.
msg49677 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-03-25 16:17
Logged In: YES 
user_id=80475

Georg, do you have time to fix-up this patch and check it 
in?
msg49678 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2006-03-25 16:39
Logged In: YES 
user_id=34209

I cleaned up my patch (attached), but for some reason, it
isn't resolving the refleaks in test_generators. I swear my
patch was resolving them before, but I guess I must've done
something else (maybe in addition to this) that fixed the
leaks. Or maybe I'm just going cross-eyed. I do believe I
implemented GC correctly, and I do believe the leaks are
uncleanable circular references. I haven't had time to debug
it more; if someone else wants to, feel free.
msg49679 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-03-25 17:14
Logged In: YES 
user_id=80475

In teedataobject_new(), try adding PyObject_GC_Track(tdo) 
just before the return statement.
msg49680 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2006-03-27 20:59
Logged In: YES 
user_id=34209

Hmm. Well, adding PyObject_GC_Track() makes GC work right. I
don't know why, since the rest of itertools doesn't use
PyObject_GC_Track, but it now fixes refleaks that involve
just itertools, like:

def leak():
    class gen:
        def __iter__(self):
            return self
        def next(self):
            return self.item
    g = gen()
    head, tail = tee(g)
    g.item = head

This example leaked without tee doing GC (and without
PyObject_GC_Track), but doesn't leak with proper GC.
However! test_generators still leaks, and so does this
example (which doesn't use itertools):

def leak():
    def gen():
        while True:
            yield g
    g = gen()
    g.next()

I suspect this leak was introduced after I submitted this
patch, which explains why this patch suddenly stopped
working ;) Since adding GC still fixes reference-cycles, I
will check in this patch while I hunt for the other leak.
History
Date User Action Args
2022-04-11 14:56:15adminsetgithub: 42988
2006-03-06 20:33:38twouterscreate