
classification
Title: Leak in tarfile.py
Type:
Stage:
Components: None
Versions:

process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: jensj, sf-robot, tim.peters
Priority: normal
Keywords:

Created on 2006-05-31 06:42 by jensj, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg28687 - Author: Jens Jørgen Mortensen (jensj) Date: 2006-05-31 06:42
There is a leak when using the tarfile module and the
extractfile method.  Here is a simple example:
 
$ echo "grrr" > x.txt
$ tar cf x.tar x.txt
$ python
Python 2.4.2 (#2, Sep 30 2005, 21:19:01)
[GCC 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gc
>>> import tarfile
>>> tar = tarfile.open('x.tar', 'r')
>>> f = tar.extractfile('x.txt')
>>> f.read()
'grrr\n'
>>> del f
>>> gc.set_debug(gc.DEBUG_LEAK)
>>> print gc.collect()
gc: collectable <ExFileObject 0xb73d4acc>
gc: collectable <dict 0xb73dcf0c>
gc: collectable <instancemethod 0xb7d2daf4>
3
>>> print gc.garbage
[<tarfile.ExFileObject object at 0xb73d4acc>, {'name':
'x.txt', 'read': <bound method ExFileObject._readnormal
of <tarfile.ExFileObject object at 0xb73d4acc>>, 'pos':
0L, 'fileobj': <open file 'x.tar', mode 'rb' at
0xb73e67b8>, 'mode': 'r', 'closed': False, 'offset':
512L, 'linebuffer': '', 'size': 5L}, <bound method
ExFileObject._readnormal of <tarfile.ExFileObject
object at 0xb73d4acc>>]
>>>
msg28688 - Author: Jens Jørgen Mortensen (jensj) Date: 2006-06-01 20:08
user_id=716463

The problem is that ExFileObject has an attribute
(self.read) that is a method bound to the object itself
(self._readsparse or self._readnormal).  One solution is to
add "del self.read" to the close method, but someone might
forget to close the object and still get the leak.  Another
solution is to change the end of __init__ to:

  if tarinfo.issparse():
      self.sparse = tarinfo.sparse
  else:
      self.sparse = None

and add a read method:

  def read(self, size=None):
      if self.sparse is None:
          return self._readnormal(size)
      else:
          return self._readsparse(size)
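A minimal Python 3 sketch of the two suggested fixes, using hypothetical stand-in classes (SelfBound, Dispatching) rather than tarfile's real ExFileObject. It assumes CPython, where reference counting frees an object as soon as its last reference goes away. The hypothetical helper freed_by_refcount() temporarily disables the cyclic collector so that only reference counting can free the object, and uses a weakref to check whether the object died.

```python
import gc
import weakref


class SelfBound:
    """Hypothetical stand-in for the reported pattern: self.read is bound to self."""

    def __init__(self):
        # Cycle: instance -> instance dict -> bound method -> instance
        self.read = self._readnormal

    def _readnormal(self, size=None):
        return b"grrr\n"

    def close(self):
        # First suggested fix: break the cycle by hand when closing.
        del self.read


class Dispatching:
    """Hypothetical stand-in for the second suggested fix: an ordinary read() method."""

    def __init__(self, sparse=None):
        self.sparse = sparse  # no bound method is stored on the instance

    def read(self, size=None):
        if self.sparse is None:
            return self._readnormal(size)
        return self._readsparse(size)

    def _readnormal(self, size=None):
        return b"grrr\n"

    def _readsparse(self, size=None):
        return b"(sparse data)"


def freed_by_refcount(factory, close=False):
    """Return True if the object dies as soon as its last name is deleted,
    i.e. without any help from the cyclic garbage collector."""
    was_enabled = gc.isenabled()
    gc.disable()  # ensure only reference counting can free the object
    try:
        obj = factory()
        obj.read()
        if close:
            obj.close()
        ref = weakref.ref(obj)
        del obj
        return ref() is None
    finally:
        if was_enabled:
            gc.enable()
```

On CPython, freed_by_refcount(SelfBound) should return False, because the self-bound attribute keeps the instance alive until cyclic gc runs. Calling close() first, or using the dispatching read(), lets reference counting free it immediately.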
msg28689 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-06-01 23:09
user_id=31435

There's no evidence of a leak here -- quite the contrary. 
As the docs say, DEBUG_LEAK implies DEBUG_SAVEALL, and
DEBUG_SAVEALL results in _all_ cyclic trash getting
appended to gc.garbage.  If you don't mess with
gc.set_debug(), you'll discover that gc.garbage is empty at
the end.
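To see the DEBUG_SAVEALL effect in isolation, here is a small Python 3 sketch with a hypothetical Node class and a hypothetical helper, garbage_after_cycle(). It sets gc.DEBUG_SAVEALL alone rather than DEBUG_LEAK, which also sets DEBUG_COLLECTABLE and DEBUG_UNCOLLECTABLE, so nothing is printed to stderr. The only difference between the two calls is the debug flag.

```python
import gc


class Node:
    pass


def garbage_after_cycle(debug_flags):
    """Drop one self-referencing object, collect, and count what lands in gc.garbage."""
    old_flags = gc.get_debug()
    gc.collect()            # flush unrelated garbage before measuring
    gc.garbage.clear()
    gc.set_debug(debug_flags)
    try:
        node = Node()
        node.me = node      # node -> its attribute dict -> node
        del node
        gc.collect()
        return len(gc.garbage)
    finally:
        gc.set_debug(old_flags)
        gc.garbage.clear()  # release saved objects so a later collection can free them
```

On CPython, garbage_after_cycle(0) should return 0 because the cycle is reclaimed. garbage_after_cycle(gc.DEBUG_SAVEALL) returns at least 1 because the same reclaimable cycle is kept in gc.garbage instead of being freed.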

In addition, note that the DEBUG_LEAK output plainly says:

gc: collectable ...

That's also telling you that it found collectable cyclic
trash (which it would have reclaimed had you not forced it
to get appended to gc.garbage instead).  If gc had found
uncollectable cycles, these msgs would have started with

gc: uncollectable ...

instead.
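The collectable/uncollectable split can also be observed without any debug flags through gc.callbacks (available since Python 3.3). Its "stop" phase reports how many objects a collection freed and how many it could not. A sketch, again with a hypothetical Node class and a hypothetical helper name:

```python
import gc


class Node:
    pass


def collect_two_node_cycle():
    """Create and drop a two-object cycle, then report what gc.collect() did.

    Returns (collected, uncollectable) as reported to a gc.callbacks hook.
    """
    results = []

    def on_gc(phase, info):
        # gc calls hooks with phase "start" and "stop"; the counts are filled in on "stop".
        if phase == "stop":
            results.append((info["collected"], info["uncollectable"]))

    was_enabled = gc.isenabled()
    gc.collect()        # flush unrelated garbage first
    gc.disable()        # keep automatic collections from running in between
    gc.callbacks.append(on_gc)
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a   # a -> b -> a
        del a, b
        gc.collect()
    finally:
        gc.callbacks.remove(on_gc)
        if was_enabled:
            gc.enable()
    return results[-1]
```

For a plain cycle like this one, the expected result is collected >= 2 and uncollectable == 0. Since Python 3.4, cycles containing objects with __del__ methods are also collectable, so uncollectable cycles are much rarer than in the Python 2.4 session above.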

Most directly, if I run your tarfile open() and file
extraction in an infinite loop (without messing with
gc.set_debug()), the process memory use does not grow over time.
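A rough Python 3 version of that check follows. It uses tracemalloc instead of watching process memory. The helper names and the in-memory archive, built with io.BytesIO to mirror the x.txt/x.tar setup from the report, are assumptions of this sketch. It calls gc.collect() before the final measurement so that cyclic trash that has not been collected yet is not mistaken for growth.

```python
import gc
import io
import tarfile
import tracemalloc


def make_tar_bytes():
    """Build a small in-memory tar archive containing x.txt."""
    buf = io.BytesIO()
    data = b"grrr\n"
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="x.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def extract_once(raw):
    """Open the archive and read x.txt, as in the original report."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
        with tar.extractfile("x.txt") as f:
            return f.read()


def memory_growth(iterations=2000):
    """Bytes still allocated after many open/extract rounds and a full collection."""
    raw = make_tar_bytes()
    extract_once(raw)  # warm up module-level caches before measuring
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            extract_once(raw)
        gc.collect()   # reclaim cyclic trash, as the collector would eventually
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

If each iteration leaked its TarFile and file object, the growth would scale with the iteration count. Without a leak it should stay small and roughly constant.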

Unless you have other evidence of an actual leak, this
report should be closed without action.  Yes, there are
reference cycles here, but they're of kinds cyclic gc reclaims.
msg28690 - Author: SourceForge Robot (sf-robot) Date: 2006-06-19 02:21
user_id=1312539

This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).
History
Date                 User   Action  Args
2022-04-11 14:56:17  admin  set     github: 43436
2006-05-31 06:42:30  jensj  create