This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: BZ2File leaking fd and memory
Type: Stage:
Components: Extension Modules Versions: Python 2.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: niemeyer Nosy List: gvanrossum, loewis, niemeyer, nnorwitz
Priority: normal Keywords: patch

Created on 2003-01-03 19:22 by nnorwitz, last changed 2022-04-10 16:06 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
bz.diff nnorwitz, 2003-01-05 20:09 patch 2, this should be correct now
bz2test.diff nnorwitz, 2003-01-10 23:55 update test for the fd leak
Messages (15)
msg42252 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2003-01-03 19:22
The attached patch fixes BZ2File() leaking the fd and
PyFileObject* info (name, read ahead buffer) when an
object is deleted.

I'm not sure the patch fixes these problems in the best
way.  It exposes most of the implementation from
fileobject.c::file_dealloc() through a private API
_PyFile_Dealloc().
BZ2File derives from PyFileObject.

Make sure the file: 'empty' exists and do: 

  from bz2 import *

A simple test which demonstrates the problem is:

  for i in range(100000): o = BZ2File('empty') ; del o

Without the patch, you quickly get:

IOError: [Errno 24] Too many open files: 'empty'

You can modify this to:

  for i in range(100000): o = BZ2File('empty') ;
o.close() ; del o

Now you can see the memory leaks.
msg42253 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-01-03 19:48
Logged In: YES 
user_id=21627

Assigning to the author of the module for review. Gustavo,
if you can't find the time for a review, feel free to
unassign it.
msg42254 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2003-01-05 20:09
Logged In: YES 
user_id=33168

I've found a better solution: change the last line of
BZ2File_dealloc() from: 
  ((PyObject*)self)->ob_type->tp_free((PyObject *)self);

to
  PyFile_Type.tp_dealloc((PyObject *)self);
Updated patch attached.
msg42255 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-01-05 21:11
Logged In: YES 
user_id=6380

Um, I don't understand why the BZ2File class inherits from
FileObject. It doesn't seem to be inheriting any code, and
there's no reason to inherit from FileObject just to make a
file-like object. Maybe it was a misunderstanding?
msg42256 - (view) Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2003-01-20 20:44
Logged In: YES 
user_id=7887

Sorry about the delay. I was on a not-so-short vacation.

nnorwitz:

Thanks for the patch. I'll check the problem (for my own
understanding) and most certainly apply it.

gvanrossum:

BZ2File used to share a lot of code with FileObject.
Unfortunately (for the BZ2File side), FileObject changed
considerably post 2.2, and some of the code had to be moved
to BZ2File itself. It still
uses some code from FileObject, though (open, close, seek,
part of the softspace handling, access to attributes and
some common methods).
msg42257 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-01-21 19:24
Logged In: YES 
user_id=6380

FileObject sharing: I see.  But I still think it's a bad
idea, and you're better off using stdio methods directly (as
the evolution of the FileObject implementation has already
shown).
msg42258 - (view) Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2003-01-22 15:39
Logged In: YES 
user_id=7887

Ok. Do you want me to work on this for 2.3?
msg42259 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-01-22 17:29
Logged In: YES 
user_id=6380

If you could, that would be great.
msg42260 - (view) Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2003-02-02 04:21
Logged In: YES 
user_id=7887

Ok, I've started to "unparent" the bz2 code. OTOH, I've just
realised what a lot of code, full of #ifdef's, I'll have to
copy, mostly unchanged. The following functions

file_new(), file_init(), open_the_file(),
fill_file_fields(), dircheck(), close_file(),
get_newlines(), _portable_fseek(), file_seek(), file_repr(),

and the following arrays

file_memberlist[], file_getsetlist[]

My question is, do you think it's better to maintain all
this code duplicated than maintaining bz2 up to date?

I'll do the job, if you still think this is great.
msg42261 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-02-02 12:57
Logged In: YES 
user_id=6380

I don't understand. Why would you want to copy all the
implementation details of those functions? Have a look at
how the zlib module emulates a file.
msg42262 - (view) Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2003-02-03 12:07
Logged In: YES 
user_id=7887

zlib module doesn't  emulate a file at C level. It also does
not implement as many features as the BZ2File does
(universal newlines, buffered IO, etc).

The strange thing is, I thought it was a good thing to
develop inheriting FileObject that way, since it would save
code and maintenance, while being fast. I also thought it
was a good model to make BZ2File behave as a subclass (I
even mentioned that in the documentation). I'm surprised for
being wrong.
msg42263 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-02-03 14:51
Logged In: YES 
user_id=6380

IMO, being a subclass of a concrete class typically only
makes sense if you can maintain the meaning of the instance
variables defined by the base class.  In this case I don't
believe that's the case. If looks like you are using
subclassing from the file class where you should really be
using an instance variable that points to a regular file
object. For example, operations on the file descriptor
returned by fileno() have a very different effect for a
BZ2file than for a regular file (since they manipulate the
compressed file rather than the uncompressed byte stream
represented by the BZ2file).

You may also confuse other parts of the Python runtime that
assume that when they see a file instance they can extract
the underlying stdio FILE* from it and manipulate that.
msg42264 - (view) Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2003-02-03 15:52
Logged In: YES 
user_id=7887

Ok.. I can't find a good way to workaround the presented
reasons. I'll work to unparent BZ2File. 
msg42265 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-02-03 16:09
Logged In: YES 
user_id=6380

Thanks! I expect you'll be surprised at the gained clarity
of the resulting code.
msg42266 - (view) Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2003-02-11 18:55
Logged In: YES 
user_id=7887

Fixed in the following revisions:

Modules/bz2module.c: 1.15
Doc/lib/libbz2.tex: 1.6

Thanks to everybody.
History
Date User Action Args
2022-04-10 16:06:06adminsetgithub: 37709
2003-01-03 19:22:48nnorwitzcreate