Issue1208304
Created on 2005-05-25 09:20 by manekcz, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||
---|---|---|
File name | Uploaded | Description |
urllib2leak.py | stephbul, 2009-06-03 13:31 | main test |
urllib2.py | peci, 2009-09-04 10:17 | |
Messages (21) | |||
---|---|---|---|
msg60743 - (view) | Author: Petr Toman (manekcz) | Date: 2005-05-25 09:20 | |
It seems that the urlopen(url) method of the urllib2 module leaves some undestroyable objects in memory. Please try the following code:
==========================
if __name__ == '__main__':
    import urllib2
    a = urllib2.urlopen('http://www.google.com')
    del a # or a = None or del(a)

    # check memory on memory leaks
    import gc
    gc.set_debug(gc.DEBUG_SAVEALL)
    gc.collect()
    for it in gc.garbage:
        print it
==========================
In our code, we're using lots of urlopens in a loop and the number of unreachable objects grows beyond all limits :) We also tried a.close() but it didn't help. You can also try the following:
==========================
def print_unreachable_len():
    # check memory on memory leaks
    import gc
    gc.set_debug(gc.DEBUG_SAVEALL)
    gc.collect()
    unreachableL = []
    for it in gc.garbage:
        unreachableL.append(it)
    return len(str(unreachableL))

if __name__ == '__main__':
    print "at the beginning", print_unreachable_len()
    import urllib2
    print "after import of urllib2", print_unreachable_len()
    a = urllib2.urlopen('http://www.google.com')
    print 'after urllib2.urlopen', print_unreachable_len()
    del a
    print 'after del', print_unreachable_len()
==========================
We're using WindowsXP with latest patches, Python 2.4 (ActivePython 2.4 Build 243 (ActiveState Corp.) based on Python 2.4 (#60, Nov 30 2004, 09:34:21) [MSC v.1310 32 bit (Intel)] on win32). |
|||
msg60744 - (view) | Author: A.M. Kuchling (akuchling) * | Date: 2005-06-01 23:13 | |
Confirmed. The objects involved seem to be an HTTPResponse and the socket._fileobject wrapper; the assignment 'r.recv=r.read' around line 1013 of urllib2.py seems to be critical to creating the cycle. |
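The cycle described here can be reproduced without any networking. The following is a minimal Python 3 sketch, not code from urllib2: `Response` is a hypothetical stand-in for HTTPResponse, and the behaviour shown assumes CPython's reference counting. Storing a bound method on its own instance makes the instance reference itself, so only the cyclic collector can free it.

```python
import gc
import weakref


class Response:
    """Hypothetical stand-in for httplib.HTTPResponse."""

    def read(self):
        return b""


def make_response(with_alias):
    r = Response()
    if with_alias:
        # Same shape as 'r.recv = r.read': the bound method holds r,
        # and r's own attributes hold the bound method.
        r.recv = r.read
    return weakref.ref(r)  # lets us observe r without keeping it alive


gc.disable()  # rule out an automatic collection during the experiment
try:
    plain = make_response(with_alias=False)
    cyclic = make_response(with_alias=True)
    plain_alive = plain() is not None    # already freed by refcounting alone
    cyclic_alive = cyclic() is not None  # still kept alive by the cycle
    gc.collect()
    cyclic_alive_after_gc = cyclic() is not None  # freed by the cycle collector
finally:
    gc.enable()

print(plain_alive, cyclic_alive, cyclic_alive_after_gc)
```

Anything the cyclic object owns, such as a socket, would stay open until a collection runs, unless it is closed explicitly.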
|||
msg60745 - (view) | Author: Sean Reifschneider (jafo) * | Date: 2005-06-29 03:27 | |
I can reproduce this in both the python.org 2.4 RPM and in a freshly built copy from CVS. If I run a few thousand urlopen()s, I get:

Traceback (most recent call last):
  File "/tmp/mt", line 26, in ?
  File "/tmp/python/dist/src/Lib/urllib2.py", line 130, in urlopen
  File "/tmp/python/dist/src/Lib/urllib2.py", line 361, in open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 379, in _open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 340, in _call_chain
  File "/tmp/python/dist/src/Lib/urllib2.py", line 1026, in http_open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 1001, in do_open
urllib2.URLError: <urlopen error (24, 'Too many open files')>

This happens even if I call a.close(). I'll investigate a bit further.

Sean |
|||
msg60746 - (view) | Author: Sean Reifschneider (jafo) * | Date: 2005-06-29 03:52 | |
I give up, this code is kind of a maze of twisty little passages. I did try doing "a.fp.close()" and that didn't seem to help at all. Couldn't really make any progress on that though. I also tried doing a "if a.headers.fp: a.headers.fp.close()", which didn't do anything noticeable. |
|||
msg60747 - (view) | Author: Brian Wellington (bwelling) | Date: 2005-08-12 02:22 | |
We just ran into this same problem, and worked around it by simply removing the 'r.recv = r.read' line in urllib2.py, and creating a recv alias to the read function in HTTPResponse ('recv = read' in the class). Not sure if this is the best solution, but it seems to work. |
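The class-level alias mentioned above can be sketched as follows. This is an illustrative Python 3 example with a hypothetical `Response` class rather than the real HTTPResponse, and it assumes CPython's reference counting. Because `recv` lives in the class, no bound method is stored on the instance and no cycle is created.

```python
import gc
import weakref


class Response:
    """Hypothetical stand-in for httplib.HTTPResponse."""

    def read(self, amt=None):
        return b"data"

    # Class-level alias: 'recv' is looked up on the class, so no bound
    # method is ever stored in the instance's own attributes.
    recv = read


gc.disable()  # make sure only reference counting can free the object
try:
    r = Response()
    payload = r.recv()  # behaves exactly like r.read()
    ref = weakref.ref(r)
    del r
    freed_without_gc = ref() is None
finally:
    gc.enable()

print(payload, freed_without_gc)
```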
|||
msg60748 - (view) | Author: Sean Reifschneider (jafo) * | Date: 2005-08-12 22:30 | |
I've just tried it again using the current CVS version as well as the version installed with Fedora Core 4, and in both cases I was able to run over 100,000 retrievals of http://127.0.0.1/test.html and http://127.0.0.1/google.html. test.html is just "it works" and google.html was generated with "wget -O google.html http://google.com/". I was able to reproduce this before, but now am not. My urllib2.py includes the r.recv=r.read line. I have upgraded from FC3 to FC4, could this be something related to an OS or library interaction? I was going to try to confirm the last message, but now I can't reproduce the failure. |
|||
msg60749 - (view) | Author: Brian Wellington (bwelling) | Date: 2005-08-15 18:13 | |
The real problem we were seeing wasn't the memory leak, it was a file descriptor leak. Leaking references within the interpreter is bad, but the garbage collector will eventually notice that the system is out of memory and clean them. Leaking file descriptors is much worse, as gc won't be triggered when the process has reached its limit, and the process will start failing with "Too many file descriptors".

To easily show this problem, run the following from an interactive python interpreter:

    import urllib2
    f = urllib2.urlopen('http://www.google.com')
    f.close()

and from another window, run "lsof -p <pid of interpreter>". It should show a TCP socket in CLOSE_WAIT, which means the file descriptor is still open. I'm seeing weirdness on Fedora Core 4 today that I didn't see last week where after a few seconds, the file descriptor is listed as "can't identify protocol" instead of TCP, but that's not too relevant, since it's still open. Repeating the urllib2.urlopen()/close() pairs of statements in the interpreter will cause more fds to be leaked, which can also be seen by lsof. |
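As an alternative to watching lsof from another window, open descriptors can be counted from inside the process. This is a rough Python 3 sketch for POSIX systems: `open_fds` is a hypothetical helper, the 1024 probe limit is an arbitrary assumption, and a local socket pair stands in for the HTTP connection so that no network is needed.

```python
import os
import socket


def open_fds(limit=1024):
    """Return the set of descriptor numbers below 'limit' that are open."""
    fds = set()
    for fd in range(limit):
        try:
            os.fstat(fd)
        except OSError:
            continue  # typically EBADF: not an open descriptor
        fds.add(fd)
    return fds


before = open_fds()
left, right = socket.socketpair()  # two local sockets, no network needed
during = open_fds()
left.close()
right.close()
after = open_fds()

opened_while_in_use = during - before
left_open_after_close = after - before

print(len(opened_while_in_use), len(left_open_after_close))
```

Calling such a helper before and after a loop of urlopen()/close() pairs would show whether descriptors accumulate in the way described above.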
|||
msg60750 - (view) | Author: Steve Holden (holdenweb) * | Date: 2005-10-14 04:13 | |
The Windows 2.4.1 build doesn't show this error, but the Cygwin 2.4.1 build does still have uncollectable objects after a urllib2.urlopen(), so there may be a platform dependency here. No 2.4.2 on Cygwin yet, so nothing conclusive as lsof isn't available. |
|||
msg60751 - (view) | Author: Neil Swinton (nswinton) | Date: 2005-10-18 15:00 | |
It's not the prettiest thing, but you can work around this by setting the socket's recv method to None before closing it.

    import urllib2
    f = urllib2.urlopen('http://www.google.com')
    text=f.read()
    f.fp._sock.recv=None # hacky avoidance
    f.close() |
|||
msg76298 - (view) | Author: Toshio Kuratomi (a.badger) * | Date: 2008-11-24 05:20 | |
I tried to repeat the test in http://bugs.python.org/msg60749 and found that the descriptors will close if you read from the file before closing. So this leads to open descriptors::

    import urllib2
    f = urllib2.urlopen('http://www.google.com')
    f.close()

while this does not::

    import urllib2
    f = urllib2.urlopen('http://www.google.com')
    f.read(1)
    f.close() |
|||
msg76300 - (view) | Author: Toshio Kuratomi (a.badger) * | Date: 2008-11-24 05:47 | |
One further data point. On two rhel5 systems with identical kernels, both x86_64, both python-2.4.3... basically, everything I've thought to check identical, I ran the test code with f.read() in an infinite loop. One system only has one TCP socket in use at a time. The other one has multiple TCP sockets in use, but they all close eventually. /usr/sbin/lsof -p INTERPRETER_PID|wc -l reported 96 67 97 63 91 62 94 78 on subsequent runs. |
|||
msg76350 - (view) | Author: Jeremy Hylton (jhylton) | Date: 2008-11-24 18:03 | |
Python 2.4 is now in security-fix-only mode. No new features are being added, and bugs are not fixed anymore unless they affect the stability and security of the interpreter, or of Python applications. http://www.python.org/download/releases/2.4.5/ This bug doesn't rise to the level of making it into 2.4.6. |
|||
msg76368 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * | Date: 2008-11-24 22:22 | |
Reopening: I reproduce the problem consistently with both 2.6 and trunk versions (not with python 3.0), on Windows XP. |
|||
msg76683 - (view) | Author: Senthil Kumaran (orsenthil) * | Date: 2008-12-01 10:40 | |
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
>
> Reopening: I reproduce the problem consistently with both 2.6 and trunk
> versions (not with python 3.0), on Windows XP.
>

I think this bug is ONLY with respect to Windows systems. I am not able to reproduce this on the current trunk on Linux Ubuntu (8.04). I tried 100 and 1000 instances of open and close, and every time the file descriptors go through ESTABLISHED, SYN_SENT and then close for TCP connections. And yeah, certain instances showed 'can't identify protocol' randomly. But that's a different issue.

The original bug raised for Python 2.4 was raised on Linux and it seems to have been fixed. A Windows expert should comment on this, if this is consistently reproducible on Windows. |
|||
msg88812 - (view) | Author: BULOT (stephbul) | Date: 2009-06-03 13:31 | |
Hello,

I'm facing a urllib2 memory leak issue in one of my scripts, which is not threaded. I ran a few tests to check what was going on and found this existing (but old) bug thread. I have not been able to figure out the cause yet, but here is some information:

Platform: Debian
Python version: 2.5.4

I wrote a script (2 attached files) that accesses a web page (http://www.google.com) every second and monitors the number of file descriptors and the memory footprint. I also used the gc module (garbage collector) to retrieve the number of objects that are not freed (as already proposed in this thread, but focusing on the gc.DEBUG_LEAK flag).

Here are my results:

First access output:
gc: collectable <dict 0xb793c604>
gc: collectable <HTTPResponse instance at 0xb7938f6c>
gc: collectable <dict 0xb793c4f4>
gc: collectable <HTTPMessage instance at 0xb793d0ec>
gc: collectable <dict 0xb793c02c>
gc: collectable <list 0xb7938e8c>
gc: collectable <list 0xb7938ecc>
gc: collectable <instancemethod 0xb79cf824>
gc: collectable <dict 0xb793c79c>
gc: collectable <HTTPResponse instance at 0xb793d2cc>
gc: collectable <instancemethod 0xb79cf874>
unreachable objects: 11
File descriptors number: 32
Memory: 4612

Tenth access:
gc: collectable <dict 0xb78f14f4>
gc: collectable <HTTPResponse instance at 0xb78f404c>
gc: collectable <dict 0xb78f13e4>
gc: collectable <HTTPMessage instance at 0xb78f462c>
gc: collectable <dict 0xb78e5f0c>
gc: collectable <list 0xb78eeb4c>
gc: collectable <list 0xb78ee2ac>
gc: collectable <instancemethod 0xb797b7fc>
gc: collectable <dict 0xb78f168c>
gc: collectable <HTTPResponse instance at 0xb78f442c>
gc: collectable <instancemethod 0xb78eaa7c>
unreachable objects: 110
File descriptors number: 32
Memory: 4680

After a hundred accesses:
gc: collectable <dict 0x89e2e84>
gc: collectable <HTTPResponse instance at 0x89e3e2c>
gc: collectable <dict 0x89e2d74>
gc: collectable <HTTPMessage instance at 0x89e3ccc>
gc: collectable <dict 0x89db0b4>
gc: collectable <list 0x89e3cac>
gc: collectable <list 0x89e32ec>
gc: collectable <instancemethod 0x89d8964>
gc: collectable <dict 0x89e60b4>
gc: collectable <HTTPResponse instance at 0x89e50ac>
gc: collectable <instancemethod 0x89ddb1c>
unreachable objects: 1100
File descriptors number: 32
Memory: 5284

Each call to urllib2.urlopen() gives 11 new unreachable objects and increases the memory footprint without adding new open files. Do you have any idea?

With the hack proposed in message http://bugs.python.org/issue1208304#msg60751, the number of unreachable objects goes down to 8, but memory still increases.

Regards.
stephbul

PS: My urllib2leak.py test calls a monitor script (I was not able to attach it):

#! /bin/sh
PROCS='urllib2leak.py'
RUNPID=`ps aux | grep "$PROCS" | grep -v "grep" | awk '{printf $2}'`
FDESC=`lsof -p $RUNPID | wc -l`
MEM=`ps aux | grep "$PROCS" | grep -v "grep" | awk '{printf $6 }'`
echo "File descriptors number: "$FDESC
echo "Memory: "$MEM |
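When reading counts like these, it may help to keep in mind that, per the gc module documentation, gc.DEBUG_SAVEALL (which gc.DEBUG_LEAK includes) makes the collector append every unreachable object it finds to gc.garbage instead of freeing it. Under that flag a growing count is expected even for cycles that are normally collectable. A small Python 3 sketch, using a hypothetical `Node` class and helper names that are not part of the attached scripts:

```python
import gc


class Node:
    """Hypothetical object used only to build a reference cycle."""


def make_cycle():
    n = Node()
    n.me = n  # self-reference: unreachable once the function returns


def saved_by_collect(flags):
    """Count objects that one gc.collect() adds to gc.garbage under 'flags'."""
    gc.collect()  # flush pre-existing garbage first
    old_flags = gc.get_debug()
    gc.set_debug(flags)
    try:
        before = len(gc.garbage)
        make_cycle()
        gc.collect()
        return len(gc.garbage) - before
    finally:
        gc.set_debug(old_flags)
        del gc.garbage[:]  # drop saved objects so they can be freed later


normal = saved_by_collect(0)
saveall = saved_by_collect(gc.DEBUG_SAVEALL)

print(normal, saveall)
```

With no debug flags the cycle is simply freed and nothing is added to gc.garbage, while with DEBUG_SAVEALL the same cycle is retained there.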
|||
msg92245 - (view) | Author: clemens pecinovsky (peci) | Date: 2009-09-04 10:17 | |
I also ran into the problem of cyclic dependencies. I know that calling gc.collect() would solve the problem, but calling gc.collect() takes a long time.

The problem is the cyclic dependency created by r.recv=r.read. I have fixed it locally by wrapping the addinfourl in a new class (I called it addinfourlFixCyclRef) and overriding the close method; within the close method, recv is set to None again.

    class addinfourlFixCyclRef(addinfourl):
        def close(self):
            if self.fp is not None and hasattr(self.fp, "_sock"):
                self.fp._sock.recv = None
            addinfourl.close(self)

    ....

    r.recv = r.read
    fp = socket._fileobject(r, close=True)
    resp = addinfourlFixCyclRef(fp, r.msg, req.get_full_url())

When I call .close() on the response, it just works. Unfortunately I had to patch even more to cover the case where an exception is raised. For the whole fix see the attachment. |
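The idea of breaking the cycle inside close() can be tried in isolation. The sketch below is Python 3 and uses hypothetical stand-ins (`FakeResponse`, `FakeFile`, `CycleBreakingWrapper`) instead of the real HTTPResponse, socket._fileobject and addinfourl classes. It assumes CPython's reference counting and is not the attached patch.

```python
import gc
import weakref


class FakeResponse:
    """Hypothetical stand-in for HTTPResponse, with the cyclic alias."""

    def __init__(self):
        self.recv = self.read  # mirrors 'r.recv = r.read'

    def read(self):
        return b""


class FakeFile:
    """Hypothetical stand-in for socket._fileobject."""

    def __init__(self, sock):
        self._sock = sock

    def close(self):
        self._sock = None


class CycleBreakingWrapper:
    """Hypothetical wrapper whose close() clears the alias before closing."""

    def __init__(self, fp):
        self.fp = fp

    def close(self):
        fp, self.fp = self.fp, None
        if fp is None:
            return  # already closed
        sock = getattr(fp, "_sock", None)
        if sock is not None:
            sock.recv = None  # break the response -> bound method -> response cycle
        fp.close()


gc.disable()  # only reference counting may free objects here
try:
    response = FakeResponse()
    ref = weakref.ref(response)
    wrapper = CycleBreakingWrapper(FakeFile(response))
    del response
    alive_before_close = ref() is not None
    wrapper.close()
    freed_after_close = ref() is None
finally:
    gc.enable()

print(alive_before_close, freed_after_close)
```

Clearing the alias in close() means the object is released as soon as the caller closes the response, without waiting for a gc.collect() pass.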
|||
msg114503 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2010-08-21 15:52 | |
On Windows Vista I can consistently reproduce this with 2.6 and 2.7 but not with 3.1 or 3.2. |
|||
msg186550 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-04-11 09:27 | |
The entire description of this issue is bogus. Reference cycles are not a bug, since Python has a cyclic garbage collector. Closing as invalid. |
|||
msg186552 - (view) | Author: Ralf Schmitt (schmir) | Date: 2013-04-11 09:52 | |
I'd consider reference cycles a bug, especially if they prevent file descriptors from being closed. Please read the comments. |
|||
msg186556 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-04-11 12:39 | |
I see no file descriptor leak myself:

    >>> f = urllib2.urlopen("http://www.google.com")
    >>> f.fileno()
    3
    >>> os.fstat(3)
    posix.stat_result(st_mode=49663, st_ino=5045244, st_dev=7L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=0, st_atime=0, st_mtime=0, st_ctime=0)
    >>> del f
    >>> os.fstat(3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 9] Bad file descriptor

Ditto with Python 3:

    >>> f = urllib.request.urlopen("http://www.google.com")
    >>> f.fileno()
    3
    >>> os.fstat(3)
    posix.stat_result(st_mode=49663, st_ino=5071469, st_dev=7, st_nlink=1, st_uid=1000, st_gid=1000, st_size=0, st_atime=0, st_mtime=0, st_ctime=0)
    >>> del f
    >>> os.fstat(3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 9] Bad file descriptor

Furthermore, you can use the `with` statement to ensure timely disposal of system resources:

    >>> f = urllib.request.urlopen("http://www.google.com")
    >>> with f: f.fileno()
    ...
    3
    >>> os.fstat(3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 9] Bad file descriptor |
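The `with` pattern can also be checked offline. The following is a Python 3 sketch under stated assumptions: a temporary local file and a file: URL stand in for the HTTP URL used in the session above, so the descriptor belongs to a file rather than a socket, and the os.fstat() probe follows the same approach as that session.

```python
import os
import pathlib
import tempfile
from urllib.request import urlopen

# Create a small local file to fetch, standing in for the remote page.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"it works")
url = pathlib.Path(tmp.name).as_uri()

# The response is closed when the block exits, even if read() raises.
with urlopen(url) as response:
    fd = response.fileno()
    body = response.read()

# Probe the descriptor number after the block, as in the session above.
try:
    os.fstat(fd)
    still_open = True
except OSError:
    still_open = False

os.unlink(tmp.name)
print(body, still_open)
```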
|||
msg186560 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2013-04-11 13:24 | |
Where did file descriptors come into it? Surely this is all about memory leaks. In any case it's hardly a show stopper, as there are at least three references above to the problem line of code and three workarounds. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:11 | admin | set | github: 42012 |
2013-04-11 13:24:20 | BreamoreBoy | set | messages: + msg186560 |
2013-04-11 12:39:18 | pitrou | set | messages: + msg186556 |
2013-04-11 09:52:24 | schmir | set | messages: + msg186552 |
2013-04-11 09:27:26 | pitrou | set | status: open -> closed nosy: + pitrou messages: + msg186550 resolution: not a bug |
2013-04-10 22:03:15 | schmir | set | nosy: + schmir |
2011-02-09 23:18:38 | gdub | set | nosy: + gdub |
2010-08-21 15:52:52 | BreamoreBoy | set | nosy: + BreamoreBoy messages: + msg114503 |
2010-07-20 03:16:43 | BreamoreBoy | set | versions: + Python 2.6, Python 3.1, Python 2.7, Python 3.2, - Python 2.5 |
2009-09-04 10:17:26 | peci | set | files: + urllib2.py nosy: + peci messages: + msg92245 |
2009-06-03 13:31:30 | stephbul | set | files: + urllib2leak.py versions: - Python 2.6, Python 2.7 nosy: + stephbul messages: + msg88812 |
2008-12-01 10:40:12 | orsenthil | set | nosy: + orsenthil messages: + msg76683 |
2008-11-29 01:16:45 | gregory.p.smith | set | type: resource usage components: + Library (Lib), - Extension Modules versions: + Python 2.6, Python 2.5, Python 2.7, - Python 2.4 |
2008-11-24 22:22:34 | amaury.forgeotdarc | set | status: closed -> open nosy: + amaury.forgeotdarc resolution: wont fix -> (no value) messages: + msg76368 |
2008-11-24 18:03:58 | jhylton | set | status: open -> closed nosy: + jhylton resolution: wont fix messages: + msg76350 |
2008-11-24 05:47:08 | a.badger | set | messages: + msg76300 |
2008-11-24 05:20:15 | a.badger | set | nosy: + a.badger messages: + msg76298 |
2005-05-25 09:20:22 | manekcz | create |