This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: i get a memory leak after using split() function on windows
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.4
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: hyeshik.chang, leojay, tim.peters
Priority: normal
Keywords:

Created on 2006-01-07 10:04 by leojay, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg27253 - (view) Author: Leo Jay (leojay) Date: 2006-01-07 10:04
My environment is Python 2.4.2 on Windows XP Professional with SP2.

What I do is just open a text file and count how many lines are in that file:

D:\>python
Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = file('b.txt', 'rt')   # the size of b.txt is about 100 megabytes
>>> print len(f.read().split('\n'))
899830
>>> f.close()
>>> del f
>>>

After these steps, the task manager shows that the Python process still hogs about 125 megabytes of memory, and Python doesn't release that memory until I quit the interpreter.

But I find that if I remove the split() call, Python behaves correctly:

D:\>python
Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = file('b.txt', 'rt')
>>> print len(f.read())
95867667
>>>


So, is there something wrong with the split function, or is it just my misuse of split?


Best Regards, 
Leo Jay
msg27254 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2006-01-07 15:54

That comes from pymalloc's behavior: pymalloc never frees allocated memory back to the heap. For more information, see the mailing-list thread
http://mail.python.org/pipermail/python-dev/2004-October/049480.html

The usual way to avoid the problem is to use iterator-style loops instead of reading the whole file at once.
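The iterator-style approach can be sketched like this (a minimal sketch in modern Python syntax; the helper name and file path are illustrative, not from the report):

```python
def count_lines(path):
    # Iterating over a file object yields one line at a time,
    # so only one line's worth of data is alive at any moment
    # instead of ~900,000 string objects all at once.
    count = 0
    with open(path, 'rt') as f:
        for _ in f:
            count += 1
    return count
```

An equivalent one-liner is `sum(1 for _ in f)`; either way, peak memory stays proportional to the longest line rather than the whole file.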
msg27255 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-01-07 23:54

Specifically, you end up with about a million string objects simultaneously alive.

That consumes about 24 million bytes for string object headers, plus another 100 million bytes for the string contents.

Python can reuse all that memory later, but pymalloc does not (as perky said) "give it back".

Do note that this is an extraordinarily slow and wasteful way to count lines. If that's what you want and you don't care about peak memory use, then f.read().count('\n') is a simpler and faster way that creates only one string object.
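Tim's count('\n') suggestion, sketched as a function (modern open() syntax, since Python 2's file() builtin no longer exists; the helper name is illustrative):

```python
def count_newlines(path):
    # Read the whole file as a single string: only one string
    # object is created, so there is no per-line header overhead,
    # though peak memory is still roughly the size of the file.
    with open(path, 'rt') as f:
        return f.read().count('\n')
```

Note that this counts newline characters, so a final line without a trailing '\n' is not counted.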

The string object is so large in that case (about 100 million bytes) that pymalloc delegates memory management to the platform malloc. Whether or not the platform malloc "gives back" its memory to the OS is a wholly different question, and one over which Python has no control. On Windows NT+ it's very likely to, though.
History
Date                 User    Action  Args
2022-04-11 14:56:14  admin   set     github: 42772
2006-01-07 10:04:35  leojay  create