This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: i get a memory leak after using split() function on windows
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.4
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: hyeshik.chang, leojay, tim.peters
Priority: normal
Keywords:

Created on 2006-01-07 10:04 by leojay, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg27253 - (view) Author: Leo Jay (leojay) Date: 2006-01-07 10:04
My environment is Python 2.4.2 on Windows XP Professional with SP2.

What I do is just open a text file and count how many lines are in that file:

D:\>python
Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = file('b.txt', 'rt')   # the size of b.txt is about 100 megabytes
>>> print len(f.read().split('\n'))
899830
>>> f.close()
>>> del f
>>>

After these steps, the task manager shows that the Python process still hogs about 125 megabytes of memory, and Python doesn't release that memory until I quit the interpreter.

But I find that if I remove the split() call, Python behaves correctly:

D:\>python
Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = file('b.txt', 'rt')
>>> print len(f.read())
95867667
>>>


So, is there something wrong with the split function, or is it just my misuse of split?


Best Regards, 
Leo Jay
msg27254 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2006-01-07 15:54

That comes from pymalloc's behavior: pymalloc never frees allocated memory back to the heap. For more information, see the mailing-list thread
http://mail.python.org/pipermail/python-dev/2004-October/049480.html

The usual way to avoid the problem is to use iterator-style loops instead of reading the whole file at once.
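The iterator-style approach can be sketched like this (a minimal sketch in modern Python syntax; the helper name and file path are illustrative, not from the report):

```python
def count_lines(path):
    # Iterating over a file object yields one line at a time,
    # so only one line's worth of data is alive at any moment
    # instead of ~900,000 string objects all at once.
    count = 0
    with open(path, 'rt') as f:
        for _ in f:
            count += 1
    return count
```

An equivalent one-liner is `sum(1 for _ in f)`; either way, peak memory stays proportional to the longest line rather than the whole file.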
msg27255 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-01-07 23:54

Specifically, you end up with about a million string objects simultaneously alive.

That consumes about 24 million bytes for string object headers, plus another 100 million bytes for the string contents.

Python can reuse all that memory later, but pymalloc does not (as perky said) "give it back".

Do note that this is an extraordinarily slow and wasteful way to count lines. If that's what you want and you don't care about peak memory use, then f.read().count('\n') is a simpler and faster way that creates only one string object.
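Tim's count('\n') suggestion, sketched as a function (modern open() syntax, since Python 2's file() builtin no longer exists; the helper name is illustrative):

```python
def count_newlines(path):
    # Read the whole file as a single string: only one string
    # object is created, so there is no per-line header overhead,
    # though peak memory is still roughly the size of the file.
    with open(path, 'rt') as f:
        return f.read().count('\n')
```

Note that this counts newline characters, so a final line without a trailing '\n' is not counted.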

The string object is so large in that case (about 100 million bytes) that pymalloc delegates memory management to the platform malloc. Whether or not the platform malloc "gives back" its memory to the OS is a wholly different question, and one over which Python has no control. On Windows NT+ it's very likely to, though.
History
Date                 User    Action  Args
2022-04-11 14:56:14  admin   set     github: 42772
2006-01-07 10:04:35  leojay  create