classification
Title: shutil.rmtree() uses excessive amounts of memory
Type: Stage:
Components: Library (Lib) Versions: Python 2.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jlgijsbers Nosy List: jamesh, jlgijsbers, tim.peters
Priority: normal Keywords:

Created on 2004-09-09 13:52 by jamesh, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
shutils-rmtree.py jamesh, 2004-09-10 01:33 updated shutil.rmtree()
shutil-rmtree-os-walk.diff jlgijsbers, 2004-09-11 21:36 shutils-rmtree.py as a patch
Messages (6)
msg22397 - Author: James Henstridge (jamesh) Date: 2004-09-09 13:52
The shutil.rmtree() implementation uses an excessive
amount of memory when deleting large directory hierarchies.

Before actually deleting any files, it builds up a list
of (function, filename) tuples for all the files that
it is going to remove.  For a large hierarchy with a lot
of files, this list takes up a lot of memory
(I had a Python process using 800MB in one case).

I'm not sure why it is doing things this way.  It isn't
using the list to avoid recursion, so the depth of
directories it can remove is still limited by Python's
recursion limit.

Replacing _build_cmdtuple() with a generator might be a
good way to reduce the memory usage while leaving the
rest of the code unchanged.

I checked in CVS, and this issue is still present on HEAD.
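
To illustrate the generator idea, here is a minimal sketch. It is not the actual _build_cmdtuple() code; build_cmds_list() and iter_cmds() are hypothetical names made up for this example, and it assumes Python 3 syntax. The first function collects every (function, filename) pair before returning, while the second yields them one at a time so the caller can delete as it goes.

```python
import os


def build_cmds_list(path):
    """Eager version: collect every (function, filename) pair up front."""
    cmds = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            cmds.extend(build_cmds_list(full))
        else:
            cmds.append((os.remove, full))
    # The directory itself can only go once its contents are gone.
    cmds.append((os.rmdir, path))
    return cmds


def iter_cmds(path):
    """Lazy version: yield one (function, filename) pair at a time."""
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            yield from iter_cmds(full)
        else:
            yield (os.remove, full)
    yield (os.rmdir, path)
```

A caller would run "for func, arg in iter_cmds(path): func(arg)". Both versions are still recursive, so the directory depth stays limited by the recursion limit either way.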
msg22398 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2004-09-09 14:17
Rewrite it using os.walk() (not os.path.walk()) with 
topdown=False.
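
A rough sketch of what that could look like, assuming Python 3. rmtree_walk() is a hypothetical name for illustration, not the code that was later attached or checked in, and it leaves out the ignore_errors/onerror handling entirely.

```python
import os


def rmtree_walk(path):
    """Remove a directory tree bottom-up using os.walk()."""
    # topdown=False yields each directory after its subdirectories,
    # so every subdirectory is already empty when we reach its parent.
    for dirpath, dirnames, filenames in os.walk(path, topdown=False):
        for name in filenames:
            os.remove(os.path.join(dirpath, name))
        for name in dirnames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                # os.walk() lists symlinks to directories in dirnames but
                # does not descend into them; remove only the link itself.
                os.remove(full)
            else:
                os.rmdir(full)
    os.rmdir(path)
```

No list covering the whole tree is built here; only the listings os.walk() itself keeps while descending are held in memory.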
msg22399 - Author: James Henstridge (jamesh) Date: 2004-09-10 01:33
Attached is a Python file including a fixed-up
shutil.rmtree() using os.walk().  It seems to work for me,
and should have the same error behaviour.
msg22400 - Author: Johannes Gijsbers (jlgijsbers) * (Python triager) Date: 2004-09-11 21:36
Please attach changes as a patch next time. I've attached
shutils-rmtree.py as a patch this time.

I gave it a quick review and added a test to test_shutil.py
to ensure some not-very-obvious behavior (don't delete a
path passed to rmtree if it's a file) would be preserved.
The new version seems fine to me. Tim, could you take a look
at it as well?
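
For reference, the file-path behaviour can be checked roughly like this. This is only a sketch, not the test that was added to test_shutil.py; file_survives_rmtree() is a hypothetical helper, and it assumes a current Python 3 where shutil.rmtree() raises an OSError subclass when handed a regular file.

```python
import os
import shutil
import tempfile


def file_survives_rmtree():
    """Return True if rmtree() leaves a regular file passed to it in place."""
    fd, filename = tempfile.mkstemp()
    os.close(fd)
    try:
        try:
            shutil.rmtree(filename)
        except OSError:
            # Assumed: rmtree() reports an error for a non-directory path.
            pass
        return os.path.exists(filename)
    finally:
        # Clean up the temporary file if it is still there.
        if os.path.exists(filename):
            os.remove(filename)
```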
msg22401 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2004-09-11 22:56
I don't really have time for a thorough review.  I'll note that 
stuff like

func = something1
arg = something2
func(arg)

looks, to my eye, like a convoluted way to say

something1(something2)

I suppose that's to keep the onerror= gimmick working, 
though.
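
One plausible reading of that, sketched under assumptions: remove_file_then_dir() is a hypothetical helper and not the patch's code. The handler is called as onerror(func, arg, exc_info), which is the signature shutil.rmtree() documents for its onerror callback. Keeping the current call in func and arg lets a single except clause tell the handler which call failed.

```python
import os
import sys


def remove_file_then_dir(filename, dirname, onerror):
    """Remove a file, then its directory, reporting failures via onerror."""
    try:
        func = os.remove
        arg = filename
        func(arg)
        func = os.rmdir
        arg = dirname
        func(arg)
    except OSError:
        # func and arg still name whichever call raised.
        onerror(func, arg, sys.exc_info())
```

Without the two variables, each call would need its own try/except to pass the right function and path to the handler.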
msg22402 - Author: Johannes Gijsbers (jlgijsbers) * (Python triager) Date: 2004-10-07 21:13
I just looked at my own patch again, added the _raise_err
function back in and checked it in as rev 1.33 of shutil.py. 
History
Date User Action Args
2022-04-11 14:56:06 admin set github: 40887
2004-09-09 13:52:40 jamesh create