This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: os.rmtree error handling was always broken
Type: Stage:
Components: None Versions: Python 2.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jlgijsbers Nosy List: gvanrossum, jlgijsbers
Priority: release blocker Keywords:

Created on 2004-10-18 01:24 by gvanrossum, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
shutil-rmtree-onerror.diff jlgijsbers, 2004-10-18 22:18
shutil-rmtree-onerror-2.diff jlgijsbers, 2004-10-19 08:57
shutil-rmtree-no-listdir.diff jlgijsbers, 2004-10-23 03:01
rmtree.txt gvanrossum, 2004-10-26 16:08 Guido's no-nonsense version
rmtree-refactored.py jlgijsbers, 2004-10-27 09:57
rmtree-ng.py gvanrossum, 2004-10-27 15:56 Guido's improved version
Messages (17)
msg22755 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-10-18 01:24
I just realized that the error handling in
shutil.rmtree has been broken ever since Python 2.3,
and the new implementation in 2.4 inherits this bug.

The problem is that the try/except is outside the for
loop, so that when an error is found, if either
ignore_errors or onerror is set, the desired action is
taken *once* but then the function returns rather than
trying to remove more files (ignoring or reporting
subsequent errors) as it should.

The sad thing is that the original code by David Ascher
(shutil.py rev 1.8) was correct! The bug was introduced
in rev 1.27, which claimed to make the implementation
agree with the docs. :-(

I'm giving this a high priority, hoping that it will be
fixed in 2.4b2 (and hopefully backported to 2.3).
msg22756 - (view) Author: Johannes Gijsbers (jlgijsbers) * (Python triager) Date: 2004-10-18 06:06
Logged In: YES 
user_id=469548

Barry, hope you don't mind I'll take this one. I can have a
fix tonight or tomorrow.
msg22757 - (view) Author: Johannes Gijsbers (jlgijsbers) * (Python triager) Date: 2004-10-18 22:18
Logged In: YES 
user_id=469548

Patch attached. A review would be appreciated, especially as
to whether there's a nicer way to handle the problem, but
I'll check it in before 2.4b2 anyway.
msg22758 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-10-18 23:32
Logged In: YES 
user_id=6380

(I'm sure Barry didn't mind, I just assigned it to him
because rev. 1.27 had his name on it. :-)

I saw the patch, don't have time to apply it, note that it
has three(!) nested try-except blocks; instead, it should
really have some smaller ones around just exactly the call
that you expect to raise an exception.  Exceptions elsewhere
should *not* be caught.

Also, the raise statement in handle_error() should just be
"raise" without parameters, which re-raises the original
exception. The exception already contains the filename, usually.

Finally: os.walk() seems to suppress errors in os.listdir().
Technically, errors in os.listdir() should be treated the
same as errors in os.remove() / os.rmdir(), but that means
we can't use os.walk() -- or we'd have to add error handling
to it recursively.
msg22759 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-10-18 23:42
Logged In: YES 
user_id=6380

(Actually, os.walk has an onerror argument, it just uses a
different convention and default, so we could thread this
through easily.)
msg22760 - (view) Author: Johannes Gijsbers (jlgijsbers) * (Python triager) Date: 2004-10-19 08:57
Logged In: YES 
user_id=469548

A new patch is attached. I factored out the
handle_call_error method so exception handling is more
local. Also, errors in os.walk were already being handled
using the _raise_err/except OSError combination,  but we now
do it using the onerror argument to os.walk.
msg22761 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-10-19 15:18
Logged In: YES 
user_id=6380

This looks fine, and can be checked in.

There's one minor remaining issue, but I don't see how to
fix it without changing the signature for the onerror
function that os.walk() takes, and that's a no-no: when the
problem is a directory permission preventing os.listdir() to
work, the path passed to the rmtree caller's onerror
function is always the toplevel path, rather than the
directory that couldn't be listed. The real path is included
in the error message. (Hmm, perhaps you could get it out of
there? I think it's an attribute of the exception object;
but be careful, it may not exist.)

It's unfortunate that the error argument to the onerror
function is a sys.exc_info() triple rather than an exception
object (calling sys.exc_info() is expensive), but that's an
API choice from the past that we can't change until Python 3.0.
msg22762 - (view) Author: Johannes Gijsbers (jlgijsbers) * (Python triager) Date: 2004-10-20 20:30
Logged In: YES 
user_id=469548

With 'be careful, it may not exist.', do you mean rmtree
can't rely on exc.filename being set?
msg22763 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-10-21 04:42
Logged In: YES 
user_id=6380

Yes.
msg22764 - (view) Author: Johannes Gijsbers (jlgijsbers) * (Python triager) Date: 2004-10-23 03:01
Logged In: YES 
user_id=469548

You know what: I think we should remove the os.listdir error
handling:

* The docs only specify "errors resulting from *failed
removals*" as being handled. If they didn't we should worry
about OSErrors from the os.path.* calls in os.walk() too.
* The os.listdir error handling hasn't been in a final
release, so no backwards compatibility problems yet.
* We can't get the path right for os.listdir.
* The problem also existed before the rewrite using
os.walk(), so reverting to that is not a solution.

Patch (the last?) is attached.
msg22765 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-10-23 17:56
Logged In: YES 
user_id=6380

I disagree. The docs are wrong because a failing listdir
wasn't anticipated at first when this function was written.
A failing listdir is a common failure case; I don't see how
the os.path calls in os.walk could ever fail (since they all
catch os.error).
msg22766 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-10-23 18:25
Logged In: YES 
user_id=6380

BTW, if you want the correct path in the onerror calls for
os.listdir, your best bet is  not to use os.walk but to
write something similar that suits our purpose. After you
remove the features we don't use, it's really only a few
lines of code.
msg22767 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-10-26 16:08
Logged In: YES 
user_id=6380

How about this no-nonsense version? (rmtree.txt upload) The
only thing you could quibble about is what to do when
os.lstat fails. I chose backward compatibility.
msg22768 - (view) Author: Johannes Gijsbers (jlgijsbers) * (Python triager) Date: 2004-10-27 09:57
Logged In: YES 
user_id=469548

Yes, I think this is the right way to go about it. I'd
personally make a couple of changes, but those are up to
your discretion:

* Factor out the try: except:'s to a nested function.
* Use sys.exc_info() instead of the exception for backwards
compatibility.
* Use 'os.path.isdir(fullname) and not
os.path.islink(fullname)' expression instead of the lstat
call. I think it more clearly communicates what we want,
even though it's more stat calls.

Latest version attached (rmtree-refactored.py).
msg22769 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-10-27 15:56
Logged In: YES 
user_id=6380

- I don't like the nested function -- Python's function call
overhead is significant and I worry about that when deleting
massive trees. (I want to be able to "rm -rf /*"
efficiently. :-)

- The err instance instead of sys.exc_info() was an
oversight. The docstring repeats this mistake.

- I don't think the lstat() call is unclean, and again I
prefer performance.

- Here's an additional trick to reduce the code duplication
(see upload rmtree-ng.py):

At the top of the function put:

if ignore_errors:
....def onerror(*args): pass
elif onerror is None:
....def onerror(*args): raise

Then all the except blocks can be collapsed to just the
onerror() call. (I care less about the performance in case
there's an error.)
msg22770 - (view) Author: Johannes Gijsbers (jlgijsbers) * (Python triager) Date: 2004-10-31 12:07
Logged In: YES 
user_id=469548

Ok, checked in rmtree-ng.py version as rev. 1.34 of
shutil.py and rev 1.8 of test_shutil.py.
msg22771 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-10-31 21:57
Logged In: YES 
user_id=6380

Thanks a lot! I think this qualifies for a backport to 2.3
as well.
History
Date User Action Args
2022-04-11 14:56:07adminsetgithub: 41042
2004-10-18 01:24:12gvanrossumcreate