
classification
Title: tarfile: add extractall() method
Components: Library (Lib)
Versions: Python 2.4
process
Status: closed
Resolution: accepted
Assigned To: loewis
Nosy List: lars.gustaebel, loewis
Priority: high
Keywords: patch

Created on 2004-10-10 08:41 by lars.gustaebel, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name: extractall.patch
Uploaded: lars.gustaebel, 2004-10-10 08:41
Description: implementation of an extractall() method and documentation
Messages (2)
msg47043 - Author: Lars Gustäbel (lars.gustaebel) (Python committer) Date: 2004-10-10 08:41
A few weeks ago I had a discussion with tarfile.py user
Joeseph Jones, who had problems with a certain tar
archive. It contained the following:

$ tar tvzf fish.tar.gz
dr-xr-xr-x devnull/devnull   0 2004-09-14 23:36:01 fish/
-rw-r--r-- devnull/devnull 221190144 2004-09-14 23:35:46 fish/m8.avi

Note that the fish/ directory has no write permission.
When he tried to unpack this archive with tarfile.py,
the fish/m8.avi file was not extracted: the fish/
directory is extracted first and its attributes are set
to read-only, so the file can no longer be created
inside it. This is a major issue, especially for
restoring system backups.

We considered several alternatives:
1. One thing Joeseph criticized was that the
documentation does not mention at all that tarfile.py's
random-access nature brings problems with it. So, we
could add an example to the examples section of the
docs that illustrates how to work around them.

2. To keep the API unchanged and the fix transparent, we
could remodel the extract() method to walk down the
directory tree every time it is called, change
directory permissions if necessary, extract the file,
and restore the permissions again.

3. The TarFile object could store all extracted
directories in a list and set their permissions in the
end when it is closed.

4. We could introduce a new method (e.g. extractall())
that takes care of these issues. It will extract all
members and keep a list of directories which it will
extract with default attributes. In a second run it
will set the directory attributes accordingly.
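
The two-pass idea in alternative 4 can be sketched roughly as follows. This is a minimal illustration, not the code from extractall.patch: the helper names `build_demo_archive` and `extract_all_two_pass` are hypothetical, the archive is a small in-memory stand-in for fish.tar.gz, only regular files and directories are handled, member names are trusted without any path checks, and POSIX permission semantics are assumed.

```python
import io
import os
import shutil
import tarfile


def build_demo_archive():
    """Build an in-memory tar with a read-only directory and one file in it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        directory = tarfile.TarInfo("fish")
        directory.type = tarfile.DIRTYPE
        directory.mode = 0o555  # dr-xr-xr-x, as in the listing above
        directory.mtime = 1095197761
        tar.addfile(directory)

        data = b"placeholder for the video data"
        member = tarfile.TarInfo("fish/m8.avi")
        member.size = len(data)
        member.mode = 0o644
        member.mtime = 1095197746
        tar.addfile(member, io.BytesIO(data))
    buf.seek(0)
    return buf


def extract_all_two_pass(tar, dest):
    """Hypothetical helper: extract files first, fix up directories last."""
    directories = []

    # First pass: create directories with default attributes and write files.
    for member in tar.getmembers():
        target = os.path.join(dest, member.name)
        if member.isdir():
            os.makedirs(target, exist_ok=True)
            directories.append(member)
        elif member.isreg():
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.chmod(target, member.mode)
            os.utime(target, (member.mtime, member.mtime))

    # Second pass: apply the archived mtime and mode to each directory,
    # deepest paths first, now that nothing else will be written into them.
    for member in sorted(directories, key=lambda m: m.name, reverse=True):
        target = os.path.join(dest, member.name)
        os.utime(target, (member.mtime, member.mtime))
        os.chmod(target, member.mode)
```

Because the directory attributes are applied only after every file has been written, the read-only fish/ directory no longer blocks the extraction of fish/m8.avi, and its modification time is not disturbed by the file being created inside it.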

I favour alternative 4 and have written a patch that
implements it. It has several advantages over the other
three:

- It is a shortcut. A user who wants to extract an
entire archive won't need to iterate over it but can
do so with a one-liner.
- It is a concise and isolated implementation.
- It would eliminate another issue that I have found
annoying since the beginning of tarfile but that was
never worth introducing a new method for: when a file
is created in a directory, the directory's modification
time is reset. So, after the extraction process, the
mtimes are correct only on empty directories.

So, tarfile.py would lose its last disadvantages in
comparison to tar(1).
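
For illustration, the one-liner mentioned in the first bullet might look like this once extractall() is available. This is a hedged sketch: the demo archive and the names `make_archive` and `unpack` are assumptions made for the example, and the `filter` argument belongs to much later Python releases rather than to this patch, so it is passed only where the running interpreter supports it.

```python
import io
import os
import tarfile


def make_archive():
    """Build a small in-memory stand-in for fish.tar.gz (hypothetical data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        directory = tarfile.TarInfo("fish")
        directory.type = tarfile.DIRTYPE
        directory.mode = 0o555  # read-only directory, as in the report
        tar.addfile(directory)

        data = b"placeholder for the video data"
        member = tarfile.TarInfo("fish/m8.avi")
        member.size = len(data)
        member.mode = 0o644
        tar.addfile(member, io.BytesIO(data))
    buf.seek(0)
    return buf


def unpack(fileobj, dest):
    """Extract a whole archive without iterating over its members."""
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        # Extraction filters were added long after this patch; request the
        # "data" filter only if this Python version knows about filters.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(path=dest, **kwargs)
```

A caller would then write `unpack(make_archive(), "some/target/dir")` instead of looping over the members and calling extract() on each one.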



msg47044 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2005-03-04 19:46

Thanks for the patch, applied as

libtarfile.tex 1.8
tarfile.py 1.26
NEWS 1.1265

Sorry it took so long.
History
Date User Action Args
2022-04-11 14:56:07 admin set github: 40998
2004-10-10 08:41:21 lars.gustaebel create