classification
Title: tarfile can't extract some tar archives..
Type: Stage:
Components: Library (Lib) Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: lars.gustaebel, meren, nnorwitz
Priority: normal Keywords:

Created on 2005-10-24 17:47 by meren, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
buggytarexamlple.py meren, 2005-10-25 18:59 example workaround
Messages (7)
msg26689 - (view) Author: A. Murat EREN (meren) Date: 2005-10-24 17:47
Here is a small demo that reproduces the problem:


-----------------8<-----------------8<-----------------8<-----------------8<---
meren@pardus /home/meren $ wget ftp://ftp.sleepycat.com/releases/db.1.85.tar.gz
(...)
100%[============>] 270,953       17.13K/s    ETA 00:00

20:21:09 (15.25 KB/s) - `db.1.85.tar.gz' saved [270,953]

meren@pardus /home/meren $ file db.1.85.tar.gz
db.1.85.tar.gz: gzip compressed data, from Unix
meren@pardus /home/meren $ python
>>> import tarfile
>>> tar = tarfile.open("db.1.85.tar.gz", "r:gz")
>>> for tarinfo in tar:
...     print tarinfo.name
...
db.1.85
db.1.85/btree
db.1.85/btree/Makefile.inc
db.1.85/btree/bt_close.c
db.1.85/btree/bt_conv.c
db.1.85/btree/bt_debug.c
db.1.85/btree/bt_delete.c
db.1.85/btree/btree.h
db.1.85/btree/bt_get.c
db.1.85/btree/bt_open.c
(...) 
>>> for tarinfo in tar:
...     tar.extract(tarinfo)
...
>>>  Ctrl + D
meren@pardus /home/meren $ ls db*
db.1.85
db.1.85.tar.gz
meren@pardus /home/meren $ file db.1.85
db.1.85: empty
meren@pardus /home/meren $ cat db.1.85
meren@pardus /home/meren $ 
----------------->8----------------->8----------------->8----------------->8---

This archive extracts with the same result:
ftp://ftp.linux.org.tr/pub/mirrors/gentoo/distfiles/ncompress-4.2.4.tar.gz

This happens very rarely, but it does happen. The native
'tar' binary extracts these archives properly.


Thanks in advance,
Ciao.
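
The session above prints member names only. A minimal sketch for also seeing how tarfile classifies each entry is given here, in Python 3 syntax rather than the Python 2 of the session. The helper name `member_kinds` is hypothetical and not part of tarfile. On a tarfile version affected by this report, an entry such as db.1.85 would presumably be reported as "file" rather than "dir".

```python
import tarfile


def member_kinds(archive_path):
    """Return (name, kind) pairs, where kind is 'dir', 'file' or 'other'.

    Hypothetical diagnostic helper; it only reports what tarfile itself
    decides about each member.
    """
    kinds = []
    with tarfile.open(archive_path) as tar:
        for member in tar:
            if member.isdir():
                kind = "dir"
            elif member.isfile():
                kind = "file"
            else:
                kind = "other"
            kinds.append((member.name, kind))
    return kinds
```

Calling `member_kinds("db.1.85.tar.gz")` and looking for a "file" entry whose name is the parent of other members would point at the misclassified directories.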
msg26690 - (view) Author: A. Murat EREN (meren) Date: 2005-10-24 18:19
More examples:

ftp://ftp.porcupine.org/pub/security/portmap_5beta.tar.gz
ftp://ftp.porcupine.org/pub/security/tcp_wrappers_7.6.tar.gz

Additionally, the same problem appears when extracting
these archives with "ark" (a KDE tool that is just a simple
front-end for the tar command; interesting, isn't it?).


Ciao.
msg26691 - (view) Author: A. Murat EREN (meren) Date: 2005-10-25 18:36
I figured out that this is a very pesky problem.

The problem comes from the tar archives themselves.
Simply put, the "tarinfo.isdir()" check in the library returns
false for the directories, so they are extracted as
regular files. I don't know how these odd archives were
created, and I couldn't produce similar buggy archives
myself. It would be nice to know.

Because the problem is not in the Python library, it is
very difficult to implement an efficient workaround that
properly extracts this kind of buggy tar archive. I'm going
to attach a dirty workaround to show the idea (is anyone
reading these reports?).


Ciao..
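
One way such a workaround could look is sketched below in Python 3. It is not the attached script, whose contents are not shown in this thread. It assumes that any member whose name is the parent of another member must be a directory, even when `isdir()` says otherwise. The names `implied_directories` and `extract_tolerant` are hypothetical, and the sketch is meant for trusted archives only, since it does no path sanitising of its own.

```python
import os
import posixpath
import tarfile


def implied_directories(names):
    """Return the member names that are a parent of some other member."""
    cleaned = {name.rstrip("/") for name in names}
    parents = set()
    for name in cleaned:
        parent = posixpath.dirname(name)
        # Walk up the path; stop at the top ("" or "/") to avoid looping.
        while parent not in ("", "/"):
            parents.add(parent)
            parent = posixpath.dirname(parent)
    return parents & cleaned


def extract_tolerant(archive_path, dest):
    """Extract an archive, creating implied directories explicitly."""
    with tarfile.open(archive_path) as tar:
        members = tar.getmembers()
        dirs = implied_directories(m.name for m in members)
        for member in members:
            if member.name.rstrip("/") in dirs and not member.isdir():
                # Misclassified directory: create it instead of an empty file.
                os.makedirs(os.path.join(dest, member.name), exist_ok=True)
            else:
                tar.extract(member, dest)
```

The cost is a full pass over the member list before extraction, and an empty directory stored as a regular file would still be written as an empty file, because nothing in the archive implies it is a directory.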
msg26692 - (view) Author: A. Murat EREN (meren) Date: 2005-10-25 18:58
No file was attached. Sorry, here it is.
msg26693 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2005-10-26 10:31
I submitted patch #1338314, which fixes this problem.
tarfile.py is aware of these "buggy" archives, but its
workaround no longer worked.
Thank you for your report, especially for the large number of
test archives.
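
The message above does not say how tarfile recognises these archives. A plausible explanation, stated here as an assumption, is the old V7 tar convention, in which a directory is stored with a regular-file type flag and a name ending in "/". The Python 3 sketch below checks a raw 512-byte header block for that pattern. The function name `is_old_style_directory` is hypothetical, and this is not the code of patch #1338314.

```python
import tarfile


def is_old_style_directory(header_block):
    """Check a raw 512-byte tar header for a V7-style directory entry.

    Assumption: such an entry has a regular-file type flag and a name
    that ends with a slash.
    """
    # Bytes 0-99 hold the NUL-padded member name.
    name = header_block[0:100].split(b"\0", 1)[0]
    # Byte 156 holds the type flag.
    typeflag = header_block[156:157]
    is_regular = typeflag in (tarfile.AREGTYPE, tarfile.REGTYPE)
    return is_regular and name.endswith(b"/")
```

Reading the first 512 bytes of one of the uncompressed test archives and passing them to this function would confirm or refute the assumption for that archive.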
msg26694 - (view) Author: A. Murat EREN (meren) Date: 2005-10-26 12:35
Thanks for the patch, it works.
msg26695 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2005-10-28 06:01
Committed revision 41340.
Committed revision 41341.
History
Date User Action Args
2022-04-11 14:56:13 admin set github: 42515
2005-10-24 17:47:20 meren create