Issue1330039
Created on 2005-10-18 20:27 by mpitt, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||
---|---|---|
File name | Uploaded | Description |
tarfile-bug.py | mpitt, 2005-10-18 20:27 | test case |
Messages (7) | |||
---|---|---|---|
msg26616 | Author: Martin Pitt (mpitt) | Date: 2005-10-18 20:27 | |
When opening a tarfile for writing and adding several files, some files end up as hard links to a previously added tar member instead of as proper file members. I attach a script that demonstrates the problem. It basically does: tar = tarfile.open('tarfile-bug.tar', 'w'); tar.add('tarfile-bug-f1'); tar.add('tarfile-bug-f2'); tar.close(). In the resulting tar, "tarfile-bug-f2" is a hard link to "tarfile-bug-f1", although both entries should be proper files. It works when the tarfile is close()d and reopened in append mode between the two add()s, but that slows down the process dramatically and is certainly not the intended way. |
|||
msg26617 | Author: Lars Gustäbel (lars.gustaebel) * | Date: 2005-10-19 09:31 | |
This is a feature ;-) tarfile.py records the inode and device number (st_ino, st_dev) of each added file in a list (TarFile.inodes). When a new file is added and its inode and device number are found in this list, it is added as a hardlink member, otherwise as a regular file. Because your test script adds and immediately removes each file, both files are assigned the same inode number. If another process created a file in the meantime, the problem would not occur, because that file would take over the inode number before the second file had the chance. Your problem shows that the way tarfile.py handles hardlinks is too sloppy. It must take the stat.st_nlink field into account. I will create a fix for this. As a workaround you have several options: - Do not remove the files after adding them, but after the TarFile is closed. - Set TarFile.dereference to True before adding files, so that files with several links are always added as regular files (see the documentation). Disadvantage: symbolic links would be added as regular files as well. - Tamper with the source code. Edit TarFile.gettarinfo(). Change the line that says "if inode in self.inodes and not self.dereference:" to "if statres.st_nlink > 1 and inode in self.inodes and not self.dereference:". - Empty the TarFile.inodes list after each file. Ugh! |
|||
msg26618 | Author: Lars Gustäbel (lars.gustaebel) * | Date: 2005-10-19 12:41 | |
I just submitted patch #1331635, which ought to fix your problem. Thank you for your report. |
|||
msg26619 | Author: Neal Norwitz (nnorwitz) * | Date: 2005-10-20 04:59 | |
Martin, I have checked in Lars' patch. If this does not fix your problem, please re-open this bug report. Checked in as: * Lib/tarfile.py 1.34 and 1.21.2.6 * Lib/test/test_tarfile.py 1.20 and 1.16.2.2 |
|||
msg26620 | Author: Martin Pitt (mpitt) | Date: 2005-10-20 14:38 | |
Thanks for the quick reply! Unfortunately, not removing the files after adding them to the tarfile is not really an option. I want to create a really huge tar file and put compressed files into it. For that purpose I create a temporary gzip file, put that into the tarfile, and remove the temporary file again. First, keeping track of all temp files would be cumbersome, and second, it could quickly lead to disk space exhaustion. I'll try your patch now. |
|||
msg26621 | Author: Martin Pitt (mpitt) | Date: 2005-10-20 14:45 | |
Confirmed, works perfectly now. Thank you very much! Will this also be fixed in a stable point release? Or just in 2.5? |
|||
msg26622 | Author: Neal Norwitz (nnorwitz) * | Date: 2005-10-20 16:29 | |
It will be fixed in 2.4.3 when released (that's what the branch revisions in my earlier comment are for, i.e. the second RCS revision number after each file). |
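
The hardlink handling discussed in this thread can be observed with a short script. This is only a sketch, assuming a Python 3 tarfile module that already contains the st_nlink check and a filesystem on which os.link() works. The helper name member_types and the file names f1 and f2 are made up for illustration and are not part of the original test case.

```python
import os
import tarfile
import tempfile


def member_types(dereference=False):
    """Archive a file plus a hard link to it; return {name: kind} per member.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "f1")
        second = os.path.join(tmp, "f2")
        with open(first, "w") as fh:
            fh.write("payload\n")
        # f2 is a real hard link: same (st_ino, st_dev), and st_nlink == 2.
        os.link(first, second)

        archive = os.path.join(tmp, "demo.tar")
        # Extra keyword arguments to tarfile.open() are passed to TarFile.
        with tarfile.open(archive, "w", dereference=dereference) as tar:
            tar.add(first, arcname="f1")
            tar.add(second, arcname="f2")

        # Read the archive back and classify each member.
        with tarfile.open(archive) as tar:
            return {
                member.name: "hardlink" if member.islnk() else "regular"
                for member in tar.getmembers()
            }


if __name__ == "__main__":
    print(member_types())                  # default: links are preserved
    print(member_types(dereference=True))  # links are stored as full copies
```

With the default dereference=False, the second member should be stored as a hard link to the first, because both paths share an inode and the link count is greater than one. With dereference=True, both members should be stored as regular files, which is the trade-off described in the workaround list.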
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:13 | admin | set | github: 42497 |
2005-10-18 20:27:54 | mpitt | create |