classification
Title: tarfile.add() produces hard links instead of normal files
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: nnorwitz Nosy List: lars.gustaebel, mpitt, nnorwitz
Priority: normal Keywords:

Created on 2005-10-18 20:27 by mpitt, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
tarfile-bug.py mpitt, 2005-10-18 20:27 test case
Messages (7)
msg26616 - (view) Author: Martin Pitt (mpitt) Date: 2005-10-18 20:27
When opening a tarfile for writing and adding several
files, some files end up being a hardlink to a
previously added tar member instead of being a proper
file member.

I attach a demo that demonstrates the problem. It
basically does:

tar = tarfile.open('tarfile-bug.tar', 'w')
tar.add('tarfile-bug-f1')
tar.add('tarfile-bug-f2')
tar.close()

In the resulting tar, "tarfile-bug-f2" is a hard link
to tarfile-bug-f1, although both entries should be
proper files.

It works when the tarfile is close()d and opened again
in append mode between the two add()s, but that slows
down the process dramatically and is certainly not the
intended way.
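
A self-contained sketch of such a reproduction follows. The attached tarfile-bug.py is not reproduced in this thread, so the create/add/remove sequence and the helper name `build_archive` are assumptions pieced together from the description here. On an interpreter that already contains the fix discussed later, both members should be reported as regular files.

```python
import os
import tarfile
import tempfile


def build_archive(workdir):
    """Create, add, and delete two files in turn; return the member types."""
    archive = os.path.join(workdir, "tarfile-bug.tar")
    tar = tarfile.open(archive, "w")
    for name in ("tarfile-bug-f1", "tarfile-bug-f2"):
        path = os.path.join(workdir, name)
        with open(path, "w") as f:
            f.write(name)
        tar.add(path, arcname=name)
        # Deleting the file right away allows the filesystem to hand the
        # same inode number to the next file, which is what exposed the bug.
        os.remove(path)
    tar.close()

    with tarfile.open(archive) as tar:
        return {m.name: ("hard link" if m.islnk() else "regular file")
                for m in tar.getmembers()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        for name, kind in build_archive(workdir).items():
            print(name, "->", kind)
```

Whether the inode number is actually reused depends on the filesystem, so an unpatched interpreter may not show the hard link on every run.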
msg26617 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2005-10-19 09:31
This is a feature ;-)
tarfile.py records the inode and device number (st_ino,
st_dev) for each added file in a list (TarFile.inodes). When
a new file is added and its inode and device number are found
in this list, it will be added as a hardlink member,
otherwise as a regular file.
Because your test script adds and immediately removes each
file, both files are assigned the same inode number. If you
had another process creating a file in the meantime, the
problem would not occur, because it would take over the
inode number before the second file has the chance.

Your problem shows that the way tarfile.py handles hardlinks
is too sloppy. It must take the stat.st_nlink field into
account. I will create a fix for this.
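
The bookkeeping described above can be sketched roughly as follows. This is not the tarfile.py source: `member_kind`, `seen`, and `check_nlink` are hypothetical names, and a dict stands in for TarFile.inodes. The stale entry is planted by hand so that the inode-reuse case is reproduced deterministically instead of depending on the filesystem.

```python
import os
import tempfile


def member_kind(path, seen, check_nlink):
    """Classify path as 'hardlink' or 'regular'.

    seen maps (st_ino, st_dev) to the first path recorded with that pair.
    check_nlink=False imitates the old lookup-only behaviour;
    check_nlink=True additionally requires st_nlink > 1.
    """
    st = os.lstat(path)
    inode = (st.st_ino, st.st_dev)
    if inode in seen and (st.st_nlink > 1 or not check_nlink):
        return "hardlink"
    seen[inode] = path
    return "regular"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "f2")
        open(path, "w").close()
        st = os.lstat(path)
        # Pretend an already deleted file used to own this inode number.
        stale = {(st.st_ino, st.st_dev): "f1-already-deleted"}
        print(member_kind(path, dict(stale), check_nlink=False))
        print(member_kind(path, dict(stale), check_nlink=True))
```

A file with a single link can never be a hard link to something else, which is why the extra st_nlink test is enough to reject the stale entry.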

As a workaround you have several options:
- Do not remove the files after adding them, but after the
TarFile is closed.
- Set TarFile.dereference to True before adding files, so
files with several links would always be added as regular
files (see the Documentation). Disadvantage: symbolic links
would be added as regular files as well.
- Tamper with the source code. Edit TarFile.gettarinfo().
Change the line that says "if inode in self.inodes and not
self.dereference:" to "if statres.st_nlink > 1 and inode in
self.inodes and not self.dereference:".
- Empty the TarFile.inodes list after each file. Ugh!
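
As an illustration of the dereference workaround, here is a minimal sketch. The helper `member_types` and the file names are made up for the example. It relies on the documented behaviour that, with dereference set to True, tarfile stores the contents of link targets instead of link members. It uses a genuine hard link created with os.link, so the effect does not depend on inode reuse.

```python
import os
import tarfile
import tempfile


def member_types(workdir, dereference):
    """Archive a file and a hard link to it; return {name: is_hard_link}."""
    original = os.path.join(workdir, "data.txt")
    link = os.path.join(workdir, "data-link.txt")
    if not os.path.exists(original):
        with open(original, "w") as f:
            f.write("payload")
        os.link(original, link)  # second name for the same inode

    archive = os.path.join(workdir, "deref-%s.tar" % dereference)
    with tarfile.open(archive, "w", dereference=dereference) as tar:
        tar.add(original, arcname="data.txt")
        tar.add(link, arcname="data-link.txt")

    with tarfile.open(archive) as tar:
        return {m.name: m.islnk() for m in tar.getmembers()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print("dereference=False:", member_types(workdir, False))
        print("dereference=True: ", member_types(workdir, True))
```

The cost noted in the list still applies: with dereference=True, symbolic links are stored as copies of their targets as well.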

msg26618 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2005-10-19 12:41
I just submitted patch #1331635 which ought to fix your
problem. Thank you for your report.
msg26619 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2005-10-20 04:59
Martin, I have checked in Lars' patch.  If this does not fix
your problem, please re-open this bug report.

Checked in as:
 * Lib/tarfile.py 1.34 and 1.21.2.6
 * Lib/test/test_tarfile.py 1.20 and 1.16.2.2
msg26620 - (view) Author: Martin Pitt (mpitt) Date: 2005-10-20 14:38
Thanks for the quick reply!

Unfortunately, not removing the files after adding them to
the tarfile is not really an option. I want to create a
really huge tar file and put compressed files into it. For
that purpose I create a temporary gzip file, put that into
the tarfile, and remove the temporary file again. First,
keeping track of all temp files would be cumbersome, and
second it could quickly lead to disk space exhaustion.
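
For reference, the workflow described here looks roughly like this. It is only a sketch: `add_compressed` and its parameters are hypothetical names, and the real script may differ.

```python
import gzip
import os
import tarfile
import tempfile


def add_compressed(tar, arcname, data, tmpdir):
    """Gzip data into a temporary file, add it to tar, then delete it."""
    fd, tmp_path = tempfile.mkstemp(suffix=".gz", dir=tmpdir)
    os.close(fd)  # gzip.open reopens the file by name
    try:
        with gzip.open(tmp_path, "wb") as gz:
            gz.write(data)
        tar.add(tmp_path, arcname=arcname)
    finally:
        # Remove the temporary file immediately so disk usage stays flat.
        os.remove(tmp_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        archive = os.path.join(d, "big.tar")
        with tarfile.open(archive, "w") as tar:
            for i in range(3):
                add_compressed(tar, "part%d.gz" % i, b"chunk %d" % i, d)
        with tarfile.open(archive) as tar:
            for m in tar.getmembers():
                print(m.name, "regular" if m.isreg() else "not regular")
```

Each temporary file exists only while it is being added, so successive files can easily receive the same inode number.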

I'll try your patch now.
msg26621 - (view) Author: Martin Pitt (mpitt) Date: 2005-10-20 14:45
Confirmed, works perfectly now. Thank you very much! Will
this also be fixed in a stable point release? Or just in 2.5?
msg26622 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2005-10-20 16:29
It will be fixed in 2.4.3 when released (that's the branch
revisions in my check-in message, i.e. the second RCS rev number
after each file).
History
Date User Action Args
2022-04-11 14:56:13 admin set github: 42497
2005-10-18 20:27:54 mpitt create