
classification
Title: TarFile.addfile() throws a struct.error
Components: Library (Lib)
Versions: Python 2.5
process
Status: closed
Resolution: fixed
Assigned To: lars.gustaebel
Nosy List: dvusboy, lars.gustaebel
Priority: normal

Created on 2007-04-20 08:21 by dvusboy, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (5)
msg31841 - Author: K. C. Wong (dvusboy) Date: 2007-04-20 08:21
When adding a file to a TarFile instance using addfile(), if the file paths (name and arcname) are unicode strings, a struct.error is raised. Python versions prior to 2.5 do not show this behaviour.

Assuming the current directory has a file named 'mac.txt', here is an interactive session that shows the problem:

Python 2.5 (r25:51908, Apr 18 2007, 19:06:57)
[GCC 3.4.6 20060404 (Red Hat 3.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tarfile
>>> t=tarfile.open('test.tar', 'w')
>>> i=t.gettarinfo(u'mac.txt', u'mac.txt')
>>> t.addfile(i, file(u'mac.txt', 'r'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/tarfile.py", line 1422, in addfile
    self.fileobj.write(tarinfo.tobuf(self.posix))
  File "/usr/lib/python2.5/tarfile.py", line 871, in tobuf
    buf = struct.pack("%ds" % BLOCKSIZE, "".join(parts))
  File "/usr/lib/python2.5/struct.py", line 63, in pack
    return o.pack(*args)
struct.error: argument for 's' must be a string
msg31842 - Author: Lars Gustäbel (lars.gustaebel) (Python committer) Date: 2007-04-20 19:25
tarfile.py was never guaranteed to work correctly with unicode filenames. The fact that this works with Python < 2.5 is purely accidental.

You can work around this (sticking to your example):

i = t.gettarinfo(u'mac.txt', 'mac.txt')

or:

i = t.gettarinfo('mac.txt')
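
The same idea can be wrapped in a small helper so that names arriving as unicode are converted before they reach tarfile. This is only a sketch: `to_native_str` and `add_bytes` are hypothetical names, the UTF-8 encoding is an assumption (use whatever encoding your file names really have), and the code uses `io.BytesIO`, so it needs Python 2.6 or later. It also runs unchanged on Python 3, where the conversion is a no-op.

```python
import io
import sys
import tarfile


def to_native_str(name, encoding="utf-8"):
    """Hypothetical helper: encode unicode names to byte strings on Python 2.

    On Python 3 the name is returned unchanged, since tarfile accepts str there.
    """
    if sys.version_info[0] == 2 and isinstance(name, unicode):  # noqa: F821
        return name.encode(encoding)
    return name


def add_bytes(tar, arcname, data):
    """Hypothetical helper: add in-memory data under a normalized member name."""
    info = tarfile.TarInfo(to_native_str(arcname))
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


# Build a small archive in memory, passing a unicode name as in the report.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
add_bytes(tar, u"mac.txt", b"hello\n")
tar.close()

# Read the archive back to check the member name.
buf.seek(0)
names = tarfile.open(fileobj=buf, mode="r").getnames()
```

Normalizing at the point where names enter your code (for example, right after reading them from a DOM tree) keeps the rest of the program free of mixed string types.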
msg31843 - Author: K. C. Wong (dvusboy) Date: 2007-04-21 04:54
I see the workaround, and I have already implemented similar workarounds in my code. However, I have two problems with this response:

1. The behaviour changed. The documentation did not explicitly say that tarfile does not support unicode file paths, and it worked prior to 2.5, so I would say that breaking that behaviour at the very least calls for a documentation update.
2. The error message stems from struct failing to pack a unicode string. First of all, I did not grasp the subtle hint that the problem was a unicode string as opposed to a non-unicode string. You see, I did not expect a unicode string in the first place; it was a by-product of TEXT_DATA from a DOM tree. I can understand why struct.pack() throws (because no explicit encoding scheme was specified), but the error was so cryptic with regard to tarfile itself that I had to modify tarfile to track down the reason for the exception.
In short, I would prefer the owner of tarfile to make an explicit supported or not-supported call on unicode file paths, document that decision, and make a more reasonable attempt at presenting relevant exceptions.
Thank you for looking into this.
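
To illustrate the second point, here is a minimal sketch of what a clearer failure could look like. It is not the fix that was checked in; `pack_header` is a hypothetical wrapper, and the example is written for Python 3, where passing text instead of bytes to the 's' format fails in a way analogous to the traceback above.

```python
import struct

BLOCKSIZE = 512  # tar header block size, the value tarfile uses


def pack_header(raw):
    """Hypothetical wrapper: pack one header block, rejecting text early."""
    if not isinstance(raw, bytes):
        raise TypeError(
            "header fields must be byte strings, got %s; "
            "encode file names before adding them" % type(raw).__name__
        )
    return struct.pack("%ds" % BLOCKSIZE, raw)


# Packing text directly: struct reports the problem without mentioning tarfile.
try:
    struct.pack("%ds" % BLOCKSIZE, u"mac.txt")
    raw_error = None
except struct.error as exc:
    raw_error = str(exc)

# The wrapper fails earlier, with a message that points at the file name.
try:
    pack_header(u"mac.txt")
    clear_error = None
except TypeError as exc:
    clear_error = str(exc)

# Byte strings are packed and padded with NUL bytes to the block size.
block = pack_header(b"mac.txt")
```

Checking the type before calling struct.pack() means the exception names the actual cause (an unencoded file name) instead of an internal packing detail.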
msg31844 - Author: Lars Gustäbel (lars.gustaebel) (Python committer) Date: 2007-04-21 12:25
I checked in a simple fix to the release25-maint branch (rev. 54908). I will come up with a more solid approach for 2.6.
msg31845 - Author: K. C. Wong (dvusboy) Date: 2007-04-22 04:13
Much thanks for your effort.
History
Date                 User     Action  Args
2022-04-11 14:56:23  admin    set     github: 44872
2007-04-20 08:21:36  dvusboy  create