
classification
Title: zipfile: inconsistent filenames with InfoZip "unzip"
Type: Stage:
Components: Library (Lib) Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gward Nosy List: ahlstromjc, gvanrossum, gward, sjones
Priority: normal Keywords:

Created on 2003-06-15 21:23 by gward, last changed 2022-04-10 16:09 by admin. This issue is now closed.

Files
File name Uploaded Description
Demo.zip gward, 2003-06-15 21:23 zip file exhibiting inconsistent filenames
Messages (6)
msg16417 - (view) Author: Greg Ward (gward) (Python committer) Date: 2003-06-15 21:23
zipfile.py gives filenames inconsistent with the
InfoZIP "unzip" utility for certain ZIP files.  My
source is an email virus, so the ZIP files are almost
certainly malformed.  Nevertheless, it would be nice if
"unzip -l" and ZipFile.namelist() gave consistent
filenames.

Example: the attached Demo.zip (extracted from an email
virus caught on mail.python.org) looks like this
according to InfoZip:

$ unzip -l /tmp/Demo.zip 
Archive:  /tmp/Demo.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
    44544  01-26-03 20:49   DOCUME~1\CHRISS~1\LOCALS~1\Temp\Demo.exe
 --------                   -------
    44544                   1 file

But according to ZipFile.namelist(), the name of that
file is:
 
DOCUME~1\CHRISS~1\LOCALS~1\Temp\Demo.exescr000000000000000000.txt

Getting the same result with Python 2.2.2 and a
~2-week-old build of 2.3 CVS.
msg16418 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-06-16 01:19
That almost sounds like an intentional inconsistency. Could
it be that the central directory has one name but the local
header has a different one? Or that there's a null byte in
the filename so that the filename length is inconsistent?
The front of the file looks like this according to od -c:

0000000   P   K 003 004  \n  \0  \0  \0  \0  \0   *   Š   :   .   c   Ì
0000020  \v   g  \0   ®  \0  \0  \0   ®  \0  \0   D  \0  \0  \0   D   O
0000040   C   U   M   E   ~   1   \   C   H   R   I   S   S   ~   1   \
0000060   L   O   C   A   L   S   ~   1   \   T   e   m   p   \   D   e
0000100   m   o   .   e   x   e  \0  \0   s   c   r  \0   0   0   0   0
0000120   0   0   0   0   0   0   0   0   0   0   0   0   0   0   .   t
0000140   x   t   M   Z 220  \0 003  \0  \0  \0 004  \0  \0  \0   ÿ   ÿ
0000160  \0  \0   ž  \0  \0  \0  \0  \0  \0  \0   @  \0  \0  \0  \0  \0
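
Read as a ZIP local file header, the dump above appears to carry a file name length of 68 (the bytes "D \0" at offset 26), which covers the whole name including the embedded null bytes. The following is a minimal sketch of that reading, not code from zipfile.py. It builds a synthetic header rather than using the real Demo.zip: the timestamp and CRC fields are placeholder zeros, and the names LOCAL_HEADER, raw_name, and c_string_view are made up for illustration.

```python
import struct

# Local file header layout (little-endian, 30 bytes): signature,
# version needed, flags, compression method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3I2H")

# File name as reported in this thread, including the embedded null bytes.
raw_name = (b"DOCUME~1\\CHRISS~1\\LOCALS~1\\Temp\\Demo.exe"
            b"\x00\x00scr\x00" + b"0" * 18 + b".txt")

# Synthetic header: time, date, and CRC are placeholders (0), not real values.
header = LOCAL_HEADER.pack(b"PK\x03\x04", 10, 0, 0, 0, 0, 0,
                           44544, 44544, len(raw_name), 0)
blob = header + raw_name

fields = LOCAL_HEADER.unpack_from(blob, 0)
name_len = fields[9]  # explicit length stored in the header

# A length-based reader takes exactly name_len bytes after the header.
stored_name = blob[LOCAL_HEADER.size:LOCAL_HEADER.size + name_len]

# A reader that treats the name as a C string stops at the first null byte.
c_string_view = stored_name.split(b"\x00", 1)[0]

print(name_len)
print(stored_name)
print(c_string_view)
```

The two views differ only in where they stop reading, which would account for the different names discussed in this issue.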
msg16419 - (view) Author: Shannon Jones (sjones) Date: 2003-06-16 01:23
The actual filename from the zipfile is:
filename = 'DOCUME~1\\CHRISS~1\\LOCALS~1\\Temp\\Demo.exe\x00\x00scr\x00000000000000000000.txt'

Notice there is a \x00 after Demo.exe. My guess is InfoZip
stores the filename in a null terminated string and this
extra null character in the filename terminates it at this
point. Python doesn't care if you have nulls in the string,
so it prints the entire filename.

You can see the zip file format description at
ftp://ftp.info-zip.org/pub/infozip/doc/appnote-981119-iz.zip

The format does say:
      2)  String fields are not null terminated, since the
          length is given explicitly.

But it doesn't really say if strings are allowed to have
nulls in them.

So does Python or InfoZip get this right?
msg16420 - (view) Author: James C. Ahlstrom (ahlstromjc) Date: 2003-06-16 14:29
The analysis by sjones is correct.  Python and the zip file 
format both allow null bytes in file names.  But in this case, 
the file is infected with the "I-Worm.Lentin.o" virus and the 
file name is designed to hide this.  The file name ends in ".txt" 
but the file name up to the null byte ends in ".exe".  The 
intention is that a virus scanner would skip this file because it 
ends in ".txt" (a non-executable text file), but that 
the ".exe" would be seen (an executable program file) if the 
file were clicked, and so the file would be executed.

Testing this on my machine, my virus scanner (Kaspersky) 
nevertheless flags the ".zip" file as containing a virus, but this 
depends on the particular virus scanner and its settings.

I suggest that zipfile.py should terminate file names at a null 
byte as InfoZip does.
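
The suggestion above could be sketched roughly as follows. This is only an illustration of the idea, not the contents of the submitted patch; the helper name truncate_at_null is hypothetical.

```python
def truncate_at_null(filename):
    """Return filename up to, but not including, the first null character."""
    null_pos = filename.find("\x00")
    if null_pos < 0:
        return filename  # no null character: keep the name unchanged
    return filename[:null_pos]


# File name from this issue, with its embedded null characters.
name = ("DOCUME~1\\CHRISS~1\\LOCALS~1\\Temp\\Demo.exe"
        "\x00\x00scr\x00" + "0" * 18 + ".txt")

print(truncate_at_null(name))
```

With this rule the name would end in ".exe", matching what "unzip -l" shows.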
msg16421 - (view) Author: James C. Ahlstrom (ahlstromjc) Date: 2003-06-17 15:50
I submitted a patch for this.  It is 755987.  See further 
comments there.
msg16422 - (view) Author: Greg Ward (gward) (Python committer) Date: 2003-06-18 01:08
Fixed with patch #755987.
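
For readers checking this on a current Python 3, a quick way to observe the behavior is sketched below. It assumes that zipfile.ZipInfo still keeps the untruncated name in orig_filename and the name cut at the first null in filename; the sample name is shortened and is not read from Demo.zip.

```python
import zipfile

# Shortened stand-in for the name in this issue (no directory part).
crafted = "Demo.exe\x00\x00scr\x00" + "0" * 18 + ".txt"

info = zipfile.ZipInfo(crafted)

print(repr(info.orig_filename))  # name as passed in, nulls included
print(repr(info.filename))       # name as zipfile reports it
```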
History
Date User Action Args
2022-04-10 16:09:14 admin set github: 38653
2003-06-15 21:23:46 gward create