
classification
Title: in email can't get attachments' filenames using get_filename
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: barry
Nosy List: barry, ichneumon, mpasniew, r.david.murray
Priority: normal
Keywords:

Created on 2006-01-11 21:47 by mpasniew, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg27281 - (view) Author: Michal P. (mpasniew) Date: 2006-01-11 21:47
In the email package (Python 2.4.1), the get_filename() method
returns the "filename" parameter of Content-Disposition, but some
messages carry only a "name" parameter on Content-Type, for example:

USUALLY THE HEADER IS:
Content-Type: application/octet-stream;
        name="XX.pdf"
Content-Transfer-Encoding: base64
Content-Description: XX.pdf
Content-Disposition: attachment;
        filename="XX.pdf"

BUT SOMETIMES THE HEADER IS:
Content-type: application/octet-stream; name="XX.xls"
Content-transfer-encoding: base64

For this to work properly I had to code a hack along
these lines:
import re

filename = part.get_filename()
if not filename:
    # Fall back to the name parameter of the Content-Type header.
    ct = part.get("Content-type", "")
    m = re.search(r'name="([^"]+)"', ct)
    if m:
        filename = m.group(1)

It would be helpful to have get_filename() handle this case itself.

Michal
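
For comparison, the same fallback can be sketched without a regular expression. This is only an illustration: it assumes the standard library's Message.get_param() is used to read the parameters, and attachment_filename is a hypothetical helper name, not part of the email package.

```python
from email import message_from_string


def attachment_filename(part):
    """Return the attachment's filename, or None if the part has none.

    Hypothetical helper: tries the filename parameter of
    Content-Disposition first, then falls back to the name
    parameter of Content-Type.
    """
    filename = part.get_param("filename", header="content-disposition")
    if filename is None:
        filename = part.get_param("name", header="content-type")
    # Note: for RFC 2231 encoded parameters get_param() may return a
    # tuple rather than a string; that case is not handled here.
    return filename


# The second header style from the report, with no Content-Disposition.
raw = (
    'Content-type: application/octet-stream; name="XX.xls"\n'
    "Content-transfer-encoding: base64\n"
    "\n"
)
part = message_from_string(raw)
found = attachment_filename(part)
```

Letting get_param() do the parsing avoids having to deal with quoting and trailing parameters by hand.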
msg27282 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2006-01-17 04:35
r42075 for Python 2.3 / email 2.5 (will close after porting
to all other branches).
msg27283 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2006-01-17 05:09
r42076 for email 3.0 / python 2.5

r42077 for python 2.4


msg114974 - (view) Author: Ich Neumon (ichneumon) Date: 2010-08-26 13:37
Looks like this one needs reopening to me... I've recently had to parse out attachments with the following Content-Type lines and no Content-Disposition provided:

Content-Type: application/vnd.ms-excel; name=transactions.xls

It might not seem like much, but the content-type check for this workaround in get_filename appears to be case-sensitive, and thus misses the upper case "T" in Content-Type.
There is no information provided in the headers about the mailer they use, but they get plenty of other bits wrong too (e.g. the xls file is actually just tab-separated values).
msg114977 - (view) Author: Ich Neumon (ichneumon) Date: 2010-08-26 14:01
A slight update for my workaround for the above with the following code:

if not filename:
    ct = part.get("Content-Type")
    if ct:
        # Quotes are optional; stop at a quote, whitespace, or ';'.
        m = re.search(r'name="?([^"\s;]+)"?', ct)
        if m:
            filename = m.group(1)

I've added ? operators to the double-quotes, and changed the case in the part.get. However, there may be good reasons as to why part.get needs to be case-sensitive. Not my area of expertise though.
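
On the case question, a small check may help. As far as the standard library's email.message.Message goes, get() matches header names without regard to case, so the spelling passed to part.get should not matter. The sketch below uses a made-up message modelled on the header above; it says nothing about how get_filename behaved internally in the affected versions.

```python
from email import message_from_string

# Made-up message: lower-case "t" in the header name, unquoted name parameter.
raw = "Content-type: application/vnd.ms-excel; name=transactions.xls\n\n"
part = message_from_string(raw)

# Header lookup ignores the case of the header name.
same = (
    part.get("Content-Type")
    == part.get("content-type")
    == part.get("CONTENT-TYPE")
)

# The unquoted name parameter can be read with get_param() as well.
name = part.get_param("name")
```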
msg114981 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-26 15:09
Ich, if your problem still exists in 2.7, 3.1, or 3.2, please open a new issue with a test case showing the problem you are running into.
msg114982 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-26 15:12
Also, issue 7082 might be relevant here, since it fixed a bug in this fix.
History
Date                 User            Action  Args
2022-04-11 14:56:14  admin           set     github: 42789
2010-08-26 15:12:42  r.david.murray  set     messages: + msg114982
2010-08-26 15:09:26  r.david.murray  set     nosy: + r.david.murray; messages: + msg114981
2010-08-26 14:01:22  ichneumon       set     messages: + msg114977
2010-08-26 13:37:24  ichneumon       set     nosy: + ichneumon; messages: + msg114974
2006-01-11 21:47:23  mpasniew        create