This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: email.Utils.py: UnicodeError in RFC 2231 header
Components: Library (Lib)
Versions: Python 2.4
process
Status: closed
Resolution: fixed
Assigned To: barry
Nosy List: barry, qbin
Priority: normal

Created on 2006-01-24 20:19 by qbin, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
conversion_error.eml (uploaded by qbin, 2006-01-24 20:19): sample email
Messages (2)
msg27352 - Author: A. Sagawa (qbin), Date: 2006-01-24 20:19
Description:
collapse_rfc2231_value() does not handle the UnicodeError
exception, so a header like the following one raises
UnicodeError when the function converts the value to Unicode.

---
Content-Type: text/plain; charset="ISO-2022-JP"
Content-Disposition: attachment;
 filename*=iso-2022-jp''%1B%24BJs9p%3Dq%2D%21%1B%28B%2Etxt
---
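The failing step can be reproduced in isolation. This is a minimal sketch that assumes Python 3 (not the Python 2.4 of this report); it uses `urllib.parse.unquote_to_bytes()` to undo the RFC 2231 percent-encoding and then decodes strictly, which is roughly what the 2.4 code did. The variable names are illustrative only.

```python
from urllib.parse import unquote_to_bytes

# Percent-encoded part of the filename* parameter from the sample header.
encoded = "%1B%24BJs9p%3Dq%2D%21%1B%28B%2Etxt"

# Undo the RFC 2231 %XX encoding to get the raw ISO-2022-JP bytes,
# including the ESC $ B / ESC ( B mode-switch sequences.
raw = unquote_to_bytes(encoded)

error = None
try:
    raw.decode("iso-2022-jp")  # strict decoding, no error handler
except UnicodeDecodeError as exc:
    # The JIS pair 0x2D21 (bytes b"-!") has no JIS X0208 mapping.
    error = exc

print(raw)
print(error)
```

The same bytes are what trigger the traceback in the test script below.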

Test script:
---
#! /usr/bin/env python
import sys
import email

msg = email.message_from_file(sys.stdin)
for part in msg.walk():
  print part.get_params()
  print part.get_filename()
---
Run it as:
% env LANG=ja_JP.eucJP ./test.py < attached_sample.eml

Background:
Character 0x2d21 is invalid in JIS X0208 but defined in
CP932 (Microsoft's superset of Shift_JIS). Conversion
between Shift_JIS and ISO-2022-JP is a simple arithmetic
mapping because both are based on JIS X0208. As a result,
CP932 characters sometimes appear in ISO-2022-JP-encoded
strings, typically ones produced by Windows MUAs.
However, Python's "ISO-2022-JP" codec accepts only *pure*
JIS X0208, so the conversion fails.
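The arithmetic JIS-to-Shift_JIS mapping mentioned above can be made concrete. This is a minimal sketch, not part of the email package: `jis_to_sjis()` and `decode_iso2022jp_via_cp932()` are hypothetical helpers, and only the ESC $ B / ESC $ @ and ESC ( B / ESC ( J designations are handled. Each JIS X0208 pair is rewritten as Shift_JIS and the result is decoded with Python's `cp932` codec, which does define 0x2D21 (Shift_JIS 0x8740).

```python
def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Map one JIS X0208 code (two bytes in 0x21..0x7E) to Shift_JIS bytes."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("not a JIS X0208 byte pair: %#04x %#04x" % (j1, j2))
    # Two JIS rows share one Shift_JIS lead byte; lead bytes skip 0xA0..0xDF.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 % 2:  # odd row: trail byte in 0x40..0x9E, skipping 0x7F
        s2 = j2 + (0x1F if j2 <= 0x5F else 0x20)
    else:       # even row: trail byte in 0x9F..0xFC
        s2 = j2 + 0x7E
    return bytes((s1, s2))


def decode_iso2022jp_via_cp932(data: bytes, errors: str = "strict") -> str:
    """Hypothetical workaround: rewrite ISO-2022-JP as Shift_JIS, decode as CP932."""
    out = bytearray()
    two_byte = False  # ISO-2022-JP text starts in ASCII mode
    i = 0
    while i < len(data):
        if data[i] == 0x1B:  # escape sequence: switch character set
            seq = data[i:i + 3]
            if seq in (b"\x1b$B", b"\x1b$@"):    # JIS X0208
                two_byte = True
            elif seq in (b"\x1b(B", b"\x1b(J"):  # ASCII / JIS X0201 Roman
                two_byte = False
            else:
                raise ValueError("unsupported escape sequence: %r" % seq)
            i += 3
        elif two_byte:
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character")
            out += jis_to_sjis(data[i], data[i + 1])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return out.decode("cp932", errors)


raw = b"\x1b$BJs9p=q-!\x1b(B.txt"  # the decoded filename* bytes from the sample
print(jis_to_sjis(0x2D, 0x21))      # the CP932 lead/trail bytes for 0x2D21
print(decode_iso2022jp_via_cp932(raw))
```

Treating ESC ( J as plain ASCII glosses over the JIS X0201 yen sign at 0x5C; that is acceptable for a sketch but would need care in real code.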

Workaround:
Decode with fallback_charset instead, and/or skip invalid characters.
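One way to read this workaround, as a minimal sketch: try the declared charset strictly, fall back to `fallback_charset` only when Python does not know the charset, and otherwise keep the declared charset but replace or skip the undecodable bytes. The function `decode_param()` and its parameters are hypothetical; "replace" and "ignore" are the standard codec error handlers.

```python
def decode_param(raw: bytes, charset: str,
                 fallback_charset: str = "us-ascii",
                 errors: str = "replace") -> str:
    """Decode an RFC 2231 parameter value without letting UnicodeError escape."""
    try:
        return raw.decode(charset)  # exact result when the bytes are valid
    except LookupError:
        # Charset unknown to Python: decode with the fallback instead.
        return raw.decode(fallback_charset, errors)
    except UnicodeDecodeError:
        # Known charset, invalid bytes (e.g. CP932-only characters in
        # ISO-2022-JP): "replace" inserts U+FFFD, "ignore" skips them.
        return raw.decode(charset, errors)


raw = b"\x1b$BJs9p=q-!\x1b(B.txt"  # the decoded filename* bytes from the sample
print(repr(decode_param(raw, "iso-2022-jp")))
print(repr(decode_param(raw, "iso-2022-jp", errors="ignore")))
```

Catching LookupError separately keeps the fallback charset for the case it is meant for, instead of reinterpreting valid ISO-2022-JP bytes as US-ASCII.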
msg27353 - Author: Barry A. Warsaw (barry), Python committer, Date: 2006-07-28 03:19

Fixed in r50894 for Python 2.4/email 3.0. This was already fixed in
Python 2.5/email 4.0.
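For comparison, here is a Python 3 port of the report's test script that reads the sample from a string instead of stdin. It assumes a current Python 3, where `email.utils.collapse_rfc2231_value()` defaults to `errors='replace'` and, in CPython's implementation, `Message.get_filename()` passes the parameter through it; the invalid character should then come back as U+FFFD instead of raising.

```python
import email

# The header block from the report, followed by a short dummy body.
SAMPLE = (
    'Content-Type: text/plain; charset="ISO-2022-JP"\n'
    "Content-Disposition: attachment;\n"
    " filename*=iso-2022-jp''%1B%24BJs9p%3Dq%2D%21%1B%28B%2Etxt\n"
    "\n"
    "body\n"
)

msg = email.message_from_string(SAMPLE)
filenames = []
for part in msg.walk():
    print(part.get_params())
    name = part.get_filename()  # no UnicodeError; bad bytes become U+FFFD
    filenames.append(name)
    print(repr(name))
```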
History
Date                 User   Action  Args
2022-04-11 14:56:15  admin  set     github: 42834
2006-01-24 20:19:55  qbin   create