Description:
collapse_rfc2231_value does not handle the UnicodeError
exception. As a result, a header like the following raises
UnicodeError when the value is converted to unicode.
---
Content-Type: text/plain; charset="ISO-2022-JP"
Content-Disposition: attachment;
filename*=iso-2022-jp''%1B%24BJs9p%3Dq%2D%21%1B%28B%2Etxt
---
Test script:
---
#! /usr/bin/env python
import sys
import email

# Parse the message from stdin, then print each part's
# parameters and its decoded filename.
msg = email.message_from_file(sys.stdin)
for part in msg.walk():
    print(part.get_params())
    print(part.get_filename())
---
Run:
% env LANG=ja_JP.eucJP ./test.py < attached_sample.eml
Background:
Character 0x2d21 is invalid in JIS X0208 but is defined in
CP932 (Microsoft's superset of Shift_JIS). Conversion
between Shift_JIS and ISO-2022-JP is computable
because both are based on JIS X0208, so CP932
characters sometimes appear in ISO-2022-JP encoded strings,
typically ones produced by Windows MUAs.
But Python's "ISO-2022-JP" means *pure* JIS X0208, so
the conversion fails.
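The following sketch illustrates the background above, assuming
Python 3 and its standard codecs. It decodes the sample filename*
bytes strictly, then maps JIS code 0x2d21 to Shift_JIS with the usual
row/cell arithmetic and decodes the result as CP932. The helper name
jis_to_sjis is hypothetical and is not part of the email package.

```python
from urllib.parse import unquote_to_bytes

# Percent-encoded filename* value copied from the sample header.
raw = unquote_to_bytes("%1B%24BJs9p%3Dq%2D%21%1B%28B%2Etxt")

# Strict decoding is expected to fail, since 0x2D21 is not
# assigned in the JIS X0208 table used by the iso-2022-jp codec.
try:
    raw.decode("iso-2022-jp")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True


def jis_to_sjis(j1, j2):
    """Map a two-byte JIS code to Shift_JIS bytes (hypothetical helper)."""
    # Two JIS rows share one Shift_JIS lead byte.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 < 0x5F else 0xB0)
    if j1 % 2:
        # Odd row: first half of the trail-byte range, skipping 0x7F.
        s2 = j2 + (0x1F if j2 < 0x60 else 0x20)
    else:
        # Even row: second half of the trail-byte range.
        s2 = j2 + 0x7E
    return bytes([s1, s2])


sjis = jis_to_sjis(0x2D, 0x21)
char = sjis.decode("cp932")
print(strict_failed, sjis, repr(char))
```

The same two bytes that iso-2022-jp rejects are a valid character once
they are mapped into CP932, which is why Windows MUAs can emit them.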
Workaround:
Convert using fallback_charset and/or skip invalid characters.
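A minimal sketch of such a workaround, assuming Python 3. The function
decode_with_fallback is a hypothetical stand-in, not the email
package's actual code. Its errors and fallback_charset parameters are
only modelled on the fallback_charset named above. It tries a strict
decode first, uses the fallback charset when the charset is unknown,
and decodes leniently when the bytes are invalid for a known charset.

```python
def decode_with_fallback(raw, charset, errors="replace",
                         fallback_charset="us-ascii"):
    """Decode raw bytes, tolerating unknown charsets and bad bytes.

    Hypothetical helper for illustration only.
    """
    try:
        # Strict attempt first, so valid input is never altered.
        return raw.decode(charset)
    except LookupError:
        # The charset name is unknown to Python.
        return raw.decode(fallback_charset, errors)
    except UnicodeError:
        # The charset is known but some bytes are invalid in it:
        # decode again, replacing or skipping them per `errors`.
        return raw.decode(charset, errors)


# Bytes of the sample filename* value after percent-decoding.
raw = b"\x1b$BJs9p=q-!\x1b(B.txt"
name = decode_with_fallback(raw, "iso-2022-jp")
print(repr(name))
```

With errors="replace" the unmappable character should come back as
U+FFFD while the rest of the filename survives. Passing
errors="ignore" drops it instead.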