classification
Title: email: mishandles Content-Transfer-Enc. for text/* messages
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.3

process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To: barry
Nosy List: barry, dgibson, doko
Priority: normal
Keywords:

Created on 2004-05-09 10:19 by doko, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg20745 - Author: Matthias Klose (doko) (Python committer) Date: 2004-05-09 10:19
[forwarded from http://bugs.debian.org/247792]

Normally, when creating MIME attachments with the email
module, the user must explicitly encode (base64 or
quoted-printable) the message using encode_base64() or
encode_quopri(), for example.  That sets the
Content-Transfer-Encoding header and replaces the
payload with an encoded version, which is then emitted
verbatim by the Generator.
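
As a rough illustration of the explicit-encoding path described above, here is a minimal sketch. It uses the Python 3 spelling of the module (email.encoders; in Python 2.3 it was email.Encoders), and the sample bytes are made up for the example.

```python
from email import encoders
from email.mime.base import MIMEBase

# Made-up binary payload, purely for illustration.
data = b"\x00\x01 some binary attachment \xff"

part = MIMEBase("application", "octet-stream")
part.set_payload(data)

# The caller encodes explicitly: this replaces the payload with its
# base64 form and adds the Content-Transfer-Encoding header.
encoders.encode_base64(part)

print(part["Content-Transfer-Encoding"])
print(part.get_payload())
```

After this call the stored payload is already the encoded text, so the Generator has nothing left to do but write it out.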
 
However, for text/* objects, the
Content-Transfer-Encoding header is set when the
payload is attached, according to the character set.
The payload is not encoded at that point, but is
instead encoded according to the charset's default body
encoding by the Generator at flatten() time.  This
means in particular that using encode_*() on a text/*
message with a non-None charset will result in a
malformed message: it will have multiple
Content-Transfer-Encoding headers and a multiply encoded
body. For added confusion, the multiple body encodings
won't even be applied in the same order as the
duplicate headers appear (as they would if multiple
encode_*() functions were applied in sequence).
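
The duplicate-header half of this problem can be sketched as follows. This is an illustration against Python 3 with the default (compat32) policy, not the 2.3 code the report was filed against. Python 3 appears to encode the body when the charset is set rather than at flatten() time, so the sketch does not reproduce the double encoding of the body.

```python
from email import encoders
from email.mime.text import MIMEText

# A text/* part with an explicit charset already gets a
# Content-Transfer-Encoding header from the charset's defaults.
msg = MIMEText("h\u00e9llo w\u00f6rld", "plain", "utf-8")
before = msg.get_all("Content-Transfer-Encoding")

# Applying an encoder on top adds a second header instead of
# replacing the first one.
encoders.encode_quopri(msg)
after = msg.get_all("Content-Transfer-Encoding")

print(before)
print(after)
```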
 
It also means it is impossible to override a charset's
default encoding. For example, utf-8 text will always be
base64 encoded, even though utf-8 text in Western
languages is likely to be almost entirely 7bit, so
quoted-printable or even 8bit encoding would be more
appropriate.
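
For readers on current Python 3 (not the 2.3 code this report describes), one way to get quoted-printable UTF-8 is to pass a customized Charset object instead of a charset name. The sketch below assumes Python 3.5 or later, where MIMEText accepts a Charset instance; the variable names are illustrative only.

```python
from email.charset import Charset, QP
from email.mime.text import MIMEText

text = "Mostly ASCII text with one accent: caf\u00e9"

# Start from the stock utf-8 charset and change only its body
# encoding from the default (base64) to quoted-printable.
qp_utf8 = Charset("utf-8")
qp_utf8.body_encoding = QP

msg = MIMEText(text, "plain", qp_utf8)

print(msg["Content-Transfer-Encoding"])
print(msg.get_payload())
```

The body stays readable and greppable, with only the non-ASCII bytes escaped.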
 
The payload data of the Message object should
consistently hold either encoded or unencoded data.  If
the latter, the Generator should take its cue from the
Content-Transfer-Encoding header, not from the charset. If 
the former, the encode_* functions should recognize an
already encoded message and recode it, or at least
throw an exception rather than generating a malformed
MIME message. 
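
The "throw an exception" option could look roughly like the sketch below. The helper encode_once() is hypothetical and not part of the email package; it only shows the kind of guard being asked for, written against the Python 3 module names.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.text import MIMEText


def encode_once(msg, encoder):
    """Hypothetical guard: refuse to encode a part that already
    declares a Content-Transfer-Encoding."""
    existing = msg.get_all("Content-Transfer-Encoding")
    if existing:
        raise ValueError("part is already encoded as %r" % (existing,))
    encoder(msg)
    return msg


# A fresh binary part has no encoding yet, so this succeeds.
part = MIMEBase("application", "octet-stream")
part.set_payload(b"\x00\x01\x02")
encode_once(part, encoders.encode_base64)

# A text part with a charset already carries the header, so the
# guard raises instead of producing a malformed message.
text_part = MIMEText("h\u00e9llo", "plain", "utf-8")
try:
    encode_once(text_part, encoders.encode_quopri)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```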
msg20746 - Author: Barry A. Warsaw (barry) (Python committer) Date: 2004-10-09 21:50
In Python 2.3, the _encoder argument to MIMEText.__init__()
is deprecated, and in Python 2.4 it is removed.  The right
way to encode the payload for a text/* message is to provide
the charset, not to provide an _encoder.
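
A small sketch of the charset-driven approach, using the Python 3 MIMEText signature (text, subtype, charset). The sample text is made up; the transfer encoding is picked from the charset's defaults rather than by the caller.

```python
from email.mime.text import MIMEText

body = "Gr\u00fc\u00dfe aus M\u00fcnchen"

# Collect the transfer encoding that each charset selects by default.
cte_by_charset = {}
for charset in ("iso-8859-1", "utf-8"):
    msg = MIMEText(body, "plain", charset)
    cte_by_charset[charset] = msg["Content-Transfer-Encoding"]
    print(charset, "->", cte_by_charset[charset])
```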
msg20747 - Author: David Gibson (dgibson) Date: 2004-10-11 07:11
This reply seems to miss the point.

Having the semantics of the payload string (encoded at
insertion vs. encoded at flattening) be different for text/*
and non-text/* messages is conceptually broken and confusing.

Not being able to override a charset's default encoder is
also a genuine deficiency.  It is not sane to send out
English, 7-bit messages which happen to be UTF-8 encoded as
ungreppable BASE64 blobs.

I really don't think this is hard to fix properly.  I'll try
to get a patch together, to demonstrate.
History
Date User Action Args
2022-04-11 14:56:04  admin  set  github: 40237
2004-05-09 10:19:11  doko  create