Issue 950747: email: mishandles Content-Transfer-Enc. for text/* messages

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/40237

classification

Title:	email: mishandles Content-Transfer-Enc. for text/* messages
Type:		Stage:
Components:	Library (Lib)	Versions:	Python 2.3

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:	barry	Nosy List:	barry, dgibson, doko
Priority:	normal	Keywords:

Created on 2004-05-09 10:19 by doko, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg20745	Author: Matthias Klose (doko)	Date: 2004-05-09 10:19
[forwarded from http://bugs.debian.org/247792]

Normally, when creating MIME attachments with the email module, the user must explicitly encode the message (base64 or quoted-printable), for example with encode_base64() or encode_quopri(). That sets the Content-Transfer-Encoding header and replaces the payload with an encoded version, which the Generator then emits verbatim.

However, for text/* objects, the Content-Transfer-Encoding header is set when the payload is attached, according to the character set. The payload is not encoded at that point; instead, the Generator encodes it at flatten() time according to the charset's default body encoding.

This means in particular that using encode_*() on a text/* message with a non-None charset results in a malformed message: it has multiple Content-Transfer-Encoding headers and a multiply encoded body. For added confusion, the multiple body encodings are not even applied in the same order as the duplicate headers appear (as they would be if multiple encode_*() functions were applied in sequence).

It also means it is impossible to override a charset's default encoding. For example, utf-8 text will always be base64 encoded, even though utf-8 text in western languages is likely to be almost entirely 7bit, so quoted-printable or even 8bit encoding would be more appropriate.

The payload data of the Message object should consistently hold either encoded or unencoded data. If the latter, the Generator should take its cue from the Content-Transfer-Encoding header, not from the charset. If the former, the encode_*() functions should recognize an already encoded message and recode it, or at least throw an exception rather than generating a malformed MIME message.
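The duplicate-header symptom described above can be sketched against the current Python 3 email package. This is an illustration under that assumption, not the Python 2.3 code path from the report: as far as I can tell, Python 3 encodes the payload when the charset is set rather than at flatten() time, so the sketch only shows the header duplication, not the double body encoding.

```python
from email import encoders
from email.mime.text import MIMEText

# A text/* part created with an explicit charset already receives a
# Content-Transfer-Encoding header chosen from the charset's defaults.
msg = MIMEText("hello world", "plain", "utf-8")
headers_before = msg.get_all("Content-Transfer-Encoding")

# Applying an encode_*() function afterwards appends a second header
# instead of replacing the first one.
encoders.encode_quopri(msg)
headers_after = msg.get_all("Content-Transfer-Encoding")

print(headers_before)
print(headers_after)
```

With the default (compat32) policy, the second list should contain two entries that disagree with each other, which is the malformed result the report complains about. Deleting the existing header with `del msg["Content-Transfer-Encoding"]` before calling the encoder avoids the duplicate.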
msg20746	Author: Barry A. Warsaw (barry)	Date: 2004-10-09 21:50
In Python 2.3, the _encoder argument to MIMEText.__init__() is deprecated, and in Python 2.4 it is removed. The right way to encode the payload for a text/* message is to provide the charset, not to provide an _encoder.
msg20747	Author: David Gibson (dgibson)	Date: 2004-10-11 07:11
This reply seems to miss the point. Having the semantics of the payload string (encoded at insertion vs. encoded at flattening) differ between text/* and non-text/* messages is conceptually broken and confusing. Not being able to override a charset's default encoder is also a genuine deficiency. It is not sane to send out English, 7-bit messages which happen to be UTF-8 encoded as ungreppable BASE64 blobs. I really don't think this is hard to fix properly. I'll try to get a patch together to demonstrate.
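For readers hitting the same base64-for-UTF-8 behaviour today, the following is a minimal sketch of one way to get quoted-printable output by passing a customised Charset object as the charset. It assumes a current Python 3 email package; it makes no claim about what was possible in Python 2.3 or about any patch proposed in this thread. The variable name `utf8_qp` is purely illustrative.

```python
from email import charset
from email.mime.text import MIMEText

# Start from the stock utf-8 charset and switch its body encoding
# from the default (base64) to quoted-printable.
utf8_qp = charset.Charset("utf-8")
utf8_qp.body_encoding = charset.QP

text = "Mostly ASCII text, with one accent: caf\u00e9"
msg = MIMEText(text, "plain", utf8_qp)

# The header reflects the overridden body encoding, and the payload
# stays readable apart from the escaped non-ASCII bytes.
print(msg["Content-Transfer-Encoding"])
print(msg.get_payload())
```

The override lives on the Charset instance rather than on the message, so the same object can be reused for every text part that should stay greppable.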

History
Date	User	Action	Args
2022-04-11 14:56:04	admin	set	github: 40237
2004-05-09 10:19:11	doko	create