classification
Title: email.Message.set_payload followed by bad result get_payload
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.3

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: barry
Nosy List: barry, fresh, msapiro
Priority: normal
Keywords:

Created on 2006-01-18 22:09 by msapiro, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
example.py msapiro, 2006-01-18 22:09 example script which illustrates the problem
Message.py.patch.txt msapiro, 2006-01-20 23:19 Hint at possible fix
1409455.txt barry, 2006-02-06 03:42
Messages (6)
msg27308 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2006-01-18 22:09
Under certain circumstances, in particular when charset
is 'iso-8859-1' and msg is an email.Message.Message instance,

    msg.set_payload(text, charset)

'apparently' encodes the text as quoted-printable and
adds a

Content-Transfer-Encoding: quoted-printable

header to msg. I say 'apparently' because if one prints
msg or creates a Generator instance and writes msg to a
file, the message is printed/written as a correct,
quoted-printable encoded message, but

    text = msg._payload
or

    text = msg.get_payload()

gives the original text, not quoted-printable encoded, and

    text = msg.get_payload(decode=True)

gives a quoted-printable decoding of the original text,
which is munged if the original text contains '=' in
certain positions (for example, followed by two hex digits).

This is causing problems in Mailman, which currently works
around them by flagging payloads that were set by
set_payload() and skipping the subsequent 'decoding' in that
case. It would be better if
set_payload()/get_payload() worked properly.

A script is attached which illustrates the problem.
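
The attached example.py is not reproduced here. As a rough illustration of the '=' problem only, the sketch below uses the standard quopri module directly rather than the email package, and assumes Python 3 (this report concerns Python 2.3). The sample text is made up.

```python
import quopri

# Made-up sample text; "=41" happens to look like a QP escape for "A".
original = b"x=41y"

# Decoding text that was never QP encoded munges it.
munged = quopri.decodestring(original)

# Encoding first and then decoding gives back the original text.
encoded = quopri.encodestring(original)
round_trip = quopri.decodestring(encoded)

print(munged, encoded, round_trip)
```

This mirrors what get_payload(decode=True) is described as doing above: it decodes a payload that was never actually encoded.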
msg27309 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2006-01-20 23:19
I've looked at the email library and I see the problem.
msg.set_payload() never QP encodes msg._payload. When the
message is stringified or flattened by a generator, the
generator's _handle_text() method does the encoding and it
is msg._charset that signals the need to do this. Thus when
the message object is ultimately converted to a suitable
external form, the body is QP encoded, but internally it
never is. Thus, subsequent msg.get_payload() calls return
unexpected results.
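
The behaviour described above can be modelled with a toy class. This is only a sketch, assuming Python 3 and the standard quopri module; LazyMessage and its attributes are hypothetical names and are not the email package's code.

```python
import quopri


class LazyMessage:
    """Hypothetical toy model: the payload is only encoded when flattened."""

    def __init__(self):
        self._payload = None
        self._charset = None
        self.cte = None  # stands in for the Content-Transfer-Encoding header

    def set_payload(self, text, charset=None):
        # The payload is stored untouched; only the header announces QP.
        self._payload = text
        self._charset = charset
        if charset is not None:
            self.cte = "quoted-printable"

    def get_payload(self, decode=False):
        # decode=True trusts the header and decodes the raw payload.
        if decode and self.cte == "quoted-printable":
            return quopri.decodestring(self._payload)
        return self._payload

    def flatten(self):
        # Encoding happens only here, signalled by _charset.
        if self._charset is not None:
            return quopri.encodestring(self._payload)
        return self._payload


original = b"x=41y"  # made-up sample text
msg = LazyMessage()
msg.set_payload(original, "iso-8859-1")
flat = msg.flatten()
raw = msg.get_payload()
decoded = msg.get_payload(decode=True)
print(flat, raw, decoded)
```

In this model the flattened form is correctly encoded, while get_payload() returns the unencoded text and get_payload(decode=True) returns a munged version of it.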

It appears (from minimal testing) that when a text message
is parsed into an email.Message.Message instance, _charset
is None even if there is a character set specification in a
Content-Type: header.

I have attached a patch (Message.py.patch.txt) which may fix
the problem. It has only been tested against the already
attached example.py so it is really untested. Also, it only
addresses the quoted-printable case. I haven't even thought
about whether there might be a similar problem involving base64.
msg27310 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2006-02-06 03:42
See the attached patch for what I think is ultimately the
right fix.  The idea is that when set_payload() is called,
the payload is immediately encoded so that get_payload()
will do the right thing.  Also, Generator.py has to be fixed
to not doubly encode the payload.
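
The idea can be sketched with a toy class. This is not the attached patch; it assumes Python 3 and the standard quopri module, and EagerMessage and its attributes are hypothetical names.

```python
import quopri


class EagerMessage:
    """Hypothetical toy model: the payload is encoded as soon as it is set."""

    def __init__(self):
        self._payload = None
        self.cte = None  # stands in for the Content-Transfer-Encoding header

    def set_payload(self, text, charset=None):
        # Encode immediately so the stored payload matches the header.
        if charset is not None:
            text = quopri.encodestring(text)
            self.cte = "quoted-printable"
        self._payload = text

    def get_payload(self, decode=False):
        if decode and self.cte == "quoted-printable":
            return quopri.decodestring(self._payload)
        return self._payload

    def flatten(self):
        # The payload is already encoded, so it is written out as is.
        return self._payload


original = b"x=41y"  # made-up sample text
fixed = EagerMessage()
fixed.set_payload(original, "iso-8859-1")
print(fixed.get_payload(), fixed.get_payload(decode=True), fixed.flatten())
```

Here get_payload() and the flattened output agree, and get_payload(decode=True) recovers the original text.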

Run against your example, it seems to DTRT.  It also passes
all but one of the email pkg unit tests.  The one failure
is, I believe, due to an incorrect test.  The patch includes
a fix for that as well as adding a test for
get_payload(decode=True).

I'd like to get some feedback from the email-sig before
applying this, but it seems right to me.
msg27311 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2006-02-08 03:07
See the latest patch in issue 1409458:

https://sourceforge.net/support/tracker.php?aid=1409538
msg27312 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2006-02-08 13:34
r42270 for Python 2.3/email 2.5.  I will forward port these
to Python 2.4 and 2.5 (email 3.0).
msg27313 - (view) Author: Chris Withers (fresh) Date: 2006-09-10 12:26
This fix seems to have caused issues for code that does the
following:

from email.Charset import Charset,QP
from email.MIMEText import MIMEText

charset = Charset('utf-8')
charset.body_encoding = QP

msg = MIMEText(
    u'Some text with chars that need encoding: \xa3',
    'plain',
    )
# set the charset 
msg.set_charset(charset)

print msg.as_string()

Before this fix, the above would result in:

MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"

Some text with chars that need encoding: =A3

Now I get:

Traceback (most recent call last):
  File "test_encoding.py", line 14, in ?
    msg.as_string()
  File "c:\python24\lib\email\Message.py", line 129, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "c:\python24\lib\email\Generator.py", line 82, in flatten
    self._write(msg)
  File "c:\python24\lib\email\Generator.py", line 113, in _write
    self._dispatch(msg)
  File "c:\python24\lib\email\Generator.py", line 139, in _dispatch
    meth(msg)
  File "c:\python24\lib\email\Generator.py", line 182, in _handle_text
    self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 41: ordinal not in range(128)

Am I doing something wrong here or is this patch in error?
History
Date User Action Args
2022-04-11 14:56:15 admin set github: 42807
2006-01-18 22:09:59 msapiro create