classification
Title: email.Header misparses mixed headers
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.2
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: barry
Nosy List: barry, iko, kalinda
Priority: normal
Keywords:

Created on 2002-11-18 14:33 by iko, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name Uploaded Description
Header.diff iko, 2002-11-18 14:33 Patch for Header.py to fix header decoding issues
Messages (6)
msg13351 - (view) Author: Anders Hammarquist (iko) Date: 2002-11-18 14:33
email.Header.decode_header() misparses headers with
both encoded and unencoded words. This example from RFC2047

=?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>

gets parsed as

AndréPirard <PIRARD@vm1.ulg.ac.be>

where there should obviously be a space between André
and Pirard. RFC2047 says to ignore spaces between
encoded words (but not between encoded and unencoded
words, though it doesn't explicitly say so from what I
could find, and obviously not between unencoded words).
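
The whitespace rule described above can be checked against a current interpreter. This is a minimal sketch, assuming Python 3 and its email.header module (the successor to the email.Header module discussed in this report); the helper name decode_to_text is hypothetical and not part of the library.

```python
from email.header import decode_header, make_header


def decode_to_text(raw):
    """Decode an RFC 2047 header value to a str (hypothetical helper)."""
    return str(make_header(decode_header(raw)))


# Whitespace between an encoded word and ordinary text is significant,
# so a space should survive between the two names.
mixed = decode_to_text('=?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>')

# Whitespace between two adjacent encoded words is ignored (RFC 2047).
adjacent = decode_to_text('=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=')

# ascii() keeps the output printable on any terminal encoding.
print(ascii(mixed))
print(ascii(adjacent))
```

On the Python 2.2 version this report was filed against, the first call is the one that lost the space.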

Also, I see it's trying to handle continuation lines,
but it only does it if there are encoded words in the
continuation line. It barfs badly on this test case:

'Re: =?mac-iceland?q?r=8Aksm=9Arg=8Cs?= baz\n foo bar =?mac-iceland?q?r=8Aksm=9Arg=8Cs?='
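
The continuation-line case can be exercised the same way. This is a sketch only, assuming Python 3's email.header module rather than the Python 2.2 code under discussion; the variable names are illustrative.

```python
from email.header import decode_header, make_header

# The folded test case from the report: the continuation line starts with
# unencoded text and ends with an encoded word.
folded = ('Re: =?mac-iceland?q?r=8Aksm=9Arg=8Cs?= baz\n'
          ' foo bar =?mac-iceland?q?r=8Aksm=9Arg=8Cs?=')

chunks = decode_header(folded)    # list of (value, charset) pairs
text = str(make_header(chunks))   # decoded header as a single str

for value, charset in chunks:
    print(repr(value), charset)
print(ascii(text))
```

The unencoded text on both sides of the line break should end up in a single chunk with charset None, with the folding whitespace reduced to one space.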

I think I'll just do a patch...

/Anders

P.S. It seems at least remotely related to Bug#552957
msg13352 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2003-03-06 06:50
The first bug above has already been fixed in email 2.5
(python 2.3 cvs).  The second pointed to a real bug, now
fixed I believe.
msg13353 - (view) Author: Anders Hammarquist (iko) Date: 2003-03-06 14:15
The first bug is still there... With version 1.19 from CVS I
get this with my example:

>>> print unicode(Header.make_header(Header.decode_header(
...     '=?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>'))).encode('latin-1')
AndréPirard <PIRARD@vm1.ulg.ac.be>

(The problem is that whitespace gets stripped off on line 91:
    unenc = parts.pop(0).strip()
before we know whether it is significant or not.)
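
The effect of stripping too early can be sketched in isolation. The pattern below is a simplified stand-in written for illustration; it is an assumption, not the regular expression used in Header.py.

```python
import re

# Simplified encoded-word pattern (assumption, for illustration only).
ENCODED_WORD = re.compile(r'(=\?[^?]*\?[qQbB]\?[^?]*\?=)')

header = '=?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>'

# Splitting on a capturing group keeps the encoded word in the result,
# followed by the unencoded remainder with its leading space intact.
parts = ENCODED_WORD.split(header)
unencoded = parts[-1]

# Stripping here discards the space that separates the two words.
stripped = unencoded.strip()

print(repr(unencoded))
print(repr(stripped))
```

Once the leading space is gone, nothing in the remaining parts records that the encoded word and the plain text were separated.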

The continuation line bug seems to be fixed however.

/Anders
msg13354 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2003-03-06 16:21
Try current cvs.
msg13355 - (view) Author: Anders Hammarquist (iko) Date: 2003-03-06 16:43
Looks OK.
msg13356 - (view) Author: jonny reichwald (kalinda) Date: 2005-04-27 12:23
I am using python 2.4 and still have this problem. To be
more exact, line 73 in Header.py still strips the parts.
Is there a reason for this not being fixed?
History
Date User Action Args
2022-04-10 16:05:54 admin set github: 37495
2002-11-18 14:33:32 iko create