Issue 640110: email.Header misparses mixed headers


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/37495

classification

Title:	email.Header misparses mixed headers
Type:		Stage:
Components:	Library (Lib)	Versions:	Python 2.2

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	barry	Nosy List:	barry, iko, kalinda
Priority:	normal	Keywords:

Created on 2002-11-18 14:33 by iko, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
Header.diff	iko, 2002-11-18 14:33	Patch for Header.py to fix header decoding issues

Messages (6)
msg13351 - (view)	Author: Anders Hammarquist (iko)	Date: 2002-11-18 14:33
email.Header.decode_header() misparses headers that contain both encoded and unencoded words. This example from RFC 2047:

    =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>

gets parsed as

    AndréPirard <PIRARD@vm1.ulg.ac.be>

where there should obviously be a space between André and Pirard. RFC 2047 says to ignore whitespace between adjacent encoded words, but not between an encoded and an unencoded word (though it doesn't say so explicitly, from what I could find), and obviously not between unencoded words.

Also, I see it's trying to handle continuation lines, but it only does so if there are encoded words in the continuation line. It barfs badly on this test case:

    'Re: =?mac-iceland?q?r=8Aksm=9Arg=8Cs?= baz\n foo bar =?mac-iceland?q?r=8Aksm=9Arg=8Cs?='

I think I'll just do a patch...

/Anders

P.S. It seems at least remotely related to bug #552957.
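For illustration, here is a minimal Python 3 sketch of the behaviour the report asks for: folded continuation lines are unfolded first, whitespace between two adjacent encoded words is dropped, and whitespace next to unencoded text is kept. It is not the attached Header.diff patch and not the email package's code; the function name `decode_mixed`, the simplified encoded-word regular expression, and the `errors='replace'` fallback are assumptions made for this example.

```python
import base64
import quopri
import re

# Simplified RFC 2047 encoded word: =?charset?Q|B?encoded-text?=
# (assumption: no RFC 2231 language suffix on the charset).
ENCODED_WORD = re.compile(r'=\?([^?]+)\?([QqBb])\?([^?]*)\?=')


def decode_mixed(header: str) -> str:
    """Hypothetical helper: decode a header mixing encoded and plain words.

    Whitespace between two adjacent encoded words is dropped; every other
    run of text, including whitespace next to plain words, is kept as-is.
    """
    # Unfold continuation lines: remove the line break, keep the leading WSP.
    header = re.sub(r'\r?\n(?=[ \t])', '', header)
    pieces = []
    pos = 0
    last_was_encoded = False
    for m in ENCODED_WORD.finditer(header):
        gap = header[pos:m.start()]
        # Only a pure-whitespace gap between two encoded words is ignored.
        if not (last_was_encoded and gap.strip() == ''):
            pieces.append(gap)
        charset, enc, text = m.groups()
        if enc.upper() == 'Q':
            # header=True makes '_' decode to a space, as Q encoding requires.
            raw = quopri.decodestring(text.encode('ascii'), header=True)
        else:
            raw = base64.b64decode(text)
        pieces.append(raw.decode(charset, errors='replace'))
        last_was_encoded = True
        pos = m.end()
    pieces.append(header[pos:])  # trailing unencoded text, if any
    return ''.join(pieces)
```

With the RFC 2047 example, `decode_mixed('=?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>')` returns `'André Pirard <PIRARD@vm1.ulg.ac.be>'`, while `decode_mixed('(=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=)')` returns `'(ab)'`. Because the continuation-line test case is unfolded before any words are examined, the plain words `baz foo bar` keep their spacing regardless of which line they started on.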
msg13352 - (view)	Author: Barry A. Warsaw (barry) *	Date: 2003-03-06 06:50
Logged In: YES user_id=12800

The first bug above has already been fixed in email 2.5 (Python 2.3 CVS). The second pointed to a real bug, which is now fixed, I believe.
msg13353 - (view)	Author: Anders Hammarquist (iko)	Date: 2003-03-06 14:15
Logged In: YES user_id=14

The first bug is still there... With version 1.19 from CVS I get this with my example:

    >>> print unicode(Header.make_header(Header.decode_header('=?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>'))).encode('latin-1')
    AndréPirard <PIRARD@vm1.ulg.ac.be>

(The problem is that whitespace gets stripped off on line 91:

    unenc = parts.pop(0).strip()

before we know whether it is significant or not.)

The continuation line bug seems to be fixed, however.

/Anders
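The cause described above can be seen in a toy reproduction. This is not the Header.py code; it assumes a simplified encoded-word regular expression with three capture groups, which is enough to show that `re.split()` leaves the significant space at the start of the following unencoded run, where an early `.strip()` discards it.

```python
import re

# Simplified encoded-word pattern (an assumption, not the Header.py regex).
ENCODED_WORD = re.compile(r'=\?([^?]+)\?([QqBb])\?([^?]*)\?=')

header = '=?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>'

# With three capture groups, split() yields: unencoded, charset, encoding,
# encoded text, unencoded, ... so unencoded runs sit at indices 0, 4, 8, ...
parts = ENCODED_WORD.split(header)
unencoded = parts[0::4]

print(unencoded)                       # ['', ' Pirard <PIRARD@vm1.ulg.ac.be>']
print([u.strip() for u in unencoded])  # ['', 'Pirard <PIRARD@vm1.ulg.ac.be>']
```

Once that leading space is gone, joining the decoded `André` with the stripped run can only produce `AndréPirard`. Whether a whitespace run matters depends on what lies on both sides of it, so that decision has to be made after splitting, not while popping each piece.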
msg13354 - (view)	Author: Barry A. Warsaw (barry) *	Date: 2003-03-06 16:21
Logged In: YES user_id=12800

Try current CVS.
msg13355 - (view)	Author: Anders Hammarquist (iko)	Date: 2003-03-06 16:43
Logged In: YES user_id=14

Looks OK.
msg13356 - (view)	Author: jonny reichwald (kalinda)	Date: 2005-04-27 12:23
Logged In: YES user_id=661399

I am using Python 2.4 and still have this problem. To be more exact, line 73 in Header.py still strips the parts. Is there a reason this has not been fixed?
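For comparison, here is a quick check of the same round trip with the `email.header` module, assuming a current Python 3 interpreter (the outputs quoted in this thread come from Python 2.2 to 2.4). In Python 3, `decode_header()` keeps the leading space on the unencoded chunk, so `make_header()` does not have to reinvent it.

```python
from email.header import decode_header, make_header

raw = '=?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>'

# decode_header() returns (bytes, charset) pairs; the space before
# "Pirard" stays attached to the unencoded chunk instead of being stripped.
chunks = decode_header(raw)
print(chunks)

# make_header() rebuilds a Header object; str() gives the display form.
print(str(make_header(chunks)))
```

On such an interpreter the last line prints `André Pirard <PIRARD@vm1.ulg.ac.be>`, which is the result the original report expected.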

History
Date	User	Action	Args
2022-04-10 16:05:54	admin	set	github: 37495
2002-11-18 14:33:32	iko	create