classification
Title: quoted printable parse the sequence '= ' incorrectly
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.4

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: georg.brandl
Nosy List: georg.brandl, tungwaiyip
Priority: normal
Keywords:

Created on 2006-10-31 21:06 by tungwaiyip, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg30416 - (view) Author: Wai Yip Tung (tungwaiyip) Date: 2006-10-31 21:06
>>> import quopri

>>> s = 'I say= a secret message\r\nThank you'

>>> quopri.a2b_qp
<built-in function a2b_qp>
>>> quopri.decodestring(s)  # use the C version, binascii.a2b_qp(), to decode
'I sayThank you'

>>> quopri.a2b_qp=None
>>> quopri.decodestring(s)  # use the Python version, quopri.decode(), to decode
'I say= a secret message\nThank you'


Note that the sequence '= ' is invalid according to 
RFC 2045 section 6.7:

-------------------------------------------------------
An "=" followed by a character that is neither a 
hexadecimal digit (including "abcdef") nor the CR 
character of a CRLF pair is illegal ... A reasonable 
approach by a robust implementation might be to 
include the "=" character and the following character 
in the decoded data without any transformation
-------------------------------------------------------

The pure-Python parser, quopri.decode(), uses this lenient 
interpretation and produces the second string. Most email 
clients use a similar lenient interpretation.
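
The lenient approach quoted from the RFC can be sketched as follows. This is only an illustration, not the code in quopri or binascii: the function name decode_qp_lenient is hypothetical, it works on bytes (Python 3), and it deliberately ignores header mode and trailing-whitespace stripping.

```python
import string

# ASCII codes of 0-9, a-f and A-F; lowercase is accepted for robustness.
HEX_DIGITS = frozenset(string.hexdigits.encode("ascii"))


def decode_qp_lenient(data: bytes) -> bytes:
    """Hypothetical lenient quoted-printable decoder (illustration only)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        byte = data[i]
        if byte != 0x3D:  # anything other than "=" is copied unchanged
            out.append(byte)
            i += 1
        elif data[i + 1:i + 3] == b"\r\n":
            i += 3  # soft line break: drop "=" and the CRLF
        elif data[i + 1:i + 2] == b"\n":
            i += 2  # soft line break with a bare LF
        elif i + 2 < n and data[i + 1] in HEX_DIGITS and data[i + 2] in HEX_DIGITS:
            out.append(int(data[i + 1:i + 3], 16))  # "=XX" escape
            i += 3
        else:
            # Invalid escape such as "= ": keep the "=" literally and let
            # the following character be copied on the next iteration.
            out.append(byte)
            i += 1
    return bytes(out)


sample = b"I say= a secret message\r\nThank you"
print(decode_qp_lenient(sample))
```

With this handling the text after '= ' stays in the decoded output instead of being dropped.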

The C parser, binascii.a2b_qp(), which is used in 
preference to the Python version, produces a 
surprising result: the string 'a secret message' is 
omitted from the output.

This may give spammers an opportunity to insert a 
secret message after '= ' so that it is invisible to a 
Python-based spam filter but is displayed by a 
non-Python-based email client.
msg30417 - (view) Author: Wai Yip Tung (tungwaiyip) Date: 2006-10-31 21:18
The problem may come from binascii_a2b_qp() in binascii.c. It 
treats the '= ' or '=\t' sequence as a soft line break. That 
interpretation appears to have no basis. It could be a 
misinterpretation of RFC 2045:

-------------------------------------------------------------------
In particular, an "=" at the end of an encoded line, indicating a 
soft line break (see rule #5) may follow one or more TAB (HT) or 
SPACE characters.
-------------------------------------------------------------------

This passage tells readers that TAB or SPACE characters may appear 
before the "=" of a soft line break, not after it. "= " is plain 
illegal as far as I know.
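
To make the distinction concrete, here is a minimal sketch of a check that accepts whitespace before the "=" but not after it. The helper name ends_with_soft_break is hypothetical, and the check is intentionally strict: it removes only the line terminator, so it does not model any whitespace a transport might append after the "=".

```python
def ends_with_soft_break(line: bytes) -> bool:
    """Return True if an encoded line ends in "=" right before its newline.

    Hypothetical helper for illustration; only the CR/LF terminator is
    removed, so "=" followed by a space or tab is not a soft line break.
    """
    return line.rstrip(b"\r\n").endswith(b"=")


# Whitespace BEFORE the "=" is fine: this is a soft line break.
print(ends_with_soft_break(b"some text \t=\r\n"))
# "=" followed by a space and more text is not a soft line break.
print(ends_with_soft_break(b"I say= a secret message\r\n"))
```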
msg30418 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-11-16 17:09
Thanks for the report, this is now fixed in rev. 52765, 52766 (2.5).
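
A quick way to check for a regression on a current Python 3 interpreter is sketched below. It assumes bytes input (Python 3 requires it, unlike the Python 2 session in the original report) and only checks that the text after '= ' survives decoding; the exact treatment of the line ending is left unasserted.

```python
import binascii
import quopri

sample = b"I say= a secret message\r\nThank you"

# Decode with the C implementation directly and through the quopri wrapper.
c_result = binascii.a2b_qp(sample)
lib_result = quopri.decodestring(sample)

print(c_result)
print(lib_result)

# A fixed decoder should not swallow the text following "= ".
assert b"a secret message" in c_result
assert b"a secret message" in lib_result
```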
History
Date                 User        Action  Args
2022-04-11 14:56:21  admin       set     github: 44182
2006-10-31 21:06:58  tungwaiyip  create