This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Lone surrogates cause bad .pyc files
Components: Unicode
Versions: Python 2.2
process
Status: closed
Resolution: fixed
Assigned To: lemburg
Nosy List: gvanrossum, lemburg
Priority: release blocker

Created on 2002-09-17 20:47 by gvanrossum, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Messages (4)
msg12439 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2002-09-17 20:47
A Unicode literal in a .py file containing a lone
surrogate will cause a .pyc file to be written that
causes an exception in the UTF-8 decoder when it is loaded.

This is fixed in 2.3 but a fix is needed for 2.2 that
doesn't require the magic number to be changed.

A solution appears to be a UTF-8 decoder that accepts
the correct *and* the malformed version for such
Unicode strings.

(See python-dev discussion, subject "utf8 issue" in
Aug/Sep 2002.)
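To make the "accept both forms" idea concrete, here is a minimal sketch in modern Python 3. It is not the 2.2 C patch, and it makes an assumption: the byte sequence used below for a lone surrogate is the 3-byte encoding of the code point treated as an ordinary BMP character (U+D800 becomes ED A0 80). The message does not say exactly which bytes 2.2 wrote, so this sequence is only a stand-in. The helper name `decode_tolerant` is hypothetical. Python 3's built-in `surrogatepass` error handler decodes well-formed UTF-8 exactly as the strict codec does, and additionally accepts these surrogate encodings that strict UTF-8 rejects.

```python
def decode_tolerant(data: bytes) -> str:
    """Decode UTF-8 bytes, also accepting 3-byte encodings of lone surrogates.

    Hypothetical helper for illustration only. Well-formed UTF-8 decodes
    exactly as with the strict codec; a sequence such as ED A0 80 (U+D800
    encoded like an ordinary BMP code point), which the strict codec
    rejects, decodes to the lone surrogate instead of raising.
    """
    return data.decode("utf-8", errors="surrogatepass")


if __name__ == "__main__":
    lone = "abc\ud800def"  # str containing a lone high surrogate

    # The strict codec refuses to encode a lone surrogate at all...
    try:
        lone.encode("utf-8")
    except UnicodeEncodeError as exc:
        print("strict encode failed:", exc.reason)

    # ...so write it the permissive way, as a bytecode writer might.
    data = lone.encode("utf-8", errors="surrogatepass")

    # Reading it back with the strict decoder fails, mirroring the bad .pyc load.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("strict decode failed:", exc.reason)

    # The tolerant decoder accepts both the surrogate form and ordinary UTF-8.
    print(decode_tolerant(data) == lone)
    print(decode_tolerant("ordinary text".encode("utf-8")))
```

Because `surrogatepass` changes behaviour only for input the strict codec would reject, well-formed data decodes exactly as before. Widening what the reader accepts, rather than changing what is written, is what lets such a fix avoid a new magic number.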
msg12440 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2002-09-23 16:21

This needs to be fixed in 2.2.2.
msg12441 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2002-09-24 10:28

Working on it...
msg12442 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2002-09-24 14:07

Fixed in the 2.2 maintenance branch.
History
Date                 User        Action  Args
2022-04-10 16:05:41  admin       set     github: 37192
2002-09-17 20:47:54  gvanrossum  create