classification
Title: invalid \U escape gives 0-length unistr
Type:
Stage:
Components: Unicode
Versions:
process
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To: lemburg
Nosy List: anthonybaxter, jepler, jhylton, lemburg
Priority: normal
Keywords:

Created on 2003-10-03 13:30 by jepler, last changed 2022-04-10 16:11 by admin. This issue is now closed.

Files
File name: unicode-8f.patch
Uploaded: jepler, 2003-10-03 13:30
Description: patch & test enhancements
Messages (3)
msg18528 - Author: Jeff Epler (jepler) Date: 2003-10-03 13:30
>>> u'\Ufffffffe'  # CORRECT
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character
>>> u'\Uffffffff'  # WRONG
u''
>>> len(_)
0

Observed on 2.2.2 (Red Hat wide-unicode build, sys.maxunicode == 1114111) and 2.3.1 (custom build, sys.maxunicode == 65535).

I think the problem is due to this logic in unicodeobject.c:PyUnicode_DecodeUnicodeEscape():
            if (chr == 0xffffffff)
                /* _decoding_error will have already written into the
                   target buffer. */
                break;
Perhaps it should be (chr == 0xffffffff && PyErr_Occurred()).
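
The suspected failure mode is an in-band sentinel colliding with a legitimate parsed value. The following is a toy Python model of that idea only, not the CPython implementation: `parse_hex`, `decode_buggy` and `decode_fixed` are hypothetical names, and the boolean flag stands in for the `PyErr_Occurred()` check proposed above.

```python
ERROR_SENTINEL = 0xFFFFFFFF   # in-band "an error happened" marker (assumed for this model)
MAX_CODE_POINT = 0x10FFFF     # highest valid Unicode code point


def parse_hex(digits):
    """Return (value, error_reported) for the hex digits of a \\U escape."""
    try:
        return int(digits, 16), False
    except ValueError:
        # Malformed digits: hand back the sentinel and flag the error.
        return ERROR_SENTINEL, True


def decode_buggy(digits):
    value, _error_reported = parse_hex(digits)
    if value == ERROR_SENTINEL:
        # Trusts the sentinel alone, so a well-formed "ffffffff"
        # is mistaken for an already-handled error and silently dropped.
        return ""
    if value > MAX_CODE_POINT:
        raise ValueError("illegal Unicode character")
    return chr(value)


def decode_fixed(digits):
    value, error_reported = parse_hex(digits)
    if value == ERROR_SENTINEL and error_reported:
        # Stop quietly only when an error really was reported.
        return ""
    if value > MAX_CODE_POINT:
        raise ValueError("illegal Unicode character")
    return chr(value)
```

In this model, `decode_buggy("ffffffff")` returns an empty string while `decode_buggy("fffffffe")` raises, which mirrors the asymmetry in the session above; `decode_fixed` raises for both.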

I tried this change locally, and it fixes the problem:
>>> u'\Uffffffff'
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character
>>> u'\Ufffffffe'
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character
The change doesn't alter the outcome of the test suite.

Patch against 2.3.1 attached.
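
For readers who want to check the behavior on a current interpreter, here is a minimal sketch of a regression check. It is not the attached patch's test code. It assumes Python 3, where the `unicode_escape` codec applied to bytes is the closest analogue of the `u'...'` literal processing shown above; `decode_escape` and `is_rejected` are hypothetical helper names.

```python
def decode_escape(raw):
    """Decode a bytes object containing backslash escapes via the unicode_escape codec."""
    return raw.decode("unicode_escape")


def is_rejected(raw):
    """Return True if decoding raw raises UnicodeDecodeError."""
    try:
        decode_escape(raw)
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    # Both out-of-range escapes from the report, plus the highest valid code point.
    for raw in (b"\\Uffffffff", b"\\Ufffffffe", b"\\U0010ffff"):
        print(raw, "rejected" if is_rejected(raw) else "accepted")
```

The point of the check is that both out-of-range escapes should be rejected the same way, while `\U0010ffff` should still decode to a single character.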
msg18529 - Author: Jeremy Hylton (jhylton) (Python triager) Date: 2003-10-06 05:08
Fixed in rev. 2.199 of unicodeobject.c.
msg18530 - Author: Anthony Baxter (anthonybaxter) (Python triager) Date: 2003-10-06 05:42
This should probably be fixed in release23-maint as well, no?
History
Date User Action Args
2022-04-10 16:11:35 admin set github: 39360
2003-10-03 13:30:19 jepler create