Issue 554916: test_unicode fails in wide unicode build

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/36592

classification

Title:	test_unicode fails in wide unicode build
Type:		Stage:
Components:	Unicode	Versions:	Python 2.3

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	lemburg	Nosy List:	doerwalter, lemburg, mwh
Priority:	normal	Keywords:

Created on 2002-05-11 16:25 by mwh, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Messages (8)
msg10730 - (view)	Author: Michael Hudson (mwh)	Date: 2002-05-11 16:25
Assigned somewhat arbitrarily. It's a roundtrip test, I think.
msg10731 - (view)	Author: Walter Dörwald (doerwalter) *	Date: 2002-05-13 13:38
Logged In: YES user_id=89016
The minimal failing testcase is:

>>> unicode(u"\udb00\udc00".encode("utf-8"), "utf-8") == u"\udb00\udc00"
False

which is strange, because they seem to be the same:

>>> u"\udb00\udc00"
u'\U000d0000'
>>> unicode(u"\udb00\udc00".encode("utf-8"), "utf-8")
u'\U000d0000'
msg10732 - (view)	Author: Michael Hudson (mwh)	Date: 2002-05-13 13:58
Logged In: YES user_id=6656

>>> a = u"\udb00\udc00"
[20811 refs]
>>> b = unicode(a.encode("utf-8"), "utf-8")
[21061 refs]
>>> a, b
(u'\U000d0000', u'\U000d0000')
[21063 refs]
>>> len(a), len(b)
(2, 1)
[21063 refs]

Erm...?
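For reference, the arithmetic that maps a UTF-16 surrogate pair to a single code point can be sketched as follows. This is a minimal illustration written for Python 3, not code from the interpreter, and the helper name `combine_surrogates` is hypothetical. It shows that the pair \udb00\udc00 and the code point U+D0000 denote the same character, which is apparently why the two values display identically while their lengths differ.

```python
def combine_surrogates(high, low):
    """Return the code point encoded by a UTF-16 surrogate pair.

    Hypothetical helper for illustration only.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate: %#x" % high)
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate: %#x" % low)
    # Each surrogate carries 10 payload bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xDB00, 0xDC00)
print(hex(code_point))
```

On a wide build the string a still holds the two surrogates as separate elements, while the decoded string b holds the single combined code point, so an element-by-element comparison reports them as unequal.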
msg10733 - (view)	Author: Michael Hudson (mwh)	Date: 2002-05-13 14:06
Logged In: YES user_id=6656
Even better:

$ ./python
Adding parser accelerators ...
Done.
Python 2.2.1 (#1, May 13 2002, 15:02:01)
[GCC 2.96 20000731 (Red Hat Linux 7.1 2.96-98)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> unicode(u"\udb00\udc00".encode("utf-8"), "utf-8") == u"\udb00\udc00"
0
[18762 refs]

but the test passes. And there was me thinking that it wasn't a problem on the release22-maint branch.
msg10734 - (view)	Author: Michael Hudson (mwh)	Date: 2002-10-09 12:57
Logged In: YES user_id=6656 Hmm. The test has stopped failing, so maybe we can close this. I'd be happier if I knew why, though.
msg10735 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2002-10-10 15:30
Logged In: YES user_id=38388
I'm not exactly sure why things work again, but I do know that I looked into this some time ago. Perhaps I simply forgot to close the bug or one of the UTF-8 codec overhauls remedied the problem.

Here's what I get with python 2.3 UCS4:

>>> len(u'\U000d0000')
1
>>> len(u"\udb00\udc00")
2
>>> u'\U000d0000' == u"\udb00\udc00"
False
>>> len(unicode(u"\udb00\udc00".encode('utf-8'), 'utf-8'))
1
>>> len(unicode(u'\U000d0000'.encode('utf-8'), 'utf-8'))
1

This is what I get with Python 2.2.1:

>>> len(u'\U000d0000')
2
>>> len(u"\udb00\udc00")
2
>>> u'\U000d0000' == u"\udb00\udc00"
1
>>> len(unicode(u"\udb00\udc00".encode('utf-8'), 'utf-8'))
2
>>> len(unicode(u'\U000d0000'.encode('utf-8'), 'utf-8'))
2

There's still a difference there, but the UTF-8 codec behaves consistently.
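The UCS4 behaviour above can be approximated on a current interpreter. The sketch below assumes Python 3.4 or later, where `len` counts code points much as the wide build did. It is not the original test: Python 3's UTF-8 codec rejects lone surrogates, so the round trip here goes through UTF-16 with the `surrogatepass` error handler instead. The variable names are illustrative.

```python
# Two separate surrogate code points, as in the failing test case.
pair = "\udb00\udc00"
# The single non-BMP code point that the pair encodes in UTF-16.
single = "\U000d0000"

# Encoding writes the two surrogates as two UTF-16 code units;
# decoding then reads them back as one valid surrogate pair.
roundtrip = pair.encode("utf-16-le", "surrogatepass").decode("utf-16-le")

print(len(pair), len(single), len(roundtrip))
print(roundtrip == single, roundtrip == pair)
```

The round trip is lossy in the same way the thread describes: the codec normalises the pair into one code point, so the result no longer compares equal to the string that went in.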
msg10736 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2003-01-19 23:02
Logged In: YES user_id=38388 Michael, is the test still failing or can I close this?
msg10737 - (view)	Author: Michael Hudson (mwh)	Date: 2003-01-20 10:12
Logged In: YES user_id=6656 Let's get rid of it. I still don't understand what happened, but we can worry about that if it resurfaces.

History
Date	User	Action	Args
2022-04-10 16:05:18	admin	set	github: 36592
2002-05-11 16:25:58	mwh	create