Issue554916
Created on 2002-05-11 16:25 by mwh, last changed 2022-04-10 16:05 by admin. This issue is now closed.
Messages (8) | |||
---|---|---|---|
msg10730 - (view) | Author: Michael Hudson (mwh) | Date: 2002-05-11 16:25 | |
Assigned somewhat arbitrarily. It's a roundtrip test, I think. |
|||
msg10731 - (view) | Author: Walter Dörwald (doerwalter) * | Date: 2002-05-13 13:38 | |
Logged In: YES user_id=89016
The minimal failing testcase is:
>>> unicode(u"\udb00\udc00".encode("utf-8"), "utf-8") == u"\udb00\udc00"
False
which is strange, because they *seem* to be the same:
>>> u"\udb00\udc00"
u'\U000d0000'
>>> unicode(u"\udb00\udc00".encode("utf-8"), "utf-8")
u'\U000d0000' |
|||
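Why both values print as `u'\U000d0000'` follows from the standard UTF-16 surrogate arithmetic. The sketch below is written for Python 3, not the Python 2.2 interpreter in the report, and the helper name `combine_surrogates` is made up for illustration. It is not code from the CPython codec.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xDB00, 0xDC00)
print(hex(code_point))  # 0xd0000
```

So the two code units `\udb00\udc00` and the single code point `\U000d0000` name the same character, which is why their reprs can coincide even when the underlying strings differ.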
msg10732 - (view) | Author: Michael Hudson (mwh) | Date: 2002-05-13 13:58 | |
Logged In: YES user_id=6656
>>> a = u"\udb00\udc00"
[20811 refs]
>>> b = unicode(a.encode("utf-8"), "utf-8")
[21061 refs]
>>> a, b
(u'\U000d0000', u'\U000d0000')
[21063 refs]
>>> len(a), len(b)
(2, 1)
[21063 refs]
Erm...? |
|||
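A similar `(2, 1)` length mismatch can be provoked on a current interpreter, though not through the same code path. The sketch below assumes Python 3.4 or later and goes through UTF-16 with the `surrogatepass` error handler, because the Python 3 UTF-8 codec refuses to encode surrogates by default. It is only an analogy to the 2002 behaviour, not a reproduction of it.

```python
pair = "\udb00\udc00"      # two separate surrogate code points
single = "\U000d0000"      # one supplementary-plane code point

# surrogatepass lets the encoder emit the surrogates as raw UTF-16 code units.
raw = pair.encode("utf-16-le", "surrogatepass")

# The decoder sees a well-formed surrogate pair and merges it.
roundtripped = raw.decode("utf-16-le")

print(len(pair), len(roundtripped))  # 2 1
print(roundtripped == single)        # True
print(roundtripped == pair)          # False
```

As in the session above, the round trip changes the string: two code points go in and one comes out, so the equality test against the original fails.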
msg10733 - (view) | Author: Michael Hudson (mwh) | Date: 2002-05-13 14:06 | |
Logged In: YES user_id=6656
Even better:
$ ./python
Adding parser accelerators ...
Done.
Python 2.2.1 (#1, May 13 2002, 15:02:01)
[GCC 2.96 20000731 (Red Hat Linux 7.1 2.96-98)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> unicode(u"\udb00\udc00".encode("utf-8"), "utf-8") == u"\udb00\udc00"
0
[18762 refs]
but the test passes. And there was me thinking that it wasn't a problem on the release22-maint branch. |
|||
msg10734 - (view) | Author: Michael Hudson (mwh) | Date: 2002-10-09 12:57 | |
Logged In: YES user_id=6656 Hmm. The test has stopped failing, so maybe we can close this. I'd be happier if I knew why, though. |
|||
msg10735 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2002-10-10 15:30 | |
Logged In: YES user_id=38388
I'm not exactly sure why things work again, but I do know that I looked into this some time ago. Perhaps I simply forgot to close the bug or one of the UTF-8 codec overhauls remedied the problem.
Here's what I get with python 2.3 UCS4:
>>> len(u'\U000d0000')
1
>>> len(u"\udb00\udc00")
2
>>> u'\U000d0000' == u"\udb00\udc00"
False
>>> len(unicode(u"\udb00\udc00".encode('utf-8'), 'utf-8'))
1
>>> len(unicode(u'\U000d0000'.encode('utf-8'), 'utf-8'))
1
This is what I get with Python 2.2.1:
>>> len(u'\U000d0000')
2
>>> len(u"\udb00\udc00")
2
>>> u'\U000d0000' == u"\udb00\udc00"
1
>>> len(unicode(u"\udb00\udc00".encode('utf-8'), 'utf-8'))
2
>>> len(unicode(u'\U000d0000'.encode('utf-8'), 'utf-8'))
2
There's still a difference there, but the UTF-8 codec behaves consistently. |
|||
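For comparison, here is a rough sketch of the same checks on Python 3, where every build uses the full code point range and there is no UCS2/UCS4 distinction. The helper `utf8_roundtrip` is a hypothetical name introduced for this example. The `surrogatepass` handler is needed for the surrogate pair because the strict UTF-8 codec rejects surrogates.

```python
import sys


def utf8_roundtrip(text, errors="strict"):
    """Encode to UTF-8 and decode again with the same error handler."""
    return text.encode("utf-8", errors).decode("utf-8", errors)


single = "\U000d0000"
pair = "\udb00\udc00"

print(hex(sys.maxunicode))                          # 0x10ffff
print(len(single.encode("utf-8")))                  # 4
print(len(pair.encode("utf-8", "surrogatepass")))   # 6

# Each form survives its own round trip unchanged.
print(utf8_roundtrip(single) == single)             # True
print(utf8_roundtrip(pair, "surrogatepass") == pair)  # True
```

Under these assumptions the UTF-8 codec never merges the pair into one code point, so the two forms stay distinct and each round trip is lossless.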
msg10736 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2003-01-19 23:02 | |
Logged In: YES user_id=38388 Michael, is the test still failing or can I close this? |
|||
msg10737 - (view) | Author: Michael Hudson (mwh) | Date: 2003-01-20 10:12 | |
Logged In: YES user_id=6656 Let's get rid of it. I still don't understand what happened, but we can worry about that if it resurfaces. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-10 16:05:18 | admin | set | github: 36592 |
2002-05-11 16:25:58 | mwh | create |