This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Unicode xmlcharrefreplace produces backslash not xml style.
Type: Stage:
Components: None Versions: Python 2.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: doerwalter, odontomatix
Priority: normal Keywords:

Created on 2007-03-05 17:11 by odontomatix, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (2)
msg31431 - (view) Author: Stanley Sokolow (odontomatix) Date: 2007-03-05 17:11
In Python 2.4.2 and 2.5, and maybe other versions too, the unicode string encoder for producing xml style unicode output (example, © for copyright symbol) produces the wrong style -- it produces backslash encoding (\xa9 for same copyright unicode character).

Example at Python shell:
u'\u2122'.encode('unicode_escape','xmlcharrefreplace')
should produce: ™
but it produces \u2122

The same happens when it is used in a program.  The print output of the encoded unicode contains backslash encodings as though the method 'backslashreplace' had been used.

msg31432 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2007-03-05 17:54
u'\u2122'.encode('unicode_escape','xmlcharrefreplace') produces \u2122 because that's the way the unicode_escape codec outputs unicode codepoints. For unicode_escape the xmlcharrefreplace error handler never kicks in. If you want the error handler to kick in, you have to use an encoding that doesn't support the character you want to encode. The best candidate is probably ascii:

>>> u"\u2122".encode("ascii", "xmlcharrefreplace")
>>> '"'
History
Date User Action Args
2022-04-11 14:56:22adminsetgithub: 44661
2007-03-05 17:11:09odontomatixcreate