classification
Title: XMLGenerator ignores encoding in output
Type:
Stage:
Components: XML
Versions: Python 2.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: loewis
Nosy List: loewis, mlh
Priority: high
Keywords:

Created on 2004-04-19 17:18 by mlh, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg20545 - Author: Magnus Lie Hetland (mlh) Date: 2004-04-19 17:18
When XMLGenerator is given an encoding such as 'utf-8'
and is then passed non-ASCII Unicode characters, it
crashes in its characters() method. The current version
is:

def characters(self, content):
    self._out.write(escape(content))

This completely ignores the encoding and will (when
writing to something such as a StringIO or the like)
simply try to convert the content into an ASCII string.
The encoding is only used in the XML header, not as the
actual encoding of the output!

It may be that I've gotten things wrong, but I would
suggest the following fix:

def characters(self, content):
    self._out.write(escape(content).encode(self._encoding))

This seems to work well for me, at least.
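
As an illustration of the idea behind the suggested fix, here is a minimal sketch written for Python 3, where text and bytes are separate types. The helper name write_characters is hypothetical and is not part of saxutils; it only shows escaped text being encoded with the declared encoding before it reaches a byte stream.

```python
from io import BytesIO
from xml.sax.saxutils import escape


def write_characters(out, content, encoding="utf-8"):
    """Escape XML special characters, then encode before writing.

    Hypothetical helper mirroring the proposed characters() body;
    `out` is assumed to be a binary (bytes) stream.
    """
    out.write(escape(content).encode(encoding))


buf = BytesIO()
write_characters(buf, "caf\u00e9 & cr\u00e8me")
data = buf.getvalue()
print(data)
```

Note that a plain encode() call still raises UnicodeEncodeError when a character cannot be represented in the target encoding (for example 'ascii'), so this sketch only covers encodings that can represent the content.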
msg20546 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-04-20 19:46
In general, it would be even better to generate character
references for characters not representable in the output
encoding.
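
One way to sketch the character-reference approach is with the standard "xmlcharrefreplace" codec error handler, shown here in Python 3. The helper name encode_with_charrefs is hypothetical, and this is not necessarily how the committed fix is written; it only illustrates the behaviour described above.

```python
from xml.sax.saxutils import escape


def encode_with_charrefs(content, encoding):
    """Escape markup, then encode; characters the encoding cannot
    represent become numeric character references (&#NNN;).

    Hypothetical helper for illustration only.
    """
    return escape(content).encode(encoding, "xmlcharrefreplace")


# 'é' is not representable in ASCII, so it becomes a reference.
ascii_out = encode_with_charrefs("caf\u00e9 < th\u00e9", "ascii")

# Latin-1 can represent 'é' but not the euro sign.
latin1_out = encode_with_charrefs("caf\u00e9 \u20ac", "iso-8859-1")

print(ascii_out)
print(latin1_out)
```

With this handler the output is always valid in the declared encoding, and an XML parser recovers the original characters from the references.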
msg20547 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-05-06 02:40
Thanks for pointing this out. Fixed in

saxutils.py 1.21.10.2 1.23
NEWS 1.831.4.105

Fix in PyXML is pending.
History
Date                 User   Action  Args
2022-04-11 14:56:03  admin  set     github: 40169
2004-04-19 17:18:47  mlh    create