Issue 1314107: Issue in unicode args in logging


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/42451

classification

Title:	Issue in unicode args in logging
Type:		Stage:
Components:	Unicode	Versions:	Python 2.4

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	vinay.sajip	Nosy List:	lemburg, nnorwitz, tungwaiyip, vinay.sajip
Priority:	normal	Keywords:

Created on 2005-10-05 18:11 by tungwaiyip, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg26512 - (view)	Author: Wai Yip Tung (tungwaiyip)	Date: 2005-10-05 18:11
logging has an issue in handling unicode object arguments.

>>> import logging
>>>
>>> class Obj:
...     def __init__(self,name):
...         self.name = name
...     def __str__(self):
...         return self.name
...
>>> # a non-ascii string
...
>>> obj = Obj(u'\u00f6')
>>>
>>> # this will cause error
...
>>> print '%s' % obj
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
>>>
>>> # this will promote to unicode (and the console also happens to be able to display it)
...
>>> print u'%s' % obj
ö
>>>
>>> # this works fine
... # (other than logging makes its own decision to encode in utf8)
...
>>> logging.error(u'%s' % obj)
ERROR:root:b
>>>
>>> # THIS IS AN UNEXPECTED PROBLEM!!!
...
>>> logging.error(u'%s', obj)
Traceback (most recent call last):
  File "C:\Python24\lib\logging\__init__.py", line 706, in emit
    msg = self.format(record)
  File "C:\Python24\lib\logging\__init__.py", line 592, in format
    return fmt.format(record)
  File "C:\Python24\lib\logging\__init__.py", line 382, in format
    record.message = record.getMessage()
  File "C:\Python24\lib\logging\__init__.py", line 253, in getMessage
    msg = msg % self.args
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
>>>
>>> # work around the str() conversion in getMessage()
...
>>> logging.error(u'%s-\u00f6', obj)
ERROR:root:b-b

The issue seems to be in LogRecord.getMessage(). It attempts to convert msg to a byte string:

    msg = str(self.msg)

I am not sure why it wants to do the conversion. The last example works around this by making sure msg is not convertible to a byte string.
msg26513 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2005-10-05 20:47
Unassigning the bug. I don't know anything about the logging module.

Hint: perhaps the logging module should grow an .encoding attribute which then allows converting Unicode to some encoding used in the log file?!
msg26514 - (view)	Author: Neal Norwitz (nnorwitz) *	Date: 2005-10-06 04:00
Vinay, any suggestions?
msg26515 - (view)	Author: Vinay Sajip (vinay.sajip) *	Date: 2005-10-06 08:44
Misc. changes were backported into Python 2.4.2, please check that you have this version.

The problem is not with

    msg = str(self.msg)

but rather with

    msg = msg % args

To ensure good Unicode support, ensure your messages are either Unicode strings or objects whose __str__() method returns a Unicode string. Then,

    msg = msg % args

should result in a Unicode object. You can pass this to a FileHandler opened with an encoding argument, or a StreamHandler whose stream has been opened using codecs.open(). Ensure your default encoding is set correctly using sitecustomize.py.

The encoding additions were made in Revision 1.26 of logging/__init__.py, dated 13/03/2005.

Marking as closed.
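As an illustration of the "FileHandler opened with an encoding argument" suggestion, here is a minimal sketch written for Python 3 rather than the Python 2.4 discussed in this issue. The helper name log_to_utf8_file, the logger name and the format string are hypothetical choices made for this example; only logging.FileHandler and its encoding parameter come from the standard library.

```python
import logging


def log_to_utf8_file(path, text):
    """Log one message to `path` as UTF-8 and return the raw bytes written.

    Hypothetical helper for illustration only.
    """
    logger = logging.getLogger("utf8_file_demo")  # assumed logger name
    logger.setLevel(logging.ERROR)
    logger.propagate = False  # keep the demo output out of the root logger

    # The handler, not the caller, decides how text becomes bytes on disk.
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
    logger.addHandler(handler)
    try:
        # The argument is passed separately, as in logging.error(u'%s', obj).
        logger.error("%s", text)
    finally:
        logger.removeHandler(handler)
        handler.close()

    with open(path, "rb") as f:
        return f.read()
```

Calling log_to_utf8_file with a message containing u'\u00f6' should leave the UTF-8 encoding of that character in the file, independent of the console or default encoding.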
msg26516 - (view)	Author: Wai Yip Tung (tungwaiyip)	Date: 2005-10-06 23:16
>> To ensure good Unicode support, ensure your messages are either Unicode strings or objects whose __str__() method returns a Unicode string. Then,
>> msg = msg % args

That's what I am doing already. Let me explain the subtle problem again.

1. print '%s' % obj - error
2. logging.error(u'%s' % obj) - ok
3. logging.error(u'%s', obj) - error
4. logging.error(u'%s-\u00f6', obj) - ok

I can understand how 1 fails. But I expect 2, 3 and 4 to work similarly. Especially contrast 3 with 4. 4 works when 3 doesn't because when str() is applied to u'%s-\u00f6' it fails and falls back to the original unicode string, which is the correct way in my opinion. Whereas in 3, the u'%s' gets demoted to the byte string '%s', so it fails like 1.
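The difference between cases 3 and 4 can be modelled roughly in Python 3. This is only a sketch of the behaviour described above, not the Python 2.4 logging source: old_get_message and is_ascii are hypothetical names, and "text that encodes to ASCII" stands in for "a unicode message that Python 2's str() could demote to a byte string".

```python
class Obj:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


def is_ascii(text):
    """Return True if text could be encoded with the 'ascii' codec."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def old_get_message(msg, args):
    """Rough model of the pre-fix getMessage() behaviour (not the real code)."""
    # Assumption: str(msg) succeeded only for ASCII-only messages, demoting
    # them to byte strings; otherwise the original unicode msg was kept.
    demoted = is_ascii(msg)
    rendered = msg % args if args else msg
    if demoted:
        # A byte-string format cannot hold non-ASCII text returned by
        # __str__(), so the model raises UnicodeEncodeError here, as in case 3.
        rendered.encode("ascii")
    return rendered
```

Under this model, old_get_message("%s", (Obj("\u00f6"),)) raises UnicodeEncodeError, while the same call with the format "%s-\u00f6" succeeds because the format string itself is never demoted.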
msg26517 - (view)	Author: Vinay Sajip (vinay.sajip) *	Date: 2005-10-07 08:43
Aaah...now I understand! Sorry for being a little slow, and thanks for explaining in more detail. I've uploaded a fix to CVS: str(msg) is only called if msg is neither a string nor a Unicode object. With the fix, the following script:

#---------------------------
import logging

class X:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

def main():
    obj = X(u'\u00f6')
    logging.error(u'%s' % obj)
    logging.error(u'%s', obj)
    logging.error(u'%s-\u00f6', obj)

if __name__ == "__main__":
    main()
#---------------------------

now gives the following output on my system (default encoding is 'ascii'):

ERROR:root:Â
ERROR:root:Â
ERROR:root:Â-Â
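For readers on Python 3, where every str is Unicode, the rule "only call str(msg) when msg is not already a string" can be sketched as below. This is an illustrative sketch, not the patch committed to CVS: get_message and run_demo are hypothetical names, and the demo writes to an in-memory stream through a named logger instead of the root logger so the result does not depend on the console encoding.

```python
import io
import logging


class X:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


def get_message(msg, args):
    """Hypothetical helper: convert msg with str() only if it is not a string."""
    if not isinstance(msg, str):
        msg = str(msg)
    # Arguments are merged only after the message type has been settled.
    return msg % args if args else msg


def run_demo():
    """Run the three logging calls from the script above; return the lines."""
    stream = io.StringIO()
    logger = logging.getLogger("issue1314107_demo")  # assumed logger name
    logger.setLevel(logging.ERROR)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    obj = X("\u00f6")
    try:
        logger.error("%s" % obj)
        logger.error("%s", obj)
        logger.error("%s-\u00f6", obj)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue().splitlines()
```

In this sketch all three calls produce a record, and the second and third differ from the first only in when the argument is merged into the message.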

History
Date	User	Action	Args
2022-04-11 14:56:13	admin	set	github: 42451
2005-10-05 18:11:46	tungwaiyip	create