
classification
Title: Issue in unicode args in logging
Type:
Stage:
Components: Unicode
Versions: Python 2.4

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: vinay.sajip
Nosy List: lemburg, nnorwitz, tungwaiyip, vinay.sajip
Priority: normal
Keywords:

Created on 2005-10-05 18:11 by tungwaiyip, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg26512 - (view) Author: Wai Yip Tung (tungwaiyip) Date: 2005-10-05 18:11
logging has an issue handling unicode object arguments.

>>> import logging
>>>
>>> class Obj:
...     def __init__(self,name):
...         self.name = name
...     def __str__(self):
...         return self.name
...
>>> # a non-ascii string
...
>>> obj = Obj(u'\u00f6')
>>>
>>> # this will cause error
...
>>> print '%s' % obj
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
>>>
>>> # this will promote to unicode (and the console also happens to be able to display it)
...
>>> print u'%s' % obj
ö
>>>
>>> # this works fine
... # (except that logging makes its own decision to encode in utf8)
...
>>> logging.error(u'%s' % obj)
ERROR:root:b
>>>
>>> # THIS IS AN UNEXPECTED PROBLEM!!!
...
>>> logging.error(u'%s', obj)
Traceback (most recent call last):
  File "C:\Python24\lib\logging\__init__.py", line 706, in emit
    msg = self.format(record)
  File "C:\Python24\lib\logging\__init__.py", line 592, in format
    return fmt.format(record)
  File "C:\Python24\lib\logging\__init__.py", line 382, in format
    record.message = record.getMessage()
  File "C:\Python24\lib\logging\__init__.py", line 253, in getMessage
    msg = msg % self.args
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
>>>
>>> # work around the str() conversion in getMessage()
...
>>> logging.error(u'%s-\u00f6', obj)
ERROR:root:b-b


The issue seems to be in LogRecord.getMessage(). It
attempts to convert msg to a byte string:

   msg = str(self.msg)

I am not sure why it wants to do the conversion. The last
example works around this by making sure msg cannot be
converted to a byte string.
msg26513 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-10-05 20:47
Unassigning the bug. I don't know anything about the logging
module.

Hint: perhaps the logging module should grow an .encoding
attribute which then allows converting Unicode to some
encoding used in the log file ?!
msg26514 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2005-10-06 04:00
Vinay, any suggestions?
msg26515 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2005-10-06 08:44
Miscellaneous changes were backported into Python 2.4.2; please
check that you have this version.

The problem is not with

msg = str(self.msg)

but rather with

msg = msg % args

To ensure good Unicode support, ensure your messages are
either Unicode strings or objects whose __str__() method
returns a Unicode string. Then, 

msg = msg % args

should result in a Unicode object. You can pass this to a
FileHandler opened with an encoding argument, or a
StreamHandler whose stream has been opened using
codecs.open(). Ensure your default encoding is set correctly
using sitecustomize.py.
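
A minimal sketch of the FileHandler suggestion above, written in Python 3 syntax rather than the Python 2.4 of this thread. The `encoding` argument to `logging.FileHandler` is the real parameter; the logger name `unicode_demo`, the helper `log_to_utf8_file` and the temporary file path are assumptions made up for illustration.

```python
import logging
import os
import tempfile


def log_to_utf8_file(path, text):
    """Log `text` through a FileHandler opened with an explicit encoding."""
    logger = logging.getLogger("unicode_demo")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    try:
        # The argument is merged into the message by LogRecord.getMessage().
        logger.error("%s", text)
    finally:
        logger.removeHandler(handler)
        handler.close()
    # Read the file back with the same encoding the handler wrote it in.
    with open(path, encoding="utf-8") as f:
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "demo.log")  # hypothetical location
logged = log_to_utf8_file(path, "\u00f6")
```

With the encoding stated on the handler, the choice of byte encoding is made once at the output boundary instead of implicitly inside the message formatting.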

The encoding additions were made in Revision 1.26 of
logging/__init__.py, dated 13/03/2005.

Marking as closed.
msg26516 - (view) Author: Wai Yip Tung (tungwaiyip) Date: 2005-10-06 23:16
>>To ensure good Unicode support, ensure your messages 
are either Unicode strings or objects whose __str__() method
returns a Unicode string. Then, 

>>msg = msg % args

That's what I am doing already. 

Let me explain the subtle problem again.

1. print '%s' % obj - error
2. logging.error(u'%s' % obj) - ok
3. logging.error(u'%s', obj) - error
4. logging.error(u'%s-\u00f6', obj) - ok

I can understand how 1 fails. But I expect 2, 3 and 4 to work
similarly. Especially contrast 3 with 4. 4 works when 3 doesn't
because str() fails when applied to u'%s-\u00f6', so it falls
back to the original unicode string, which is the correct
way in my opinion. Whereas in 3, the u'%s' gets demoted to the
byte string '%s', so it fails like 1.
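
The contrast between cases 3 and 4 can be sketched as follows. This is a Python 3 simulation, not the actual logging source: Python 3 has no implicit ASCII conversion, so `py2_str` is a hypothetical stand-in for what Python 2's `str()` does to a unicode object, and `prepare_msg_before_fix` is an assumed name for the try/fall-back step described above.

```python
def py2_str(text):
    # Stand-in for Python 2's str(unicode_obj): encode with the 'ascii'
    # default codec, raising UnicodeEncodeError on non-ASCII characters.
    return text.encode("ascii")


def prepare_msg_before_fix(msg):
    # Hypothetical helper mirroring the behaviour described in this report:
    # try to convert the format string, fall back to the original on failure.
    try:
        return py2_str(msg)
    except UnicodeError:
        return msg


# Case 3: a pure-ASCII format string is silently demoted to a byte string,
# so the later "msg % args" no longer happens in unicode.
case3 = prepare_msg_before_fix("%s")

# Case 4: the non-ASCII character makes the conversion fail, so the
# original text string is kept and formatting stays in unicode.
case4 = prepare_msg_before_fix("%s-\u00f6")

print(type(case3).__name__, type(case4).__name__)
```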
msg26517 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2005-10-07 08:43
Aaah...now I understand! Sorry for being a little slow, and
thanks for explaining in more detail.

I've uploaded a fix to CVS: str(msg) is now called only if msg
is neither a string nor a Unicode object. With the fix,
the following script:
#---------------------------
import logging

class X:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

def main():
    obj = X(u'\u00f6')
    logging.error(u'%s' % obj)
    logging.error(u'%s', obj)
    logging.error(u'%s-\u00f6', obj)

if __name__ == "__main__":
    main()
#---------------------------

now gives the following output on my system (default
encoding is 'ascii'):

ERROR:root:Â
ERROR:root:Â
ERROR:root:Â-Â
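
The rule behind the fix (convert only when msg is not already a string) can be sketched as below. This is an illustration in Python 3, where the single `str` type covers what Python 2 split into byte strings and Unicode objects; `get_message` is a hypothetical stand-in and not the code committed to CVS.

```python
class X:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


def get_message(msg, args=()):
    # Hypothetical stand-in for LogRecord.getMessage() after the fix:
    # leave string messages untouched, call str() only on other objects.
    if not isinstance(msg, str):
        msg = str(msg)
    if args:
        msg = msg % args
    return msg


obj = X("\u00f6")
results = [
    get_message("%s" % obj),           # formatted by the caller
    get_message("%s", (obj,)),         # formatted inside get_message
    get_message("%s-\u00f6", (obj,)),  # non-ASCII format string
]
```

All three calls take the same path through the function, which is the consistency the report asked for.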
History
Date                 User        Action  Args
2022-04-11 14:56:13  admin       set     github: 42451
2005-10-05 18:11:46  tungwaiyip  create