classification
Title: unicode.decode
Type:
Stage:
Components: Unicode
Versions: Python 2.4
process
Status: closed
Resolution: works for me
Dependencies:
Superseder:
Assigned To: lemburg
Nosy List: lemburg, manlioperillo
Priority: normal
Keywords:

Created on 2005-02-01 16:13 by manlioperillo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (2)
msg24126 - Author: Manlio Perillo (manlioperillo) Date: 2005-02-01 16:13
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32

>>> print sys.getdefaultencoding()
ascii


Regards.

The problem is this code:

# -*- coding: cp1252 -*-

>>> u'\xe0\xe8\xec\xf2\xf9'.decode('latin1')
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in ?
    u'\xe0\xe8\xec\xf2\xf9'.decode('latin1')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)


I think this is a bug.
Indeed, str.encode shows the analogous behaviour:

>>> '\xe0\xe8\xec\xf2\xf9'.encode('latin1')
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in ?
    '\xe0\xe8\xec\xf2\xf9'.encode('latin1')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

But this makes no sense for Unicode strings!
I think unicode.decode should be a no-op.



Manlio Perillo
msg24127 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-02-01 17:23
What the .encode() and .decode() methods do depends on the
codec being used. 

In your example, the codec is Latin-1, which encodes from
Unicode to 8-bit character strings and decodes the other way
around. As a result, the Unicode string in your first example
is first converted to an 8-bit string using the default
encoding (which is ASCII), and this fails. The same happens
in the second case: Python tries to convert the 8-bit string
to Unicode, but this fails because the string contains
non-ASCII characters.
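
The two implicit conversions described above can be sketched as follows. This is only an emulation written for Python 3, where the implicit step no longer exists and has to be spelled out. The helper names (py2_unicode_decode, py2_str_encode, error_name) are hypothetical, and the default encoding is assumed to be ASCII, as in the report:

```python
# Emulation (in Python 3) of what Python 2 did implicitly. The helper names
# are hypothetical and are not part of any Python API.
DEFAULT_ENCODING = "ascii"  # assumption: sys.getdefaultencoding() from the report


def py2_unicode_decode(text, codec):
    """Emulate Python 2's u'...'.decode(codec) for a bytes-to-text codec."""
    # Step 1: implicit conversion of the Unicode string to an 8-bit string.
    raw = text.encode(DEFAULT_ENCODING)  # UnicodeEncodeError on non-ASCII input
    # Step 2: the decode that the caller actually asked for.
    return raw.decode(codec)


def py2_str_encode(raw, codec):
    """Emulate Python 2's '...'.encode(codec) for a text-to-bytes codec."""
    # Step 1: implicit conversion of the 8-bit string to Unicode.
    text = raw.decode(DEFAULT_ENCODING)  # UnicodeDecodeError on non-ASCII input
    # Step 2: the encode that the caller actually asked for.
    return text.encode(codec)


def error_name(func, *args):
    """Return the name of the UnicodeError raised by func(*args), or None."""
    try:
        func(*args)
    except UnicodeError as exc:
        return type(exc).__name__
    return None


first_case = error_name(py2_unicode_decode, "\xe0\xe8\xec\xf2\xf9", "latin1")
second_case = error_name(py2_str_encode, b"\xe0\xe8\xec\xf2\xf9", "latin1")
print(first_case)
print(second_case)
```

In both cases the failure comes from the implicit ASCII step, not from the Latin-1 codec itself. Pure ASCII input passes through both helpers unchanged.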

If you switch the types of the strings in both examples,
you'll have no problem at all.
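
As a rough illustration of switching the types, here is the same pair of calls in Python 3 spelling, assuming that bytes literals stand in for Python 2's 8-bit str and str literals for Python 2's unicode. The variable names are only illustrative:

```python
# Python 3 spelling of the "switched types" version of both examples.
raw = b"\xe0\xe8\xec\xf2\xf9"   # 8-bit string (Python 2: '\xe0...')
text = "\xe0\xe8\xec\xf2\xf9"   # Unicode string (Python 2: u'\xe0...')

# Decoding an 8-bit string with Latin-1 needs no implicit ASCII step.
decoded = raw.decode("latin1")

# Encoding a Unicode string with Latin-1 needs no implicit ASCII step either.
encoded = text.encode("latin1")

print(decoded == text)
print(encoded == raw)
```

Each call now starts from the type the Latin-1 codec expects, so no conversion through the default encoding takes place.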
History
Date User Action Args
2022-04-11 14:56:09 admin set github: 41513
2005-02-01 16:13:18 manlioperillo create