Issue 1114093: unicode.decode

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/41513

classification

Title:	unicode.decode
Type:		Stage:
Components:	Unicode	Versions:	Python 2.4

process

Status:	closed	Resolution:	works for me
Dependencies:		Superseder:
Assigned To:	lemburg	Nosy List:	lemburg, manlioperillo
Priority:	normal	Keywords:

Created on 2005-02-01 16:13 by manlioperillo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (2)
msg24126	Author: Manlio Perillo (manlioperillo)	Date: 2005-02-01 16:13

Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
>>> print sys.getdefaultencoding()
ascii

Regards.

The problem is this code:

# -- coding: cp1252 --
>>> u'\xe0\xe8\xec\xf2\xf9'.decode('latin1')
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in ?
    u'\xe0\xe8\xec\xf2\xf9'.decode('latin1')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

I think this is a bug. Indeed this is the behaviour of str.encode:

>>> '\xe0\xe8\xec\xf2\xf9'.encode('latin1')
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in ?
    '\xe0\xe8\xec\xf2\xf9'.encode('latin1')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

But this makes no sense for Unicode strings! I think unicode.decode should be a no-op.

Manlio Perillo

msg24127	Author: Marc-Andre Lemburg (lemburg)	Date: 2005-02-01 17:23

What the .encode() and .decode() methods do depends on the codec being used. In your example, the Latin-1 codec is used, which encodes from Unicode to 8-bit character strings and decodes the other way around.

As a result, the Unicode string in your first example is first converted to an 8-bit string using the default encoding (which is ASCII), and this fails. The same happens in the second case: Python tries to convert the 8-bit string to Unicode, but this fails because the string contains non-ASCII characters.

If you switch the types of the strings in both examples, you'll have no problem at all.
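
The implicit conversion described in this reply can be sketched as follows. This is a minimal illustration written for Python 3, where that implicit step no longer exists and has to be spelled out by hand. The helpers `py2_unicode_decode`, `py2_str_encode` and `error_name` are hypothetical names introduced here; they only approximate the Python 2.4 behaviour as the reply describes it.

```python
# Python 3 sketch; the helper names below are hypothetical, not a Python API.
text = u'\xe0\xe8\xec\xf2\xf9'   # Unicode string (code points)
raw = b'\xe0\xe8\xec\xf2\xf9'    # 8-bit string (bytes)


def py2_unicode_decode(u, encoding, default='ascii'):
    """Approximate Python 2's u.decode(encoding): the Unicode string is
    first converted to an 8-bit string with the default encoding."""
    return u.encode(default).decode(encoding)


def py2_str_encode(s, encoding, default='ascii'):
    """Approximate Python 2's s.encode(encoding): the 8-bit string is
    first converted to Unicode with the default encoding."""
    return s.decode(default).encode(encoding)


def error_name(func, *args):
    """Return the name of the UnicodeError raised by func(*args), or None."""
    try:
        func(*args)
    except UnicodeError as exc:
        return type(exc).__name__
    return None


# Both calls from the report fail in the implicit ASCII step.
print(error_name(py2_unicode_decode, text, 'latin1'))  # UnicodeEncodeError
print(error_name(py2_str_encode, raw, 'latin1'))       # UnicodeDecodeError

# With the types switched, no implicit step is needed and both calls succeed.
print(raw.decode('latin1') == text)   # True
print(text.encode('latin1') == raw)   # True
```

In both failing cases the error comes from the ASCII conversion, not from the Latin-1 codec itself, which is why the reported exception names the 'ascii' codec.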

History
Date	User	Action	Args
2022-04-11 14:56:09	admin	set	github: 41513
2005-02-01 16:13:18	manlioperillo	create