Issue 971106: Comparisons of unicode and strings raises UnicodeErrors

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/40383

classification

Title:	Comparisons of unicode and strings raises UnicodeErrors
Type:		Stage:
Components:	Unicode	Versions:	Python 2.3

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:	lemburg	Nosy List:	ctheune, lemburg, tim.peters
Priority:	normal	Keywords:

Created on 2004-06-11 13:07 by ctheune, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (10)
msg21132 - (view)	Author: Christian Theune (ctheune) *	Date: 2004-06-11 13:07
When comparing unicode and strings, the implicit conversion raises an exception instead of returning False. See the example below. We (Christian Theune and Jim Fulton) suggest that if the ordinary string can't be decoded, False should be returned. This seems to be the only sane approach given Python's policy of implicitly converting strings to unicode using a default encoding.

Python 2.3.4 (#1, Jun 10 2004, 11:08:42)
[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class Elephant:
...     pass
...
>>> e1 = Elephant()
>>> e1 == 5
False
>>> u"asdf" == "asdfö"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 4: ordinal not in range(128)
>>> e1 == "asdfö"
False
>>> e1 == u"asdfö"
False
>>>
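A minimal sketch of the behaviour proposed above, written for Python 3, where the Python 2 `str` corresponds to `bytes` and `unicode` to `str`. The helper name `lenient_eq` and its `encoding` parameter are hypothetical and exist only for illustration; this is not how any Python version implements the comparison.

```python
def lenient_eq(text, raw, encoding="ascii"):
    """Compare text with a byte string; undecodable bytes compare unequal."""
    try:
        return text == raw.decode(encoding)
    except UnicodeDecodeError:
        # The proposal: a byte string that cannot be decoded is simply "not equal".
        return False


# b'asdf\xf6' stands in for the Python 2 str "asdfö" from the report (assumed latin-1 bytes).
raw = "asdfö".encode("latin-1")

print(lenient_eq("asdf", raw))      # False, instead of raising UnicodeDecodeError
print(lenient_eq("asdf", b"asdf"))  # True, pure ASCII decodes fine
```

Under this sketch an undecodable operand is indistinguishable from a merely different one, which is the trade-off the rest of the thread argues about.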
msg21133 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-06-14 16:06
I'm not sure what you are suggesting here... do you want u"asdf" == "asdfö" to return False?
msg21134 - (view)	Author: Christian Theune (ctheune) *	Date: 2004-06-14 16:16
That's what Jim and I found to be the best solution (for now).
msg21135 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-06-14 16:21
In that case I'll have to close the request as "Won't fix": comparisons can raise exceptions, and if you're comparing apples and eggs you should get an exception instead of a misleading result which only hides programming errors. Sorry.
msg21136 - (view)	Author: Tim Peters (tim.peters) *	Date: 2004-06-14 16:48
Christian, you and/or Jim would need to make the case on Python-Dev to change this behavior, and get Guido to agree. The current behavior isn't an accident -- it's functioning as documented and designed. Changing existing behavior isn't easy (and shouldn't be easy ...).
msg21137 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-06-14 17:05
You won't get my approval on this one, that's for sure... then again I'm not sure how much say I have on these things on python-dev anymore, four years after having led its design. As a data point, there have been discussions about raising exceptions in cases where types don't match and there is no support for the given type combination, like in your e1 == 5 example, so maybe stirring up noise isn't the best strategy in this case :-)
msg21138 - (view)	Author: Tim Peters (tim.peters) *	Date: 2004-06-14 17:17
I think some noise would be a good thing <wink>. Really! Comparisons in Python are a mess now, and it would be good to hammer out a (more) consistent story. Note that the new-in-2.3 Python sets.* and datetime.* types do some of each for comparisons against incompatible types: for <=, <, >, and >=, they raise an exception. For == they return False, and for != they return True. Guido gave a lot of thought to those at the time, and it's no coincidence that these new types act the same way. But older types don't. Maybe the new idea was a mistake. Maybe the older types should change. Maybe there are good reasons to keep them acting differently. Some noise about all this is really needed before Python 3.
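The comparison policy described above (ordering against an incompatible type raises, == is False, != is True) can still be observed in Python 3. The sketch below uses `datetime.date` and the built-in `set` as a stand-in for the 2.3 `sets` module; the helper `compare_report` is a hypothetical name introduced only for this illustration.

```python
import datetime


def compare_report(left, right):
    """Return (left == right, left != right, outcome of left < right)."""
    try:
        ordering = left < right
    except TypeError:
        # Ordering against an incompatible type is refused outright.
        ordering = "TypeError"
    return (left == right, left != right, ordering)


day = datetime.date(2004, 6, 14)

print(compare_report(day, 5))     # (False, True, 'TypeError')
print(compare_report({1, 2}, 5))  # (False, True, 'TypeError')
```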
msg21139 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-06-14 17:32
Here's an example to make you think this over:

>>> u"äöü" == "äöü"
False

Try to explain that to a novice :-)
msg21140 - (view)	Author: Tim Peters (tim.peters) *	Date: 2004-06-14 18:16
In current Python, both string literals generate a DeprecationWarning: Non-ASCII character ... msg, with a pointer to the docs (your PEP 263). When the implementation of that PEP is complete, the example line won't be legal Python without an explicit encoding declaration (in which case it will presumably return True). So I don't know that the example matters going forward; 2.3 isn't going to change.
msg21141 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-06-14 18:27
Type it into an interactive session:

>>> u"äöü" == "äöü"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

This would return False if the change were to happen.

In source code the above is not valid without encoding declaration (which is good). But even with encoding declaration, the string literal will still be interpreted using the default encoding (the encoding only applies to the Unicode literal), so the result is the same:

unicompare.py:

# -*- coding: latin-1 -*-

print u"äöü".encode('latin-1')
print "äöü"
print u"äöü" == "äöü"

$ python2.3 unicompare.py
äöü
äöü
Traceback (most recent call last):
  File "unicompare.py", line 5, in ?
    print u"äöü" == "äöü"
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
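The coercion rule described above can be approximated in Python 3 by decoding explicitly. This is a rough sketch, not the interpreter's actual code path: `py2_style_eq` and `default_encoding` are hypothetical names, and the byte string is assumed to hold latin-1 data as in unicompare.py.

```python
def py2_style_eq(text, raw, default_encoding="ascii"):
    """Emulate the Python 2 rule: decode the byte string with the default
    encoding, then compare. Raises UnicodeDecodeError on undecodable bytes."""
    return text == raw.decode(default_encoding)


raw = "äöü".encode("latin-1")  # b'\xe4\xf6\xfc', the bytes of the str literal

try:
    py2_style_eq("äöü", raw)
except UnicodeDecodeError as exc:
    # Same failure as in the traceback: byte 0xe4 is outside the ASCII range.
    print(exc)

# Decoding with the encoding the bytes were actually written in compares equal.
print("äöü" == raw.decode("latin-1"))  # True
```

The source encoding declaration plays no part in the emulated comparison; only the default encoding does, which is the point made in the message above.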

History
Date	User	Action	Args
2022-04-11 14:56:04	admin	set	github: 40383
2004-06-11 13:07:36	ctheune	create