
classification
Title: Comparisons of unicode and strings raises UnicodeErrors
Components: Unicode
Versions: Python 2.3
process
Status: closed
Resolution: wont fix
Assigned To: lemburg
Nosy List: ctheune, lemburg, tim.peters
Priority: normal

Created on 2004-06-11 13:07 by ctheune, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (10)
msg21132 - Author: Christian Theune (ctheune) Date: 2004-06-11 13:07
When comparing unicode objects and strings, the implicit conversion
raises an exception instead of returning False. See the
example below.

We (Christian Theune and Jim Fulton) suggest that
if the ordinary string can't be decoded, False should
be returned. This seems to be the only sane approach
given Python's policy of implicitly converting strings
to unicode using a default encoding.


Python 2.3.4 (#1, Jun 10 2004, 11:08:42) 
[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r6,
ssp-3.3.2-2, pie-8.7.6)] on linux2
Type "help", "copyright", "credits" or "license" for
more information.
>>> class Elephant:
...     pass
... 
>>> e1 = Elephant()
>>> e1 == 5
False
>>> u"asdf" == "asdfö"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte
0xf6 in position 4: ordinal not in range(128)
>>> e1 == "asdfö"
False
>>> e1 == u"asdfö"
False
>>> 
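
To make the two positions in this report concrete, here is a minimal sketch written for a current Python 3 interpreter, where text and byte strings are separate types and nothing is decoded implicitly. The helpers `lenient_eq` and `strict_eq` are hypothetical names introduced only for illustration; they are not part of Python. `strict_eq` imitates the Python 2.3 behaviour shown above (decode with the default ASCII codec, then compare), and `lenient_eq` imitates the proposed rule (an undecodable byte string compares unequal).

```python
def strict_eq(text, raw, encoding="ascii"):
    """Imitate Python 2.3: decode first, so bad bytes raise UnicodeDecodeError."""
    return text == raw.decode(encoding)


def lenient_eq(text, raw, encoding="ascii"):
    """Imitate the proposal: an undecodable byte string is simply not equal.

    Hypothetical helper for illustration only.
    """
    try:
        return text == raw.decode(encoding)
    except UnicodeDecodeError:
        return False


raw = b"asdf\xf6"  # the bytes of "asdfö" in Latin-1, as in the session above

print(lenient_eq("asdf", raw))      # False: decoding failed, treated as unequal
print(lenient_eq("asdf", b"asdf"))  # True: pure ASCII decodes and matches

try:
    strict_eq("asdf", raw)
except UnicodeDecodeError as exc:
    # Same class of error as in the traceback above.
    print("strict comparison raised:", exc)
```

The trade-off is visible in the first call: the lenient version gives an answer, but it also hides the fact that the byte string was never successfully interpreted as text.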
msg21133 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2004-06-14 16:06

I'm not sure what you are suggesting here... do you want
u"asdf" == "asdfö" to return False ?
msg21134 - Author: Christian Theune (ctheune) Date: 2004-06-14 16:16

That's what Jim and I found to be the best solution (for now).
msg21135 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2004-06-14 16:21

In that case I'll have to close the request as "Won't fix":
comparisons can raise exceptions and if you're comparing
apples and eggs you should get an exception instead of
a misleading result which only hides programming errors.

Sorry.
msg21136 - Author: Tim Peters (tim.peters) (Python committer) Date: 2004-06-14 16:48

Christian, you and/or Jim would need to make the case on 
Python-Dev to change this behavior, and get Guido to agree.  
The current behavior isn't an accident -- it's functioning as 
documented and designed.  Changing existing behavior isn't 
easy (and shouldn't be easy ...).
msg21137 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2004-06-14 17:05

You won't get my approval on this one, that's for sure... then
again I'm not sure how much say I have on these things on
python-dev anymore, four years after having led its design.

As a data point, there have been discussions about raising
exceptions in cases where types don't match and there is no
support for the given type combination, as in your e1 == 5
example, so maybe stirring up noise isn't the best strategy
in this case :-)
msg21138 - Author: Tim Peters (tim.peters) (Python committer) Date: 2004-06-14 17:17

I think some noise would be a good thing <wink>.  Really!  
Comparisons in Python are a mess now, and it would be good 
to hammer out a (more) consistent story.

Note that the new-in-2.3 Python sets.* and datetime.*
types do some of each for comparisons against incompatible
types: for <=, <, >, and >=, they raise an exception. For
== they return False, and for != they return True. Guido
gave a lot of thought to those at the time, and it's no
coincidence that these new types act the same way.
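
For reference, the mixed rule described here can be checked directly. This is a small sketch for a current Python 3 interpreter, using `datetime.date` and the built-in `set` type (which replaced `sets.Set`); the helper `ordering_raises` is a hypothetical name used only in this example.

```python
import datetime


def ordering_raises(left, right):
    """Return True if `left < right` raises TypeError (hypothetical helper)."""
    try:
        left < right
    except TypeError:
        return True
    return False


d = datetime.date(2004, 6, 14)
s = {1, 2}

# Equality against an incompatible type answers quietly.
print(d == 5, d != 5)  # False True
print(s == 5, s != 5)  # False True

# Ordering against an incompatible type raises instead.
print(ordering_raises(d, 5))  # True
print(ordering_raises(s, 5))  # True
```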

But older types don't.  Maybe the new idea was a mistake.  
Maybe the older types should change.  Maybe there are good 
reasons to keep them acting differently.  Some noise about 
all this is really needed before Python 3.
msg21139 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2004-06-14 17:32

Here's an example to make you think this over (this is what the
proposed change would give):

>>> u"äöü" == "äöü"
False

Try to explain that to a novice :-)
msg21140 - Author: Tim Peters (tim.peters) (Python committer) Date: 2004-06-14 18:16

In current Python, both string literals generate a

DeprecationWarning: Non-ASCII character ...

msg, with a pointer to the docs (your PEP 263).  When the 
implementation of that PEP is complete, the example line 
won't be legal Python without an explicit encoding declaration 
(in which case it will presumably return True).

So I don't know that the example matters going forward; 2.3 
isn't going to change.
msg21141 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2004-06-14 18:27

Type it into an interactive session:

>>> u"äöü" == "äöü"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in
position 0: ordinal not in range(128)

This would return False if the change were to happen.

In source code the above is not valid without an encoding
declaration (which is good). But even with an encoding
declaration, the string literal will still be interpreted using
the default encoding (the declared encoding only applies to the
Unicode literal), so the result is the same:

unicompare.py:
# -*- coding: latin-1 -*-
print u"äöü".encode('latin-1')
print "äöü"
print u"äöü" == "äöü"

$ python2.3 unicompare.py
äöü
äöü
Traceback (most recent call last):
  File "unicompare.py", line 5, in ?
    print u"äöü" == "äöü"
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in
position 0: ordinal not in range(128)
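
The failing step in both tracebacks is the implicit ASCII decode of the byte string. The sketch below reproduces that step explicitly in a current Python 3 interpreter, where no implicit decode takes place, and then shows the comparison succeeding once the bytes are decoded with the encoding they were actually written in. The names `TEXT`, `RAW`, and `ascii_decode_fails` are introduced here for illustration only.

```python
TEXT = "\xe4\xf6\xfc"          # the text "äöü"
RAW = TEXT.encode("latin-1")   # the bytes b"\xe4\xf6\xfc"


def ascii_decode_fails(raw):
    """Return True if decoding `raw` as ASCII raises UnicodeDecodeError."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


# The default-encoding decode is what raised in the session above:
# byte 0xe4 is outside range(128).
print(ascii_decode_fails(RAW))        # True

# Decoding explicitly with the right encoding makes the comparison meaningful.
print(RAW.decode("latin-1") == TEXT)  # True
```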

History
Date                 User     Action  Args
2022-04-11 14:56:04  admin    set     github: 40383
2004-06-11 13:07:36  ctheune  create