
classification
Title: chr(128) in u'only ascii' -> TypeError with misleading msg
Type:
Stage:
Components: None
Versions:
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: effbot, georg.brandl, jackdied, laukpe, r.david.murray
Priority: normal
Keywords:

Created on 2007-08-12 22:54 by laukpe, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg32630 - Author: Pekka Laukkanen (laukpe) Date: 2007-08-12 22:54
A membership test of the form "chr(x) in <string>" raises a TypeError if "x" is in the range 128-255 (i.e. non-ASCII) and the string is unicode. This happens even if the unicode string contains only ASCII data, as the example below demonstrates.

Python 2.5.1 (r251:54863, May  2 2007, 16:56:35) 
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> chr(127) in 'hello'
False
>>> chr(128) in 'hello'
False
>>> chr(127) in u'hi'
False
>>> chr(128) in u'hi'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand

This can cause pretty nasty and hard-to-debug bugs in code using the "in <string>" form if, e.g., user-provided data is converted to unicode internally. Most other string operations work nicely between normal and unicode strings, and I'd say simply returning False in this situation would be ok too. Issuing a warning similar to the one below might also be a good idea.

>>> chr(128) == u''
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
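
The behaviour proposed above (warn and return False instead of raising) can be sketched in Python 3 terms, where the 2.x str/unicode split has become bytes/str. This is only an illustration, not how the interpreter implements "in": the helper name lenient_contains is hypothetical, and the ASCII default is an assumption standing in for the 2.x default encoding.

```python
import warnings


def lenient_contains(needle: bytes, haystack: str, encoding: str = "ascii") -> bool:
    """Hypothetical helper: report whether `needle` occurs in `haystack`.

    If `needle` cannot be decoded, emit a UnicodeWarning and treat it as
    not contained, rather than raising an exception.
    """
    try:
        decoded = needle.decode(encoding)
    except UnicodeDecodeError:
        warnings.warn(
            "could not decode left operand; treating it as not contained",
            UnicodeWarning,
            stacklevel=2,
        )
        return False
    return decoded in haystack
```

With such a helper, lenient_contains(bytes([128]), "hi") would warn and return False, while decodable operands are compared normally.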

Finally, the error message is somewhat misleading since the left operand is definitely a string.

>>> type(chr(128))
<type 'str'>

A real-life example of code where this problem exists is telnetlib. I'll submit a separate bug about it, as that problem can obviously be fixed in the library itself.
msg32631 - Author: Fredrik Lundh (effbot) (Python committer) Date: 2007-08-21 08:48
"Most other string operations work nicely between normal and unicode strings"

Nope.  You *always* get errors if you mix Unicode with NON-ASCII data (unless you've messed up the system's default encoding, which is a bad thing to do if you care about portability).  Some examples:

>>> chr(128) + u"foo"
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
>>> u"foo".find(chr(128))
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

etc.  If there's a bug here, it's that you get a TypeError instead of a ValueError subclass.
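
For readers on Python 3, where the implicit coercion no longer happens, the same failure can be reproduced by writing the ASCII decode out explicitly. A small sketch (variable names are illustrative only) showing that the resulting error belongs to the ValueError family mentioned above:

```python
# Python 3 sketch: the ASCII decode that 2.x performed implicitly when
# mixing str and unicode is spelled out explicitly here.
raw = bytes([128])  # Python 3 counterpart of chr(128) in 2.x

try:
    combined = raw.decode("ascii") + "foo"
except UnicodeDecodeError as exc:
    error = exc
else:
    error = None

# UnicodeDecodeError derives from UnicodeError, which derives from ValueError.
is_value_error = isinstance(error, ValueError)
```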
msg32632 - Author: Pekka Laukkanen (laukpe) Date: 2007-08-21 14:03
Fredrik, you are obviously correct that most operations between normal and unicode strings don't work if the normal string contains non-ascii data.

I still think that a UnicodeWarning like the one you get from "chr(128) == u'foo'" would be nicer than an exception and would prevent problems like the one in telnetlib [1]. If an exception is raised, I don't care much about its type, but a better message would make debugging such problems easier.

[1] https://sourceforge.net/tracker/index.php?func=detail&aid=1772794&group_id=5470&atid=105470
msg85040 - Author: Jack Diederich (jackdied) (Python committer) Date: 2009-04-01 16:22
assigning all open telnetlib items to myself
msg119568 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2010-10-25 17:56
I don't think we'll do anything about this message in 2.x -- in 3.x you get a clear TypeError anyway if you mix str and bytes.
msg119570 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-10-25 19:16
I'm not sure that I'd consider:

>>> 'abc' in b'abcde'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Type str doesn't support the buffer API

a clear error message :)  It certainly isn't as bad as the 2.x message, though.
msg119581 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2010-10-25 21:45
Ah. I tried the other combination :)
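
Both combinations can be checked with a short Python 3 sketch. The message wording differs between 3.x releases, so only the exception type is examined here; mixing_error is a hypothetical helper name, not part of any library.

```python
def mixing_error(needle, haystack):
    """Return the TypeError raised by `needle in haystack`, or None."""
    try:
        needle in haystack
    except TypeError as exc:
        return exc
    return None


# str on the left, bytes on the right (the case shown in the thread).
str_in_bytes = mixing_error("abc", b"abcde")
# bytes on the left, str on the right (the other combination).
bytes_in_str = mixing_error(b"abc", "abcde")
# Matching types do not raise.
same_types = mixing_error(b"abc", b"abcde")
```

Printing str(str_in_bytes) and str(bytes_in_str) on a given interpreter shows the exact messages that version produces.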
History
Date                 User            Action  Args
2022-04-11 14:56:25  admin           set     github: 45303
2010-10-25 21:45:23  georg.brandl    set     messages: + msg119581
2010-10-25 19:16:40  r.david.murray  set     nosy: + r.david.murray; messages: + msg119570
2010-10-25 17:56:04  georg.brandl    set     status: open -> closed; nosy: + georg.brandl; messages: + msg119568; resolution: out of date
2009-04-01 16:22:08  jackdied        set     nosy: + jackdied; messages: + msg85040
2007-08-12 22:54:08  laukpe          create