classification
Title: Dubious use of Unicode literals in tutorial
Type:
Stage:
Components: Documentation
Versions: Python 2.3
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To: ghaering
Nosy List: ghaering, loewis
Priority: normal
Keywords:

Created on 2003-07-18 00:10 by ghaering, last changed 2022-04-10 16:10 by admin. This issue is now closed.

Messages (5)
msg17102 - (view) Author: Gerhard Häring (ghaering) * (Python committer) Date: 2003-07-18 00:10
"3.1.3 Unicode Strings" contains:

>>> u"äöü"
u'\xe4\xf6\xfc'

It looks like latin1 is used as encoding for this
Unicode literal. This is, however, neither justified by
the Python language specification nor by common sense ;-)

I'd suggest that this be changed in the tutorial.

Furthermore I'd suggest that such use of Unicode
literals throw errors instead.

I'm assigning it to Martin, because he's one of the
Unicode gurus :)
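
A minimal sketch of why the tutorial output looks Latin-1 encoded, assuming Python 3 rather than the Python 2.3 discussed here: the code points in the repr coincide with the Latin-1 byte values, while the same text encoded as UTF-8 uses different bytes. The variable names are illustrative only.

```python
# Python 3 sketch; the thread itself concerns Python 2.3.
text = "\xe4\xf6\xfc"  # the three characters from the tutorial example

# Latin-1 maps each byte value 0-255 to the code point with the same number,
# so the encoded bytes match the escapes shown in the tutorial's repr.
latin1_bytes = text.encode("latin-1")

# UTF-8 needs two bytes for each of these characters.
utf8_bytes = text.encode("utf-8")

# Reading UTF-8 bytes as if they were Latin-1 yields two characters
# for every original character.
misread = utf8_bytes.decode("latin-1")

print(latin1_bytes)
print(utf8_bytes)
print(len(text), len(misread))
```

If a terminal sent UTF-8 bytes and the parser assumed Latin-1, the literal would end up as the six-character `misread` string instead of the intended three characters.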
msg17103 - (view) Author: Gerhard Häring (ghaering) * (Python committer) Date: 2003-07-18 09:54
I *do* get the warnings when executing scripts, but I do
*not* get the warnings at the interactive prompt. Lowering
priority accordingly. Feel free to close this if this is
indeed the intended behaviour.
msg17104 - (view) Author: Gerhard Häring (ghaering) * (Python committer) Date: 2003-07-18 09:55
Oh, I almost forgot about my original problem. Such
behaviour shouldn't be encouraged in the tutorial any longer
of course. Maybe I can find a better wording and submit a
patch soonish. Now that I understand this better, assigning
this to myself :)
msg17105 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-07-18 15:04
The intended behaviour is that Unicode in the interactive
mode "works", and assumes that the actual input is encoded
according to the locale's encoding (i.e.
sys.stdin.encoding). That isn't implemented yet (and
perhaps not even specified). However, it is most likely the
case that this is a doc bug in the reference manual, not in the tutorial.
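
The intended behaviour described above can be sketched as follows, assuming Python 3. `decode_interactive_line` is a hypothetical helper written for illustration; it is not part of CPython and does not claim to show how the interpreter actually handles input.

```python
import sys


def decode_interactive_line(raw, encoding=None):
    """Decode raw input bytes with the given or the stdin encoding.

    Hypothetical helper for illustration only; not a CPython API.
    """
    if encoding is None:
        # sys.stdin can be None or lack an encoding in embedded setups,
        # so fall back to UTF-8 (an assumption of this sketch).
        encoding = getattr(sys.stdin, "encoding", None) or "utf-8"
    return raw.decode(encoding)


# The same literal arrives as different bytes depending on the terminal.
from_latin1_terminal = decode_interactive_line(b'u"\xe4\xf6\xfc"', "latin-1")
from_utf8_terminal = decode_interactive_line(
    b'u"\xc3\xa4\xc3\xb6\xc3\xbc"', "utf-8"
)
print(from_latin1_terminal == from_utf8_terminal)
```

Decoding with the terminal's own encoding gives the same source text in both cases, which is the property the interactive mode would need.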

For this to work, the interactive mode needs to be able to
pass the encoding to the parser, perhaps giving a Unicode
object as the argument instead of a byte string (at least
IDLE might want to do that). Passing Unicode objects to
eval/exec is for further study, though.
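
As a rough illustration of the difference between handing the parser text and handing it bytes, here is a sketch that assumes Python 3, where `exec` and `compile` accept both `str` and `bytes` sources. It does not reflect the Python 2.3 behaviour under discussion, and the `<sketch>` filename is arbitrary.

```python
# A str source is already decoded, so no encoding information is needed.
ns_text = {}
exec('s = "\xe4\xf6\xfc"', ns_text)

# A bytes source has to say how it is encoded; here a PEP 263
# declaration on the first line marks it as Latin-1.
source_bytes = b'# -*- coding: latin-1 -*-\ns = "\xe4\xf6\xfc"\n'
ns_bytes = {}
exec(compile(source_bytes, "<sketch>", "exec"), ns_bytes)

# Both routes produce the same three-character string.
print(ns_text["s"] == ns_bytes["s"])
```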
msg17106 - (view) Author: Gerhard Häring (ghaering) * (Python committer) Date: 2004-08-09 20:19
I just read PEP 0263, which says:

"""
In Python 2.1, Unicode literals can only be written using the
Latin-1 based encoding "unicode-escape".
"""

So the tutorial is correct, and the things Martin suggested
are only possible improvements to Python; there's no actual
bug here.

So I'll close this one.
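
The "Latin-1 based" nature of the codec named in the PEP quote can still be observed in Python 3. This is a sketch using the standard `unicode_escape` codec, not a reproduction of the Python 2.1 parser.

```python
# Bytes above 0x7F are interpreted as Latin-1 by the unicode_escape codec.
raw = b"\xe4\xf6\xfc"
decoded = raw.decode("unicode_escape")

# Backslash escapes in the byte string are expanded to the same characters.
escaped = b"\\xe4\\u00f6\\xfc".decode("unicode_escape")

print(decoded == escaped)
print([hex(ord(ch)) for ch in decoded])
```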
History
Date                 User      Action  Args
2022-04-10 16:10:01  admin     set     github: 38871
2003-07-18 00:10:36  ghaering  create