This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Python 2.3 encoding parsing bug
Type:
Stage:
Components: Interpreter Core
Versions:
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: edream, lemburg, loewis
Priority: normal
Keywords:

Created on 2004-02-17 14:36 by edream, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg20020 - Author: Edward K. Ream (edream) Date: 2004-02-17 14:36
The documentation for encoding lines at

C:\Python23\Doc\Python-Docs-2.3.1\whatsnew\section-encodings.html

states:

"Encodings are declared by including a specially 
formatted comment in the first or second line of the 
source file."

In fact, contrary to the implication, the Python 2.3 
parser does not look only for lines of the form:

# -*- coding: <encoding> -*-

For example, Python improperly scans the following line 
for an encoding:

#@+leo-ver=4-encoding=iso-8859-1.

and reports that "iso-8859-1." (note the trailing dot) is an 
invalid encoding!
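A minimal sketch of why this happens, assuming a PEP 263-style declaration pattern (the exact pattern in the 2.3 reference manual may be worded differently). The names ENCODING_RE, declared_encoding and is_known_codec are illustrative only. Because "encoding=" contains "coding=", and "." is a legal character in an encoding name, the trailing dot is captured as part of the name, and the codec lookup then fails.

```python
import codecs
import re

# Assumed PEP 263-style pattern: a comment line containing "coding:" or
# "coding=", followed by an encoding name. Note that "." is allowed in the name.
ENCODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(line):
    """Return the encoding name the pattern extracts from `line`, or None."""
    match = ENCODING_RE.match(line)
    return match.group(1) if match else None


def is_known_codec(name):
    """Return True if the codecs registry recognizes `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


leo_line = "#@+leo-ver=4-encoding=iso-8859-1."
name = declared_encoding(leo_line)
print(repr(name))            # 'iso-8859-1.'  ("encoding=" contains "coding=")
print(is_known_codec(name))  # False: the trailing dot makes the name invalid
```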

The workaround for my app is to precede this line with 
the following line:

# -*- coding: iso-8859-1 -*-

This makes Python 2.3 happy.
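The workaround can be sketched as follows, assuming (consistent with the behavior reported here) that only the first two lines are examined and that the first matching line wins, so the Leo sentinel on line 2 is never parsed as a declaration. The pattern is an assumed PEP 263-style one, and find_declaration is a hypothetical helper, not the interpreter's actual code.

```python
import re

# Assumed PEP 263-style pattern; see the language reference for the exact one.
ENCODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_declaration(source):
    """Scan only the first two lines; the first matching line wins."""
    for line in source.splitlines()[:2]:
        match = ENCODING_RE.match(line)
        if match:
            return match.group(1)
    return None


without_workaround = "#@+leo-ver=4-encoding=iso-8859-1.\n"
with_workaround = (
    "# -*- coding: iso-8859-1 -*-\n"
    "#@+leo-ver=4-encoding=iso-8859-1.\n"
)

print(find_declaration(without_workaround))  # iso-8859-1.
print(find_declaration(with_workaround))     # iso-8859-1
```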

To make myself perfectly clear: Python has absolutely 
no right to complain about comment lines that do not 
have the form:

# -*- coding: <encoding> -*-
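For contrast, a hypothetical stricter rule of the kind argued for here, accepting only the Emacs-style form, might look like this. This is not what Python implements; STRICT_EMACS_RE and strict_declaration are illustrative names.

```python
import re

# Hypothetical strict rule: only "# -*- coding: <encoding> -*-" counts.
# This is NOT Python's actual rule; it only illustrates the argument above.
STRICT_EMACS_RE = re.compile(
    r"^[ \t\f]*#[ \t]*-\*-[ \t]*coding:[ \t]*([-_.a-zA-Z0-9]+)[ \t]*-\*-"
)


def strict_declaration(line):
    """Return the encoding only if `line` is exactly Emacs-style, else None."""
    match = STRICT_EMACS_RE.match(line)
    return match.group(1) if match else None


print(strict_declaration("# -*- coding: iso-8859-1 -*-"))       # iso-8859-1
print(strict_declaration("#@+leo-ver=4-encoding=iso-8859-1."))  # None
```

A rule this narrow would leave the Leo sentinel alone, but it would also reject Emacs lines that set several variables, such as "# -*- mode: python; coding: latin-1 -*-", as well as vi-style declarations.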

Python 2.3.1
Windows XP

Edward K. Ream
edreamleo@charter.net
msg20021 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2004-02-17 21:14

Python is behaving correctly and according to the PEP.

The encoding declaration parser will look for "coding[:=][ \t]*<encoding>"
to make it play nice with the various editor encoding comments
in use today. The format you are quoting is Emacs-style, but
there are also vi-style and various other formats. Most of them
use the "coding[:=]" declaration, which is why this parsing method
was chosen.
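A short sketch of why the loose pattern accommodates several editors, assuming a PEP 263-style regular expression (the precise pattern is the one given in the language reference). Because only "coding:" or "coding=" is required, Emacs, Vim and plain declarations all yield the same name; the Vim line matches because "fileencoding=" contains "coding=".

```python
import re

# Assumed PEP 263-style pattern; it only requires "coding:" or "coding="
# somewhere in a comment line, so several editors' conventions all match.
ENCODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

examples = [
    "# -*- coding: latin-1 -*-",               # Emacs style
    "# vim: set fileencoding=latin-1 :",       # Vim style
    "# coding=latin-1",                        # plain form
    "# -*- mode: python; coding: latin-1 -*-", # Emacs, several variables
]

for line in examples:
    print(ENCODING_RE.match(line).group(1))  # latin-1 each time
```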

Does Leo need the trailing dot in the comment?
msg20022 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-02-17 21:47

Actually, what Python should (and does) really do is to
follow the language specification (the PEP becomes
irrelevant once implemented):

http://www.python.org/doc/current/ref/encodings.html

This gives the precise regexp that is used.
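One practical consequence of having a precise pattern can be sketched as follows, assuming a PEP 263-style regular expression anchored at the start of the line. Only a line that is a comment from its first non-blank character can carry a declaration; a trailing comment after code, or "coding:" inside a string, does not count.

```python
import re

# Assumed PEP 263-style pattern. The leading "^[ \t\f]*#" means only a line
# that is a comment from its first non-blank character can carry a declaration.
ENCODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

print(bool(ENCODING_RE.match("    # coding: latin-1")))     # True
print(bool(ENCODING_RE.match("x = 1  # coding: latin-1")))  # False
print(bool(ENCODING_RE.match('s = "coding: latin-1"')))     # False
```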

Differences between the language spec and the implementation
would be considered a bug. Closing this report as not-a-bug.
msg20023 - Author: Edward K. Ream (edream) Date: 2004-02-17 22:59

> Does leo need the trailing dot in the comment?

In general, Leo needs to know where the encoding 
specification ends and a possible end-block-comment delimiter 
begins.  In specific languages, and in particular Python, Leo 
would not have needed the trailing dot.  Alas, this is a moot 
point.  The only options available to Leo now are:

1. Have the user insert encoding comments by hand or
2. Change the format of files created by Leo.

In other words, no previous 4.x version of Leo (including 4.1 
final, due tomorrow) can ever work with Python 2.3 without 
the user inserting a workaround.
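The delimiter problem described above can be illustrated with a hypothetical Leo-style sentinel inside an HTML block comment; the sentinel syntax here is an assumption, not taken from Leo itself. Because "-" is legal in encoding names, a name pattern with no terminator runs into the "-->" closer, while an explicit terminator such as the trailing dot ends the name unambiguously.

```python
import re

# Hypothetical Leo-style sentinels inside an HTML block comment.
# The sentinel syntax is an illustrative assumption.
with_dot = "<!--@+leo-ver=4-encoding=iso-8859-1.-->"
without_dot = "<!--@+leo-ver=4-encoding=iso-8859-1-->"

# Without a terminator, a name pattern that allows "-" also eats part of "-->".
NAME_ONLY = re.compile(r"encoding=([-_.a-zA-Z0-9]+)")
print(NAME_ONLY.search(without_dot).group(1))    # iso-8859-1--

# With the dot as an explicit terminator, the name ends unambiguously.
DOT_TERMINATED = re.compile(r"encoding=([^.]+)\.")
print(DOT_TERMINATED.search(with_dot).group(1))  # iso-8859-1
```

The trade-off is that the same dot, being legal in encoding names, is swallowed by Python's looser declaration pattern whenever the sentinel appears on line 1 or 2 of a Python file.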

I am most upset that the PEP said one thing in English and 
something almost completely different in the re.  Furthermore, 
what the re implies is a very bad idea: having a _restricted_ 
kind of special-purpose comment is one thing; having a 
way-too-general kind of special-purpose comment is wrong, wrong, 
wrong.  It needlessly invalidates comments that _should_ 
have been none of Python's business.  Yes, I know there was 
a reason for this bad idea; there always is.

Edward
History
Date                 User    Action  Args
2022-04-11 14:56:02  admin   set     github: 39944
2004-02-17 14:36:28  edream  create