This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Encodings iso8859_1 and latin_1 are redundant
Type:
Stage:
Components: Unicode
Versions: Python 2.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: lemburg
Nosy List: exa, lemburg, liturgist
Priority: normal
Keywords:

Created on 2005-08-12 12:22 by liturgist, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg26033 - (view) Author: liturgist (liturgist) Date: 2005-08-12 12:22
./lib/encodings contains both:

    iso8859_1.py
    latin_1.py

Only one should be present.  Martin says that latin_1
is faster.  Using the 'iso' name would correlate better
with the other ISO encodings provided.

If the latin_1 code is faster, then it should be in the
iso8859_1.py file.  If an automated process produces
the 'iso*' encodings, then it should either produce the
faster code or stop producing iso8859_1.

Regardless, one of the files should be removed.
msg26034 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-08-12 12:49
Good point.

The iso8859_1.py codec should be removed, and iso8859_1 should
be added as an alias for latin-1.

Martin is right: the latin-1 codec is not only faster, but
the Unicode integration code also has a lot of short-cuts
for the "latin-1" encoding, so overall performance is better
if you use that name for the encoding.
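
One rough way to check the speed claim on a given interpreter is to time both names with timeit. This is only a sketch: the helper name time_encode, the sample text, and the iteration count are assumptions made for illustration, and no particular result is implied. Newer interpreters may treat both spellings identically.

```python
import timeit


def time_encode(encoding, text="caf\xe9 " * 1000, number=2000):
    """Return the seconds taken to encode `text` `number` times.

    `time_encode` is a hypothetical helper, not part of any library.
    """
    return timeit.timeit(lambda: text.encode(encoding), number=number)


if __name__ == "__main__":
    # Compare the two spellings discussed in this issue.
    for name in ("latin-1", "iso8859_1"):
        print(f"{name:10s} {time_encode(name):.4f} s")
```

Timings vary between runs and machines, so repeat the measurement several times before drawing a conclusion.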
msg26035 - (view) Author: liturgist (liturgist) Date: 2005-08-12 13:12
Ok.  How about if we specify iso8859_1 as "(see latin_1)" in
the documentation?

The code will work the same regardless of which encoding
name the developer uses.  Right?
msg26036 - (view) Author: liturgist (liturgist) Date: 2005-08-12 14:01
Where could one see some of the "shortcuts" in the Unicode
integration code that make using "latin_1" faster at
runtime?  I grepped *.py and *.c, but could not readily
identify any candidates.
msg26037 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-08-12 14:30
To answer your questions:

Yes, the encoding is the same for both latin-1 and iso8859-1.

Specifying latin-1 instead of iso8859-1 will allow the code
to use short-cuts.

You have to grep for 'latin-1'.
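
The claim that both names give the same encoding can be checked from Python itself. The sketch below assumes a current Python 3 interpreter rather than the 2.4 release this issue was filed against; same_codec and same_bytes are hypothetical helper names, and only codecs.lookup and str.encode come from the standard library.

```python
import codecs


def same_codec(name_a, name_b):
    """True if both names resolve to a codec with the same canonical name."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


def same_bytes(name_a, name_b):
    """True if both names encode every Latin-1 code point identically."""
    text = "".join(chr(i) for i in range(256))
    return text.encode(name_a) == text.encode(name_b)


if __name__ == "__main__":
    print(same_codec("latin_1", "iso8859_1"))
    print(same_bytes("latin-1", "iso8859-1"))
```

This only confirms that the output is the same; it says nothing about which name takes the faster path.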
msg26038 - (view) Author: Eray Ozkural (exa) Date: 2005-10-11 21:22
I understand that there ought to be one fast implementation, but I
suppose both names should be available.
msg26039 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-10-21 14:03
I've added an alias entry from iso8859_1 to latin_1.
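
For reference, the alias can be inspected in the encodings.aliases table of a current Python 3 interpreter. This is a sketch under that assumption; the entry made for this issue in 2005 may have looked different, and resolve_alias is a hypothetical helper that only approximates how the codec search function normalizes names.

```python
from encodings import aliases


def resolve_alias(name):
    """Map an encoding name to its codec module name via the alias table.

    Names are lower-cased and hyphens become underscores before the
    lookup; names without an alias entry are returned normalized.
    """
    key = name.lower().replace("-", "_")
    return aliases.aliases.get(key, key)


if __name__ == "__main__":
    for name in ("iso8859_1", "ISO8859-1", "latin_1"):
        print(name, "->", resolve_alias(name))
```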
History
Date                 User       Action  Args
2022-04-11 14:56:12  admin      set     github: 42270
2005-08-12 12:22:28  liturgist  create