classification
Title: CJK codecs list incomplete
Type: Stage:
Components: Documentation Versions: Python 2.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: hyeshik.chang, lemburg, loewis, mike_j_brown, rhettinger
Priority: normal Keywords:

Created on 2004-06-09 06:54 by mike_j_brown, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (10)
msg21081 - (view) Author: Mike Brown (mike_j_brown) Date: 2004-06-09 06:54
http://www.python.org/dev/doc/devel/whatsnew/node7.html
states that various CJK encodings have been added, but the
list given there does not match the list on
http://www.python.org/dev/doc/devel/lib/node128.html.

In particular, missing from the latter list are all of the 
aliases with hyphens:

shift-jis, shift-jisx0213, euc-jp, euc-jisx0213,
iso-2022-jp, iso-2022-jp-1, iso-2022-jp-2, iso-2022-jp-3,
iso-2022-jp-ext, euc-kr, iso-2022-kr

Since I successfully ran codecs.lookup() tests on a few 
of the hyphenated aliases, I assume that the omission 
of the hyphenated versions in the docs is merely an 
oversight.
msg21082 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-06-09 07:01
All hyphens are translated to underscores in encoding lookups,
so we may not need to list the hyphenated encoding names
separately.
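
The hyphen-to-underscore behaviour described above can be checked with a short sketch. It assumes a CPython build that includes the CJK codecs, and the helper name `same_codec` is hypothetical, introduced only for this illustration.

```python
import codecs

def same_codec(first, second):
    """Return True if both spellings resolve to the same registered codec."""
    # codecs.lookup() raises LookupError for an unknown encoding name.
    return codecs.lookup(first).name == codecs.lookup(second).name

# Pairs that differ only in '-' versus '_'.
for hyphenated, underscored in [("euc-jp", "euc_jp"),
                                ("shift-jis", "shift_jis"),
                                ("euc-kr", "euc_kr")]:
    print(hyphenated, underscored, same_codec(hyphenated, underscored))
```

If a spelling were not recognized, `codecs.lookup()` would raise `LookupError` rather than return a different codec, so the comparison only succeeds when both names are accepted.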
msg21083 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-06-09 07:10
Reopened to consider consistency with the non-CJK codecs.
All the non-CJK codecs are documented with hyphens even when their
real name uses an underscore (e.g. iso8859-1 and iso8859_1.py).
Would changing the CJK codec and alias names to use hyphens instead
of underscores make the docs friendlier?
msg21084 - (view) Author: Mike Brown (mike_j_brown) Date: 2004-06-09 08:25
I see no reason to omit any aliases that are recognized, 
especially when the aliases in question are, more often than 
not, the IANA's preferred MIME name as shown at 
http://www.iana.org/assignments/character-sets.

I was looking in the docs to see if Python 2.4 was going to 
support 'euc-jp', and was dismayed to see 'euc_jp' and 
variants but no 'euc-jp'. I had to obtain and install 2.4a0 to 
test to find out that it was just a documentation problem.

Please consider listing all realnames and aliases.
msg21085 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-06-12 07:05
Mark, would you pronounce on this one?
msg21086 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-06-12 11:54
It is just not feasible to list all recognized aliases. For
example, ISO-8859-1 has 31 trivial aliases, including
Iso_8859-1 and iSO-8859_1. For shift_jisx0213, there are
1023 trivial aliases.

The aliases column in the documentation should only list
non-trivial aliases, and for these, it should list a form
that people are most likely to encounter. So if "s-jis"
would be more common than "s_jis", this is what should be
listed. If s-JIS is even more common, this should be listed.

The top of the page should say that case in encoding names
does not matter, and that _ and - can be freely substituted.
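
The counts of 31 and 1023 follow if a "trivial alias" is taken to mean any spelling that differs from the listed name only in letter case or in `-` versus `_`. That reading is an assumption, and the function names below are hypothetical. Under it, a name with n letters and s separators has 2^(n+s) spellings, one of which is the listed name itself. A minimal sketch that enumerates them:

```python
from itertools import product

def trivial_variants(name):
    """Yield every spelling that differs only in case or in '-' versus '_'."""
    choices = []
    for ch in name:
        if ch.isalpha():
            choices.append((ch.lower(), ch.upper()))  # two case choices
        elif ch in "-_":
            choices.append(("-", "_"))                # two separator choices
        else:
            choices.append((ch,))                     # digits stay fixed
    for combo in product(*choices):
        yield "".join(combo)

def count_trivial_aliases(name):
    """Count the variants, excluding the spelling that was passed in."""
    return len(set(trivial_variants(name))) - 1

print(count_trivial_aliases("iso-8859-1"))      # 3 letters, 2 separators: 2**5 - 1 = 31
print(count_trivial_aliases("shift_jisx0213"))  # 9 letters, 1 separator: 2**10 - 1 = 1023
```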
msg21087 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-06-12 11:59
Actually, the top of the page does already say

Notice that spelling alternatives that only differ in case
or use a hyphen instead of an underscore are also valid aliases.
msg21088 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2004-06-12 12:04
I think it might be a good idea to document, at the top of that
page, how the standard search function of the encodings package
works, namely that it normalizes encoding names before doing the
lookup:

"""
        Normalization works as follows: all non-alphanumeric
        characters except the dot used for Python package names are
        collapsed and replaced with a single underscore, e.g. '  -;#'
        becomes '_'. Leading and trailing underscores are removed.

        Note that encoding names should be ASCII only; if they do use
        non-ASCII characters, these must be Latin-1 compatible.
"""

The table should then only list normalized encoding names (which
I think is already the case).
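
The normalization rule quoted above can be sketched in a few lines. This is an illustration of the quoted description only, not the standard library's own implementation, whose details may differ between Python versions. The function name `normalize_encoding_name` is hypothetical, and the sketch assumes ASCII-only input.

```python
import re

def normalize_encoding_name(name):
    """Collapse runs of characters other than ASCII letters, digits and '.'
    into a single underscore, then drop leading and trailing underscores."""
    collapsed = re.sub(r"[^A-Za-z0-9.]+", "_", name)
    return collapsed.strip("_")

print(normalize_encoding_name("iso-8859-1"))   # iso_8859_1
print(normalize_encoding_name("euc  -;#jp"))   # euc_jp
print(normalize_encoding_name(" shift-jis "))  # shift_jis
```

Under this rule, a hyphenated name and its underscored form normalize to the same string, which is why the table would only need the normalized spelling.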
msg21089 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-06-14 20:28
Assigning to somebody else without asking for permission is
impolite, IMO; unassigning the report from anybody.
msg21090 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-07-17 14:49
I changed the aliases that are more commonly written with hyphens
than with underscores to use hyphens, for consistency with the
iso-8859 aliases.

Doc/lib/libcodecs.tex 1.31
History
Date User Action Args
2022-04-11 14:56:04 admin set github: 40364
2004-06-09 06:54:21 mike_j_brown create