Issue 969415: CJK codecs list incomplete

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/40364

classification

Title:	CJK codecs list incomplete
Type:		Stage:
Components:	Documentation	Versions:	Python 2.4

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	hyeshik.chang, lemburg, loewis, mike_j_brown, rhettinger
Priority:	normal	Keywords:

Created on 2004-06-09 06:54 by mike_j_brown, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (10)
msg21081 - (view)	Author: Mike Brown (mike_j_brown)	Date: 2004-06-09 06:54
http://www.python.org/dev/doc/devel/whatsnew/node7.html states that various CJK encodings have been added, but the list given there does not match the list on http://www.python.org/dev/doc/devel/lib/node128.html. In particular, missing from the latter list are all of the aliases with hyphens: shift-jis, shift-jisx0213, euc-jp, euc-jisx0213, iso-2022-jp, iso-2022-jp-1, iso-2022-jp-2, iso-2022-jp-3, iso-2022-jp-ext, euc-kr, iso-2022-kr. Since I successfully ran codecs.lookup() tests on a few of the hyphenated aliases, I assume that the omission of the hyphenated versions in the docs is merely an oversight.
msg21082 - (view)	Author: Hyeshik Chang (hyeshik.chang) *	Date: 2004-06-09 07:01
Logged In: YES user_id=55188 All hyphens are translated to underscores in encoding lookups, so we may not need to list the hyphenated encoding names separately.
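The behaviour described in this message can be checked with a short script. This is a minimal sketch, not part of the original report: the helper name `resolve` is hypothetical, and the alias list is simply the one quoted in the first message. It assumes a CPython build that ships the CJK codecs; any name that cannot be found is reported as None rather than raising.

```python
import codecs

# Hyphenated aliases quoted in the original report.
HYPHENATED = [
    "shift-jis", "shift-jisx0213", "euc-jp", "euc-jisx0213",
    "iso-2022-jp", "iso-2022-jp-1", "iso-2022-jp-2", "iso-2022-jp-3",
    "iso-2022-jp-ext", "euc-kr", "iso-2022-kr",
]


def resolve(name):
    """Return the name of the codec that `name` resolves to, or None."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


for alias in HYPHENATED:
    underscored = alias.replace("-", "_")
    # Both spellings should land on the same codec if hyphens and
    # underscores are treated alike during lookup.
    same = resolve(alias) == resolve(underscored)
    print(f"{alias:16} {underscored:16} same codec: {same}")
```

If a line reports a mismatch, that points at a spelling the lookup does not treat as equivalent on the interpreter in use.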
msg21083 - (view)	Author: Hyeshik Chang (hyeshik.chang) *	Date: 2004-06-09 07:10
Logged In: YES user_id=55188 Reopened to consider consistency with the non-CJK codecs. All the non-CJK codecs are written with a hyphen even if their real name uses an underscore (e.g. iso8859-1 and iso8859_1.py). Would changing the CJK codecs' codec and alias names to use hyphens instead of underscores make the docs friendlier?
msg21084 - (view)	Author: Mike Brown (mike_j_brown)	Date: 2004-06-09 08:25
Logged In: YES user_id=371366 I see no reason to omit any aliases that are recognized, especially when the aliases in question are, more often than not, the IANA's preferred MIME name as shown at http://www.iana.org/assignments/character-sets. I was looking in the docs to see if Python 2.4 was going to support 'euc-jp', and was dismayed to see 'euc_jp' and variants but no 'euc-jp'. I had to obtain and install 2.4a0 to test to find out that it was just a documentation problem. Please consider listing all realnames and aliases.
msg21085 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2004-06-12 07:05
Logged In: YES user_id=80475 Marc-Andre, would you pronounce on this one?
msg21086 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2004-06-12 11:54
Logged In: YES user_id=21627 It is just not feasible to list all recognized aliases. For example, for ISO-8859-1, there are 31 trivial aliases, including Iso_8859-1 and iSO-8859_1. For shift_jisx0213, there are 1023 trivial aliases. The aliases column in the documentation should only list non-trivial aliases, and for these, it should list a form that people are most likely to encounter. So if "s-jis" would be more common than "s_jis", this is what should be listed. If s-JIS is even more common, this should be listed. The top of the page should say that case in encoding names does not matter, and that _ and - can be freely substituted.
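The figures 31 and 1023 follow from counting spellings that differ only in letter case or in the choice of "-" versus "_": each letter and each separator contributes two options, giving 2^5 - 1 = 31 for iso-8859-1 (three letters, two separators) and 2^10 - 1 = 1023 for shift_jisx0213 (nine letters, one separator). The sketch below enumerates them; the function name `trivial_variants` is hypothetical and this reading of "trivial alias" is an assumption inferred from the examples in the message.

```python
import codecs
from itertools import product


def trivial_variants(name):
    """Return every spelling of `name` that differs only in letter case
    or in using '-' versus '_' as a separator."""
    choices = []
    for ch in name:
        if ch.isalpha():
            choices.append((ch.lower(), ch.upper()))
        elif ch in "-_":
            choices.append(("-", "_"))
        else:
            choices.append((ch,))  # digits have a single spelling
    return {"".join(combo) for combo in product(*choices)}


for name in ("iso-8859-1", "shift_jisx0213"):
    variants = trivial_variants(name)
    # Subtract one so the starting spelling is not counted as its own alias.
    print(name, "trivial aliases:", len(variants) - 1)
    # Check that every variant resolves to one and the same codec.
    targets = {codecs.lookup(v).name for v in variants}
    print("  distinct codecs reached:", len(targets))
```

The counts printed are 31 and 1023, matching the message, which illustrates why listing every recognized spelling in the documentation table is impractical.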
msg21087 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2004-06-12 11:59
Logged In: YES user_id=21627 Actually, the top of the page does already say: "Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases."
msg21088 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-06-12 12:04
Logged In: YES user_id=38388 I think that it might be a good idea to document how the standard search function of the encodings package works at the top of that page, namely that it normalizes encoding names before doing the lookup: """ Normalization works as follows: all non-alphanumeric characters except the dot used for Python package names are collapsed and replaced with a single underscore, e.g. ' -;#' becomes '_'. Leading and trailing underscores are removed. Note that encoding names should be ASCII only; if they do use non-ASCII characters, these must be Latin-1 compatible. """ The table should then only list normalized encoding names (which I think is already the case).
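The normalization rule quoted above can be sketched with a regular expression. This is an illustration of the rule as worded in the message, restricted to ASCII names, and not the standard library's own code; the function name `normalize_name` is hypothetical. The standard library has its own routine, encodings.normalize_encoding, whose exact behaviour may differ between Python versions. Case folding is not part of the quoted rule, so it is left out here.

```python
import re


def normalize_name(name):
    """Collapse each run of characters other than ASCII letters, digits
    and '.' into a single underscore, then drop leading and trailing
    underscores."""
    collapsed = re.sub(r"[^A-Za-z0-9.]+", "_", name)
    return collapsed.strip("_")


for raw in ("iso-2022-jp-ext", "euc -;#jp", "-euc-jp-", "encodings.utf-8"):
    print(repr(raw), "->", repr(normalize_name(raw)))
```

Under this rule "euc-jp" and "euc_jp" normalize to the same string, which is why only one of the two spellings needs to appear in the table.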
msg21089 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2004-06-14 20:28
Logged In: YES user_id=21627 Assigning to somebody else without asking for permission is impolite, IMO; unassigning the report from anybody.
msg21090 - (view)	Author: Hyeshik Chang (hyeshik.chang) *	Date: 2004-07-17 14:49
Logged In: YES user_id=55188 I changed the aliases that are more commonly written with hyphens than with underscores to use hyphens, for consistency with the iso-8859 aliases. Doc/lib/libcodecs.tex 1.31

History
Date	User	Action	Args
2022-04-11 14:56:04	admin	set	github: 40364
2004-06-09 06:54:21	mike_j_brown	create