Issue969415
Created on 2004-06-09 06:54 by mike_j_brown, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (10) | |||
---|---|---|---|
msg21081 - (view) | Author: Mike Brown (mike_j_brown) | Date: 2004-06-09 06:54 | |
http://www.python.org/dev/doc/devel/whatsnew/node7.html states that various CJK encodings have been added, but the list given there does not match the list on http://www.python.org/dev/doc/devel/lib/node128.html. In particular, missing from the latter list are all of the aliases with hyphens: shift-jis, shift-jisx0213, euc-jp, euc-jisx0213, iso-2022-jp, iso-2022-jp-1, iso-2022-jp-2, iso-2022-jp-3, iso-2022-jp-ext, euc-kr, iso-2022-kr. Since I successfully ran codecs.lookup() tests on a few of the hyphenated aliases, I assume that the omission of the hyphenated versions in the docs is merely an oversight. |
|||
msg21082 - (view) | Author: Hyeshik Chang (hyeshik.chang) * | Date: 2004-06-09 07:01 | |
Logged In: YES user_id=55188 All hyphens are translated to underscores in encoding lookups, so we may not need to list the hyphenated encoding names separately. |
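
The lookup behaviour described in this message can be checked with a short script. This is a minimal sketch, assuming a CPython build that ships the CJK codecs; `same_codec` is a hypothetical helper introduced only for illustration, and the spellings are taken from the report above.

```python
import codecs

# Hyphenated and underscored spellings taken from the original report.
spellings = [
    ("euc-jp", "euc_jp"),
    ("shift-jis", "shift_jis"),
    ("euc-kr", "euc_kr"),
]


def same_codec(first, second):
    """Return True if both spellings resolve to the same registered codec."""
    return codecs.lookup(first).name == codecs.lookup(second).name


for hyphenated, underscored in spellings:
    # Each pair should resolve to one codec if hyphens map to underscores.
    print(hyphenated, underscored, same_codec(hyphenated, underscored))
```

Comparing the `name` attribute of the returned `CodecInfo` objects avoids relying on object identity, which depends on the codec cache.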
|||
msg21083 - (view) | Author: Hyeshik Chang (hyeshik.chang) * | Date: 2004-06-09 07:10 | |
Logged In: YES user_id=55188 Reopened to consider consistency with the non-CJK codecs. All the non-CJK codecs are documented with hyphens even if their real name uses an underscore (e.g. iso8859-1 and iso8859_1.py). Would changing the CJK codecs' codec and alias names to use hyphens instead of underscores make the docs friendlier? |
|||
msg21084 - (view) | Author: Mike Brown (mike_j_brown) | Date: 2004-06-09 08:25 | |
Logged In: YES user_id=371366 I see no reason to omit any aliases that are recognized, especially when the aliases in question are, more often than not, the IANA's preferred MIME name as shown at http://www.iana.org/assignments/character-sets. I was looking in the docs to see if Python 2.4 was going to support 'euc-jp', and was dismayed to see 'euc_jp' and variants but no 'euc-jp'. I had to obtain and install 2.4a0 to test to find out that it was just a documentation problem. Please consider listing all realnames and aliases. |
|||
msg21085 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2004-06-12 07:05 | |
Logged In: YES user_id=80475 Mark, would you pronounce on this one. |
|||
msg21086 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2004-06-12 11:54 | |
Logged In: YES user_id=21627 It is just not feasible to list all recognized aliases. For example, for ISO-8859-1, there are 31 trivial aliases, including Iso_8859-1 and iSO-8859_1. For shift_jisx0213, there are 1023 trivial aliases. The aliases column in the documentation should only list non-trivial aliases, and for these, it should list the form that people are most likely to encounter. So if "s-jis" is more common than "s_jis", that is what should be listed. If s-JIS is even more common, that should be listed instead. The top of the page should say that case in encoding names does not matter, and that _ and - can be freely substituted. |
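
To illustrate how quickly trivial aliases multiply, the following sketch enumerates every case and separator variant of one short name and checks that each resolves to the same codec. `trivial_variants` is a hypothetical helper, not part of the standard library, and the example does not attempt to reproduce the counts quoted in the message; it assumes a CPython build with the CJK codecs available.

```python
import codecs
from itertools import product


def trivial_variants(name):
    """Generate spellings of name that differ only in letter case or in _ versus -."""
    choices = []
    for ch in name:
        if ch in "_-":
            choices.append(("_", "-"))
        elif ch.isalpha():
            choices.append((ch.lower(), ch.upper()))
        else:
            choices.append((ch,))
    return {"".join(combo) for combo in product(*choices)}


variants = trivial_variants("euc_jp")
canonical = codecs.lookup("euc_jp").name

# Five letters give 2**5 case combinations, times two separator choices.
print(len(variants))
print(all(codecs.lookup(v).name == canonical for v in variants))
```

Even a six-character name yields dozens of spellings, which is the reason given above for listing only non-trivial aliases in the documentation.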
|||
msg21087 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2004-06-12 11:59 | |
Logged In: YES user_id=21627 Actually, the top of the page does already say: "Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases." |
|||
msg21088 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2004-06-12 12:04 | |
Logged In: YES user_id=38388 I think that it might be a good idea to document, at the top of that page, how the standard search function of the encodings package works, namely that it normalizes encoding names before doing the lookup: """ Normalization works as follows: all non-alphanumeric characters except the dot used for Python package names are collapsed and replaced with a single underscore, e.g. ' -;#' becomes '_'. Leading and trailing underscores are removed. Note that encoding names should be ASCII only; if they do use non-ASCII characters, these must be Latin-1 compatible. """ The table should then only list normalized encoding names (which I think is already the case). |
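
The normalization rule quoted in this message can be sketched in a few lines. This is an illustrative reimplementation under the assumption of ASCII-only names, not the code used by the encodings package; `normalize_name` is a hypothetical function name.

```python
import re


def normalize_name(name):
    """Collapse runs of non-alphanumeric characters (except '.') into one underscore."""
    # Any run of characters outside letters, digits and the dot becomes "_".
    collapsed = re.sub(r"[^A-Za-z0-9.]+", "_", name)
    # Leading and trailing underscores are removed, as the quoted text describes.
    return collapsed.strip("_")


for raw in ("iso-8859-1", "shift -;#jis", "_euc-jp_"):
    print(repr(raw), "->", repr(normalize_name(raw)))
```

Under this rule a hyphenated name and its underscored form normalize to the same string, which is why the table only needs to list one of them.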
|||
msg21089 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2004-06-14 20:28 | |
Logged In: YES user_id=21627 Assigning to somebody else without asking for permission is impolite, IMO; unassigning the report from anybody. |
|||
msg21090 - (view) | Author: Hyeshik Chang (hyeshik.chang) * | Date: 2004-07-17 14:49 | |
Logged In: YES user_id=55188 I changed the aliases that are more commonly written with hyphens than with underscores to use hyphens, for consistency with the iso-8859 aliases. Doc/lib/libcodecs.tex 1.31 |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:04 | admin | set | github: 40364 |
2004-06-09 06:54:21 | mike_j_brown | create |