classification
Title: Japanese Unicode Codecs
Components: Library (Lib)
Versions: Python 2.3
process
Status: closed
Resolution: rejected
Nosy List: hyeshik.chang, lemburg, suzuki_hisao
Priority: low
Keywords: patch

Created on 2003-01-12 03:46 by suzuki_hisao, last changed 2022-04-10 16:06 by admin. This issue is now closed.

Files
File name: ja-codecs-0.6.tar.bz2 (uploaded by suzuki_hisao, 2003-01-12 03:49)
Messages (6)
msg42401 - Author: SUZUKI Hisao (suzuki_hisao) Date: 2003-01-12 03:46
This is an implementation of a set of Japanese Unicode codecs for Python 2.2 and 2.3. Three major encodings are supported: EUC-JP, Shift_JIS and ISO-2022-JP.

It is written in pure Python, is of a reasonable size (< 80KB), and provides an effective means of modifying the mapping tables.
msg42402 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2003-01-12 12:33
Are you aware of the codecs written by Tamito KAJIYAMA?

   http://www.asahi-net.or.jp/~rd6t-kjym/python/

These are written in C and perform much better than Python-based ones. They cover the same set of encodings you have in your package and also include a complete test suite for the codecs.
msg42403 - Author: SUZUKI Hisao (suzuki_hisao) Date: 2003-01-16 00:22
Yes, I know KAJIYAMA's work, from version 1.0 through version 1.4.9. Indeed, I contributed a patch to JapaneseCodecs-1.2. Please read the README file included in the tarball for the rationale behind ja-codecs.

As for efficiency, ja-codecs is fairly fast and small in practice. In addition, its mapping has a useful mathematical property: encode(decode(c)) == c for every valid character c, which is of practical value for many applications. (The latest version (1.4.9) of KAJIYAMA's codecs has also fixed this for one particular character, REVERSE SOLIDUS. However, it seems to lack a validation test like the one in ja-codecs-0.6/ja/map_jisx206.py.)
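
The round-trip property described above can be checked mechanically. The following is a minimal sketch, not the validation test shipped with ja-codecs: it assumes a modern Python 3 interpreter and uses the `euc_jp` codec from today's standard library, and the helper name `round_trip_failures` is hypothetical.

```python
def round_trip_failures(codec, byte_sequences):
    """Return the byte sequences for which encode(decode(c)) != c.

    Sequences that the codec cannot decode are skipped, because the
    property is only claimed for valid characters.
    """
    failures = []
    for raw in byte_sequences:
        try:
            text = raw.decode(codec)
        except UnicodeDecodeError:
            continue  # not a valid character in this codec
        try:
            encoded = text.encode(codec)
        except UnicodeEncodeError:
            encoded = None  # decodable but not re-encodable: a failure
        if encoded != raw:
            failures.append(raw)
    return failures


# Illustrative sample: ASCII "A" and HIRAGANA LETTER A in EUC-JP.
sample = [b"A", b"\xa4\xa2"]
print(round_trip_failures("euc_jp", sample))
```

Passing every two-byte sequence of an encoding instead of a small sample would turn this into an exhaustive check of the mapping table.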

As you know, KAJIYAMA's codec set also does not cover all the encodings used in Japan today. For example, it does not support the Macintosh encodings. It may be almost impossible to make a complete set of codecs of a realistic size. It would be best for the "standard library" to provide a few "standard" encodings (based on public specifications and in use across various platforms), which users/developers can _easily_ modify to adapt them to their specific platforms (in the spirit of "open source" ;-).

So I think it is essential for the standard library's Japanese codecs to be written in Python, cleanly and efficiently enough, or at least to let users modify the character mappings effectively.
msg42404 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2003-01-16 09:28
Sorry for not having read the README earlier.

You do have a point in that it is useful to be able to modify encodings in user-specific ways. Of course, this needs to be done by creating new codecs, and Python files sure make this process easier.
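
As an illustration of creating a new codec for a user-specific mapping, here is a minimal sketch. It is not taken from ja-codecs or from KAJIYAMA's package: it assumes a modern Python 3 interpreter, wraps the standard library's `euc_jp` codec, and registers it under the hypothetical name `x_user_euc_jp`. The single remapping shown (WAVE DASH versus FULLWIDTH TILDE) is only an example of a platform-specific preference.

```python
import codecs

BASE = "euc_jp"          # existing codec that does the real work
NAME = "x_user_euc_jp"   # hypothetical name for the user-specific variant

# Assumed user preference: show U+FF5E FULLWIDTH TILDE where the base
# codec yields U+301C WAVE DASH, and map it back when encoding.
DECODE_FIXUPS = {0x301C: 0xFF5E}
ENCODE_FIXUPS = {new: old for old, new in DECODE_FIXUPS.items()}


def _encode(text, errors="strict"):
    # Undo the user-specific substitution, then delegate to the base codec.
    data = text.translate(ENCODE_FIXUPS).encode(BASE, errors)
    return data, len(text)


def _decode(data, errors="strict"):
    # Delegate to the base codec, then apply the user-specific substitution.
    raw = bytes(data)
    return raw.decode(BASE, errors).translate(DECODE_FIXUPS), len(raw)


def _search(name):
    # Codec search function: answer only for our own codec name.
    if name == NAME:
        return codecs.CodecInfo(name=NAME, encode=_encode, decode=_decode)
    return None


codecs.register(_search)

# Usage: the variant differs from the base codec only in the remapped character.
wave_dash_bytes = "\u301c".encode(BASE)
print(wave_dash_bytes.decode(NAME) == "\uff5e")
```

Keeping the overrides in a small dictionary means a platform-specific variant can be defined without copying the whole mapping table.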

Now, AFAIK, none of the current Python developers knows much about Japanese, so we'd need a maintainer for the codecs. If you would be able to take over this part, then I see a good chance of getting the codecs into the Python core (Tamito's codecs didn't get accepted for the core distribution because of their size).

Perhaps you could team up with Tamito in this effort?!
msg42405 - Author: SUZUKI Hisao (suzuki_hisao) Date: 2003-01-18 08:58
It would be very nice if Japanese codecs got into the core. Nowadays even Perl 5.8 has them.

I am very willing to help you and Tamito with codec maintenance. I am sorry, but I am so occupied with my work that I am afraid it might be difficult to take time for it every day. Perhaps I will be able to respond weekly rather than daily.
msg42406 - Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-02-06 03:22
Python got Japanese codecs by importing CJK codecs.
Thank you for your efforts anyway!
History
Date User Action Args
2022-04-10 16:06:07 admin set github: 37761
2003-01-12 03:46:21 suzuki_hisao create