classification
Title: iconv_codec NG
Type: Stage:
Components: Library (Lib) Versions: Python 2.3
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: hyeshik.chang, loewis
Priority: normal Keywords: patch

Created on 2003-04-02 11:10 by hyeshik.chang, last changed 2022-04-10 16:08 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
python-iconvcodec-ng.diff.gz hyeshik.chang, 2003-04-02 11:11 patch (rev. 2)
Messages (2)
msg43268 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2003-04-02 11:10
This new implementation of iconv_codec resolves the following problems
in the current implementation:

-   Reentrancy is not safe: the encoder and decoder can be entered at
several nesting levels at the same time, because a PEP 293 codec error
callback can itself start another iconv encoding session. Every
encode/decode call must therefore open its own iconv session, but the
current implementation shares one iconv session for the whole lifetime
of the codec.

-   StreamReader can't work correctly: because iconv keeps its
conversion context private, a StreamReader cannot be built properly on
top of the encode/decode functions alone. The current implementation is
also very weak at handling EINVAL and at passing pending characters from
previous data to the error callback.

-   Using a bare '?' as the replacement character is not safe for many
encodings: for stateful encodings and encodings that are not byte
streams, even the replacement character has to be encoded with iconv.

-   Encoding names that contain '-' or uppercase letters can't be used:
because the codec subsystem changes '-' to '_' and uppercase to
lowercase, such names can't be passed to the iconv_codec module without
loss. For example, the following aliases are needed to use the CJK
encodings with Sun iconv:

  # simplified chinese
  "euc_cn": "zh_CN.euc",
  "iso_2022_zh": "zh_CN.iso2022-CN",
  "gbk": "zh_CN.gbk",
  "cp935": "zh_CN-cp935",

  # traditional chinese
  "euc_tw": "zh_TW.euc",
  "iso_2022_tw": "zh_TW.iso2022-7",
  "big5": "zh_TW.big5",
  "cp937": "zh_TW.cp937",

  # japanese
  "iso_2022_jp": "ISO-2022-JP",
  "euc_jp": "eucJP",
  "shift_jis": "PCK",

  # korean
  "euc_kr": "ko_KR.euc",
  "iso_2022_kr": "ISO-2022-KR",
  "johab": "ko_KR.johap",
  "cp932": "ko_KR.cp932",
  "cp949": "ko_KR.cp949",

-   Multiple Unicode encodings or methods can't be tried: on some iconv
implementations, such as those of HP-UX or Solaris, UCS2 -> ISO-8859-1
is available but UCS2 -> euc-kr isn't; only UTF-8 -> euc-kr is
available.
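
The alias problem in the list above can be sketched in a few lines of Python. This is only an illustration, not code from the patch: `normalize` and `iconv_name` are hypothetical helpers, the normalization rule is just the one described in this report ('-' to '_', uppercase to lowercase), and the table holds three of the Sun iconv aliases listed above.

```python
# Hypothetical sketch: how a lossy name normalization forces an alias table.
# The rule below is the one described in the report, not the real codec
# registry implementation.

SUN_ICONV_ALIASES = {
    "iso_2022_jp": "ISO-2022-JP",
    "euc_jp": "eucJP",
    "shift_jis": "PCK",
}


def normalize(name):
    """Apply the lossy normalization: '-' becomes '_', and lowercase."""
    return name.replace("-", "_").lower()


def iconv_name(python_name, aliases=SUN_ICONV_ALIASES):
    """Map a codec name to the name iconv expects, falling back to the
    normalized name when no alias is registered."""
    key = normalize(python_name)
    return aliases.get(key, key)


# The original spelling is recovered only when an alias exists.
print(iconv_name("ISO-2022-JP"))
print(iconv_name("Shift-JIS"))
# No alias matches the normalized form "eucjp", so the case information
# needed by iconv ("eucJP") is lost.
print(iconv_name("eucJP"))
```

Without the alias table, every name reaches iconv in its normalized form, which is why platform-specific names such as "PCK" or "zh_CN.euc" need an explicit mapping.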

In addition, many multibyte codecs, such as the CJK and iconv codecs,
are likely to duplicate the code for processing error callbacks and
handling streams, so I split that code out into a separate source file.
The CJK and iconv codecs can share it at source level by putting
multibytecodec.c in Modules/ and linking the file into each of the
codecs. Alternatively, if multibytecodec.c goes into Python/ and is
linked into the main Python library, the codecs can be compiled and
loaded on their own.

multibytecodec.c, the common multibyte codec framework, can be used for
any ordinary multibyte encoding. With it, a codec writer can create a
codec for a multibyte encoding without having to handle error callbacks
or implement the StreamReader structure. I wrote the CJK codecs using it
and will submit them in a separate patch report.
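
Why stream handling needs state kept between calls can be shown with a short Python sketch. It is an assumption-laden illustration rather than anything from the patch: it uses `codecs.getincrementaldecoder` and the built-in `euc_kr` codec from present-day Python 3, both of which postdate this report, and `decode_chunks` is a hypothetical helper.

```python
import codecs


def decode_chunks(chunks, encoding="euc_kr"):
    """Decode byte chunks that may split a multibyte character.

    The incremental decoder keeps the pending bytes of an incomplete
    character between calls, which a stateless decode call cannot do.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True reports an error if bytes are still pending at the end.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


text = "\ud55c\uae00"  # two Hangul syllables, two bytes each in EUC-KR
data = text.encode("euc_kr")

# Split the input in the middle of the first and the second character.
chunks = [data[:1], data[1:3], data[3:]]
assert decode_chunks(chunks) == text

# A stateless decode of the first chunk alone fails on the partial character.
try:
    data[:1].decode("euc_kr")
except UnicodeDecodeError:
    print("stateless decode rejects the incomplete sequence")
```

A shared framework lets each codec reuse this kind of pending-byte bookkeeping instead of reimplementing it.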
msg43269 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-04-03 05:07

Because of complaints about its brokenness, I had to revert
the current iconv codec. It is extremely unlikely that such
code will be added for Python 2.3. So please update this
submission for the current CVS. There is no need to hurry,
though.
History
Date User Action Args
2022-04-10 16:08:01 admin set github: 38255
2003-04-02 11:10:16 hyeshik.chang create