Issue 713820: iconv_codec NG

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/38255

classification

Title:	iconv_codec NG
Type:		Stage:
Components:	Library (Lib)	Versions:	Python 2.3

process

Status:	closed	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	hyeshik.chang, loewis
Priority:	normal	Keywords:	patch

Created on 2003-04-02 11:10 by hyeshik.chang, last changed 2022-04-10 16:08 by admin. This issue is now closed.

Files
File name	Uploaded	Description
python-iconvcodec-ng.diff.gz	hyeshik.chang, 2003-04-02 11:11	patch (rev. 2)

Messages (2)
msg43268 - (view)	Author: Hyeshik Chang (hyeshik.chang) *	Date: 2003-04-02 11:10
This new implementation of iconv_codec resolves the following problems in the current implementation:

- It is not reentrant: the encoder and/or decoder can be entered at several levels at the same time, because a PEP 293 codec error callback can itself start another iconv encoding session. So every encode/decode session must open its own iconv session, but the current implementation shares one iconv session for the whole lifetime of the codec.

- StreamReader can't work correctly: because iconv keeps its context private, StreamReader can't do its job with only the encode/decode functions. Also, the current implementation is very weak at handling EINVAL and at passing pending characters from previous data to the error callback.

- Using a plain '?' as the replacement character is not safe for many encodings: for stateful encodings and non-byte-stream encodings, even the replacement character needs to be encoded with iconv.

- Encoding names containing '-' or uppercase letters can't be used: because the codec subsystem changes '-' to '_' and uppercase to lowercase, such names can't be passed to the iconv_codec module without loss. For example, we need the following aliases to use CJK encodings with Sun iconv:

    # simplified chinese
    "euc_cn": "zh_CN.euc",
    "iso_2022_zh": "zh_CN.iso2022-CN",
    "gbk": "zh_CN.gbk",
    "cp935": "zh_CN-cp935",
    # traditional chinese
    "euc_tw": "zh_TW.euc",
    "iso_2022_tw": "zh_TW.iso2022-7",
    "big5": "zh_TW.big5",
    "cp937": "zh_TW.cp937",
    # japanese
    "iso_2022_jp": "ISO-2022-JP",
    "euc_jp": "eucJP",
    "shift_jis": "PCK",
    # korean
    "euc_kr": "ko_KR.euc",
    "iso_2022_kr": "ISO-2022-KR",
    "johab": "ko_KR.johap",
    "cp932": "ko_KR.cp932",
    "cp949": "ko_KR.cp949",

- Multiple Unicode encodings or methods can't be tried: on some iconv implementations, such as those of HP-UX or Solaris, UCS2 -> ISO-8859-1 is available, but UCS2 -> euc-kr isn't available and only UTF-8 -> euc-kr is.

In addition, many multibyte codecs, such as the CJK or iconv codecs, might have duplicated code for processing error callbacks and handling streams. So I split that code out into a separate source file.
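The replacement-character problem listed above can be illustrated with a small sketch. It is not taken from the patch: it assumes Python's built-in iso2022_jp codec as a stand-in for an iconv-backed stateful encoding, and the helper name encode_parts and its through_codec flag are hypothetical. A part of None marks the position of an unencodable character.

```python
import codecs


def encode_parts(parts, encoding, through_codec):
    """Encode a sequence of text parts with one incremental encoder.

    A part of None stands for an unencodable character that must be
    replaced by '?'. (Hypothetical helper, for illustration only.)
    """
    encoder = codecs.getincrementalencoder(encoding)()
    out = []
    for part in parts:
        if part is None:
            if through_codec:
                # Let the codec emit whatever escape sequence '?' needs.
                out.append(encoder.encode("?"))
            else:
                # Splice a raw byte, ignoring the encoder's shift state.
                out.append(b"?")
        else:
            out.append(encoder.encode(part))
    out.append(encoder.encode("", final=True))  # flush and reset the state
    return b"".join(out)


parts = ["日", None, "本"]
safe = encode_parts(parts, "iso2022_jp", through_codec=True)
naive = encode_parts(parts, "iso2022_jp", through_codec=False)

print(safe.decode("iso2022_jp"))
print(naive.decode("iso2022_jp", errors="replace"))
```

In the naive variant the raw '?' byte arrives while the stream is still shifted into a double-byte character set, so a decoder no longer sees it as an ASCII question mark. Routing the replacement through the encoder keeps the shift state consistent.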
The CJK and iconv codecs can share this code at the source level by putting multibytecodec.c in Modules/ and linking the file into each of the codecs. Alternatively, if multibytecodec.c goes in Python/ and is linked into the main Python library, the codecs can be compiled and loaded on their own. multibytecodec.c, the common multibyte codec framework, can be used by any ordinary multibyte encoding. With it, a codec writer can create a codec for a multibyte encoding without having to handle error callbacks or implement the StreamReader structure. I wrote the CJK codecs using it and will submit them as a patch in a separate patch report.
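The kind of pending-byte bookkeeping that a shared framework would take off the codec writer's hands can be sketched as follows. This is only an illustration under stated assumptions: it uses codecs.getincrementaldecoder, which was added to Python later and is not part of this patch, and the built-in euc_kr codec in place of an iconv-backed one. The helper name decode_chunks is hypothetical.

```python
import codecs


def decode_chunks(chunks, encoding):
    """Decode byte chunks that may be split in the middle of a character."""
    decoder = codecs.getincrementaldecoder(encoding)()
    # The decoder object holds any incomplete trailing bytes between calls.
    pieces = [decoder.decode(chunk) for chunk in chunks]
    pieces.append(decoder.decode(b"", final=True))  # fail if bytes are left over
    return "".join(pieces)


data = "한글".encode("euc_kr")               # two 2-byte characters
chunks = [data[:1], data[1:3], data[3:]]    # every chunk boundary splits a character

print(decode_chunks(chunks, "euc_kr"))

# A stateless decode of the first chunk alone cannot succeed,
# because it ends with half of a character.
try:
    chunks[0].decode("euc_kr")
except UnicodeDecodeError as exc:
    print("stateless decode failed:", exc.reason)
```

A stream reader built only on stateless encode/decode functions has to reimplement this buffering itself, which is the duplication the message describes.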
msg43269 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2003-04-03 05:07
Because of complaints about its brokenness, I had to revert the current iconv codec. It is extremely unlikely that such code will be added for Python 2.3. So please update this submission for the current CVS. There is no need to hurry, though.

History
Date	User	Action	Args
2022-04-10 16:08:01	admin	set	github: 38255
2003-04-02 11:10:16	hyeshik.chang	create