classification
Title: Multibyte string on string::string_print
Type: Stage:
Components: Interpreter Core Versions:
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: gvanrossum, hyeshik.chang, loewis
Priority: normal Keywords: patch

Created on 2001-11-09 07:10 by hyeshik.chang, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Files
File name Uploaded Description
python_mbstring_diff.txt hyeshik.chang, 2001-11-09 07:10 patch to Objects/stringobject.c
configure.in.diff.txt hyeshik.chang, 2001-12-10 03:20 2nd) autoconf detect for mbtowc(), iswprint()
pyconfig.h.in.diff.txt hyeshik.chang, 2001-12-10 03:21 2nd) autoconf detect for mbtowc(), iswprint()
stringobject.c.diff.txt hyeshik.chang, 2001-12-10 03:22 2nd) new clean(on my view) patch for Objects/stringobject.c
mb3.diff hyeshik.chang, 2002-04-01 18:06 3rd) revised (includes patch for stringobject.c, configure.in and pyconfig.h.in)
Messages (10)
msg38131 - Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2001-11-09 07:10
Many users of multibyte languages have difficulty reading
native characters when lists, dictionaries, and similar
objects are printed. This patch allows printing multibyte
characters on UNIX98-compatible machines; mbtowc() is a
standard C API function from ISO/IEC 9899:1990.
msg38132 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2001-11-09 21:21
Even though I think this patch is correct in principle, I
see a few problems with it:
1. Since it doesn't fix a bug, it probably cannot go into 2.2.
2. There is no autoconf test for mbtowc. You should test
this in configure, and then conditionalize your code on
HAVE_MBTOWC.
3. There is too much code duplication. Try to find a
solution which special-cases the escape codes (\something)
only once. For example, you may implement a trivial mbtowc
redefinition if mbtowc is not available, and then use mbtowc
always.
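
The fallback suggested in point 3 could look roughly like the sketch below. It is only an illustration, not the code from any of the attached patches: the name py_mbtowc is hypothetical, and HAVE_MBTOWC is assumed to be the macro a configure test would define. When mbtowc() is missing, each byte is treated as one character, so the caller can use a single code path.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* HAVE_MBTOWC is assumed to be defined by configure when the C library
   provides mbtowc(). py_mbtowc is a hypothetical name for this sketch. */
#ifdef HAVE_MBTOWC
#define py_mbtowc mbtowc
#else
/* Trivial stand-in: every byte is its own character. */
static int
py_mbtowc(wchar_t *pwc, const char *s, size_t n)
{
    if (s == NULL)
        return 0;               /* encoding has no shift state */
    if (n == 0)
        return -1;              /* nothing to convert */
    if (pwc != NULL)
        *pwc = (wchar_t)(unsigned char)*s;
    return *s != '\0';          /* 0 for NUL, otherwise 1 byte consumed */
}
#endif
```

With this in place, the escaping loop would call py_mbtowc() unconditionally and need the backslash handling only once.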
msg38133 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2001-12-04 19:08
I don't understand the point of using mbtowc() here.

The code extracts a wide character, but then it uses
isprint() on it, and as far as I know, isprint() is not
defined on wide characters, only on 'unsigned char' (and on
-1).

Isn't what the author wants simply to use isprint(c) instead
of (c < ' ' || c >= 0x7f)???
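
As a small illustration of the domain restriction mentioned above, a per-byte check would normally cast to unsigned char before calling isprint(). The helper name byte_is_printable is hypothetical and is not taken from the patch.

```c
#include <ctype.h>

/* Hypothetical helper: nonzero if the byte is printable in the
   current locale. */
static int
byte_is_printable(char c)
{
    /* isprint() accepts only values representable as unsigned char
       (or EOF); the cast avoids passing a negative value where plain
       char is signed. */
    return isprint((unsigned char)c) != 0;
}
```

This only ever looks at one byte at a time, so it cannot recognize a printable character that is encoded as several bytes.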
msg38134 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2001-12-06 15:12
You are right, the code should use iswprint instead.

The point is that multiple subsequent bytes can make up a
single printable character. Not every character above 127 is
necessarily printable (e.g. in Latin-1, only characters
above 160 are printable). Likewise, a single byte may not be
printable, but a combination will print fine. So this code
is supposed to catch only those cases where printing will
actually work.
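
A minimal sketch of that idea, assuming both mbtowc() and iswprint() are available: decode one multibyte character, then ask whether the resulting wide character is printable. The function name printable_char_len is hypothetical, and this is not the code from the attached patches.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: returns the number of bytes at s that form one
   printable character in the current locale, or 0 if the next byte
   should be escaped instead. */
static size_t
printable_char_len(const char *s, size_t remaining)
{
    wchar_t wc;
    int n;

    if (remaining == 0)
        return 0;
    n = mbtowc(&wc, s, remaining);
    if (n <= 0) {
        /* NUL byte, or an invalid or incomplete sequence: reset the
           conversion state and tell the caller to escape this byte. */
        mbtowc(NULL, NULL, 0);
        return 0;
    }
    return iswprint((wint_t)wc) ? (size_t)n : 0;
}
```

A caller would set the locale first, for example with setlocale(LC_CTYPE, ""), then copy the reported number of bytes verbatim when the result is nonzero and emit an escape for a single byte otherwise.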
msg38135 - Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2001-12-07 06:38
Yes, it should be changed to iswprint on Linux systems
(although isprint on BSD systems was designed to handle wide
characters).
As loewis said, the EUC encodings of Korea, Japan, and Taiwan
don't use 0x7F-0x9F for printable characters, so I think that
using mbtowc is unavoidable.
msg38136 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2001-12-07 13:21
Still, the patch as it exists is unacceptable -- it needs
configure support to decide whether to use mbtowc() and
whether to use iswprint() or isprint() (I would hope on BSD
there is also an iswprint(), to be standard-conforming).
msg38137 - Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2001-12-10 03:26
I uploaded a second set of patches, which contain configure
support. Unfortunately, Citrus (the new-generation locale
support for *BSDs) hasn't implemented iswprint() yet, but
*BSDs support wide characters via the Rune Locale isprint()
function.
msg38138 - Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2001-12-10 03:38
Oops, one mistake. sorry.

stringobject.c:646

else if (_ISPRINT(c)) {
-> 
else if (cr > 0 && _ISPRINT(c)) {

(to detect whether mbtowc failed to convert)
msg38139 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-10-07 13:58
Thanks for the patch, committed as

configure 1.343;
configure.in 1.354;
pyconfig.h.in 1.51;
stringobject.c 2.190;

I'm not quite sure that your correction is correct: If we
invoke iswprint, cr is already guaranteed to be >0, since we
otherwise goto nonprintable.
msg38140 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-10-11 05:38
The patch was causing too many problems, so I had to back it
out.
History
Date User Action Args
2022-04-10 16:04:37 admin set github: 35494
2001-11-09 07:10:11 hyeshik.chang create