
classification
Title: East Asian Width support for Unicode
Type:
Stage:
Components: Interpreter Core
Versions:
process
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To: hyeshik.chang
Nosy List: hyeshik.chang, loewis
Priority: normal
Keywords: patch

Created on 2004-05-28 22:59 by hyeshik.chang, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
unicodewidth.diff | hyeshik.chang, 2004-05-28 22:59 | just a sketch implementation of the idea
unicodewidth2.diff | hyeshik.chang, 2004-06-01 13:46 | removed unicode database update
unicodewidth3.diff | hyeshik.chang, 2004-06-01 14:14 | Changed to use WIDE_MASK on flags instead of new member "width".
Messages (5)
msg46093 - Author: Hyeshik Chang (hyeshik.chang) (Python committer) Date: 2004-05-28 22:59
Inspired by David Goodger, I thought it would be great
to have some unicode methods that deal with
East Asian Width
(http://www.unicode.org/reports/tr11/tr11-13.html#UCD).

The attached patch implements a rough first sketch of the idea.

>>> u'1'.iswide()
False
>>> u'\uac00'.iswide()
True
>>> u'\ud55c\uae00'.iswide()
True
>>> u'\ud55c\uae00'.width()
4
>>> u'ab\ud55c\uae00'.width()
6
>>> u'ab\ud55c\uae00'.iswide()
False
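
For readers who want to reproduce the session above without the patch, here is a minimal sketch built on the standard library's unicodedata.east_asian_width(). The helper names is_wide and display_width are hypothetical, and treating the "W" (Wide) and "F" (Fullwidth) classes as two columns is an assumption about what the patch intends, not something taken from the diff. Combining characters and other zero-width cases are ignored.

```python
import unicodedata

# Hypothetical helpers: these names do not come from the patch.
# Assumption: only the Wide ("W") and Fullwidth ("F") classes take two columns.
WIDE_CLASSES = {"W", "F"}


def is_wide(text):
    """Return True if text is non-empty and every character is Wide or Fullwidth."""
    return bool(text) and all(
        unicodedata.east_asian_width(ch) in WIDE_CLASSES for ch in text
    )


def display_width(text):
    """Count two columns per Wide/Fullwidth character and one for anything else."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in WIDE_CLASSES else 1
        for ch in text
    )
```

Calling is_wide("ab\ud55c\uae00") and display_width("ab\ud55c\uae00") mirrors the last two calls in the session: a string counts as wide only when all of its characters are.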
msg46094 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2004-05-31 21:07
Updating to the Unicode 4.0 database is risky. It will break
IDNA, which specifies that IDN must use the 3.2 version of
the unicode database.

It would be ok if you could arrange to provide both versions
of the database. Ideally, the database would only store the
deltas from 4.0 to 3.2, so we don't get any increase in
space for cases where the data didn't change between Unicode
versions.

It might be reasonable to leave that issue alone for this
patch, and proceed with the 3.2 version of EastAsianWidth.
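
As an illustration of the "both versions" arrangement, current CPython releases expose a frozen Unicode 3.2.0 view as unicodedata.ucd_3_2_0 next to the default database. The sketch below only shows how the two views can be queried side by side; it says nothing about how the data is stored internally, and the helper name width_class_3_2 is hypothetical.

```python
import unicodedata

# Version of the default database shipped with the running interpreter.
current_version = unicodedata.unidata_version

# Version of the frozen view intended for code that must stay on Unicode 3.2.
frozen_version = unicodedata.ucd_3_2_0.unidata_version


def width_class_3_2(ch):
    """Hypothetical helper: East Asian Width class of ch according to the 3.2.0 view."""
    return unicodedata.ucd_3_2_0.east_asian_width(ch)
```

Code that must follow a specification pinned to Unicode 3.2 can go through the frozen view, while everything else uses the module-level functions.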
msg46095 - Author: Hyeshik Chang (hyeshik.chang) (Python committer) Date: 2004-06-01 13:46
Okay. In fact, I don't need Unicode 4.0; I'm fine with 3.2.
I uploaded a new patch that keeps the Unicode database at 3.2.
msg46096 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2004-06-02 12:29
The patch is fine, please apply. Make sure you add
appropriate documentation and test cases.

You might consider moving flags to the end of the struct, so
that no padding is added for UCS-4 builds.
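
The padding effect can be checked with ctypes, which lays out structures using the platform's native C alignment rules. This is only a sketch: the field names and types below are a hypothetical record layout chosen for illustration, not the struct from the patch, and c_uint32 stands in for a 4-byte Py_UNICODE on a UCS-4 build.

```python
import ctypes

# Hypothetical layout: field names and types are illustrative,
# not copied from the patch.
UCS4 = ctypes.c_uint32  # stand-in for a 4-byte Py_UNICODE


class FlagsFirst(ctypes.Structure):
    # A 2-byte member ahead of 4-byte members forces alignment padding.
    _fields_ = [
        ("flags", ctypes.c_uint16),
        ("upper", UCS4),
        ("lower", UCS4),
        ("title", UCS4),
        ("decimal", ctypes.c_uint8),
        ("digit", ctypes.c_uint8),
    ]


class FlagsLast(ctypes.Structure):
    # Largest members first; the small ones pack together at the end.
    _fields_ = [
        ("upper", UCS4),
        ("lower", UCS4),
        ("title", UCS4),
        ("decimal", ctypes.c_uint8),
        ("digit", ctypes.c_uint8),
        ("flags", ctypes.c_uint16),
    ]


def padding(struct):
    """Bytes in the struct that are not occupied by any field."""
    used = sum(ctypes.sizeof(ftype) for _, ftype in struct._fields_)
    return ctypes.sizeof(struct) - used
```

Comparing padding(FlagsFirst) with padding(FlagsLast) shows how many bytes per record the reordering saves on a given platform.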

msg46097 - Author: Hyeshik Chang (hyeshik.chang) (Python committer) Date: 2004-06-02 17:00
I just checked it in. Thanks for the review!

Doc/api/concrete.tex 1.42
Doc/lib/libstdtypes.tex 1.154
Include/unicodeobject.h 2.43
Lib/test/test_unicode.py 1.88
Misc/NEWS 1.983 1.984
Modules/unicodedata_db.h 1.10
Modules/unicodename_db.h 1.7
Objects/unicodectype.c 2.15
Objects/unicodeobject.c 2.212
Objects/unicodetype_db.h 1.8
Tools/unicode/makeunicodedata.py 1.18
History
Date | User | Action | Args
2022-04-11 14:56:04 | admin | set | github: 40304
2004-05-28 22:59:21 | hyeshik.chang | create |