This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Numeric characters not recognized.
Type:
Stage:
Components: Interpreter Core
Versions: Python 2.5
process
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To:
Nosy List: andersch, lemburg, loewis
Priority: normal
Keywords: patch

Created on 2006-05-24 19:54 by andersch, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
numeric.patch andersch, 2006-05-24 19:54 Add missing numeric characters
Messages (7)
msg50353 - Author: Anders Chrigström (andersch) Date: 2006-05-24 19:54
unicode.isnumeric() and unicodedata.numeric() fail to
recognize a number of numeric Unicode characters.

The patch fixes this.
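
For readers who want to check the behavior described above on a current interpreter, here is a minimal sketch that is not part of the original report. It assumes Python 3, where unicode.isnumeric() is spelled str.isnumeric(). The helper name numeric_mismatches is hypothetical. It lists the code points in a range for which the two APIs disagree about whether a character is numeric.

```python
import unicodedata


def numeric_mismatches(start=0, stop=0x110000):
    """Return code points where str.isnumeric() and unicodedata.numeric() disagree.

    Hypothetical helper for illustration; assumes Python 3.
    """
    mismatches = []
    for cp in range(start, stop):
        ch = chr(cp)
        # unicodedata.numeric() returns the default (None here) when the
        # character has no numeric value in the database.
        has_value = unicodedata.numeric(ch, None) is not None
        if ch.isnumeric() != has_value:
            mismatches.append(cp)
    return mismatches


if __name__ == "__main__":
    # Report any disagreement across the full code point range.
    for cp in numeric_mismatches():
        print("U+%04X %s" % (cp, unicodedata.name(chr(cp), "<unnamed>")))
```

An empty report means the two functions agree for every code point in the interpreter's Unicode database. It does not by itself show that the database matches a particular Unicode version.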
msg50354 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-05-26 18:07
Which version of the Unicode database is this based on?
msg50355 - Author: Anders Chrigström (andersch) Date: 2006-05-26 19:20
The patch makes the code match version 4.1.0 of the Unicode
database, although the existing code did not match version 3.2.0 either.

msg50356 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2006-05-26 20:42
Rather than creating a patch for every new version, how
about extracting the relevant data from the Unicode database
using a script and putting that into Tools/unicode/ ?!

Note that the original version was also generated from the
database. Unfortunately, I can't find that script anymore.

One nit with the patch: it should put non-BMP Unicode code
points into #ifdef Py_UNICODE_WIDE ... #endif clauses.
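
The suggestion above could be sketched roughly as follows. This is an illustration only, not the lost original script and not the generator that later shipped with CPython. It assumes input lines in the semicolon-separated UnicodeData.txt layout, where field 8 holds the numeric value. The function names, the sample lines, and the shape of the emitted C case lines are all hypothetical.

```python
from fractions import Fraction

# A few sample lines in UnicodeData.txt layout (assumption: field 8 is the
# numeric value, written as an integer or a fraction such as "1/2").
SAMPLE = [
    "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;",
    "0F33;TIBETAN DIGIT HALF ZERO;No;0;L;;;;-1/2;N;;;;;",
    "10107;AEGEAN NUMBER ONE;No;0;L;;;;1;N;;;;;",
]


def parse_numeric_values(lines):
    """Map code point -> Fraction for every line that has a numeric value."""
    values = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 9 or not fields[8]:
            continue  # no numeric value recorded for this character
        values[int(fields[0], 16)] = Fraction(fields[8])
    return values


def _case_line(cp, value):
    # Hypothetical output format; the real C source may be laid out differently.
    return "    case 0x%04X: return (double) %d / %d;" % (
        cp, value.numerator, value.denominator)


def emit_switch_cases(values):
    """Emit C case lines, wrapping non-BMP code points in Py_UNICODE_WIDE."""
    bmp = sorted(cp for cp in values if cp <= 0xFFFF)
    wide = sorted(cp for cp in values if cp > 0xFFFF)
    out = [_case_line(cp, values[cp]) for cp in bmp]
    if wide:
        out.append("#ifdef Py_UNICODE_WIDE")
        out.extend(_case_line(cp, values[cp]) for cp in wide)
        out.append("#endif")
    return "\n".join(out)


if __name__ == "__main__":
    print(emit_switch_cases(parse_numeric_values(SAMPLE)))
```

Keeping the non-BMP cases in a single guarded block at the end is just one possible layout. A real generator would read the full database file rather than a handful of sample lines.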
msg50357 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-05-26 21:43
I agree it should be possible to regenerate that easily (or
perhaps entirely merge it into unicodedata/unicodectype).

andersch, how did you create the patch?
msg50358 - Author: Anders Chrigström (andersch) Date: 2006-05-27 06:56
I got the differences by comparing with _numeric in
http://codespeak.net/svn/pypy/dist/pypy/module/unicodedata/unicodedb.py
which we generated with
ssh://codespeak.net/svn/pypy/dist/pypy/module/unicodedata/generate_unicodedb.py

If you are looking into generating this from the Unicode
database, you might want to fix _PyUnicode_IsLinebreak too.
msg50359 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-05-27 08:39
Thanks for the patch, committed as 46432. I conditionalized
the non-BMP characters on Py_UNICODE_WIDE, and updated
PyUnicode_IsNumeric to recognize U+0F33 as a numeric character.

If anybody wants to contribute a generator for these
functions (or perhaps generate a table in the first place),
please go ahead.
History
Date User Action Args
2022-04-11 14:56:17 admin set github: 43405
2006-05-24 19:54:24 andersch create