Issue 1494554: Numeric characters not recognized.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/43405

classification

Title:	Numeric characters not recognized.
Type:		Stage:
Components:	Interpreter Core	Versions:	Python 2.5

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:		Nosy List:	andersch, lemburg, loewis
Priority:	normal	Keywords:	patch

Created on 2006-05-24 19:54 by andersch, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
numeric.patch	andersch, 2006-05-24 19:54	Add missing numeric characters

Messages (7)
msg50353	Author: Anders Chrigström (andersch)	Date: 2006-05-24 19:54
unicode.isnumeric() and unicodedata.numeric() fail to recognize a number of numeric Unicode characters. The patch fixes this.
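For readers who want to see what "recognized as numeric" means in practice, here is a minimal sketch. It assumes Python 3, where the method lives on str rather than on the Python 2 unicode type discussed in this report. The helper name describe and the sample characters are illustrative choices, not taken from the patch.

```python
import unicodedata

# Illustrative samples (not taken from the patch):
# DIGIT FIVE, VULGAR FRACTION ONE HALF, ROMAN NUMERAL EIGHT, LATIN SMALL LETTER A
SAMPLES = ["5", "\u00bd", "\u2167", "a"]


def describe(ch):
    """Return (isnumeric, numeric value or None) for a single character.

    unicodedata.numeric() raises ValueError for characters without a
    numeric value unless a default is passed, so None is used as default.
    """
    return ch.isnumeric(), unicodedata.numeric(ch, None)


if __name__ == "__main__":
    for ch in SAMPLES:
        name = unicodedata.name(ch, "<unnamed>")
        print(f"U+{ord(ch):04X} {name}: {describe(ch)}")
```

A character missing from the interpreter's tables would show up here as (False, None) even though the Unicode database assigns it a numeric value.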
msg50354	Author: Martin v. Löwis (loewis) *	Date: 2006-05-26 18:07
Which version of the Unicode database is this based on?
msg50355	Author: Anders Chrigström (andersch)	Date: 2006-05-26 19:20
The patch makes it match version 4.1.0, though the existing code didn't match version 3.2.0 either.
msg50356	Author: Marc-Andre Lemburg (lemburg) *	Date: 2006-05-26 20:42
Rather than creating a patch for every new version, how about extracting the relevant data from the Unicode database using a script and putting that into Tools/unicode/ ?! Note that the original version was also generated from the database. Unfortunately, I can't find that script anymore. One nit with the patch: it should put non-BMP Unicode code points into #ifdef Py_UNICODE_WIDE ... #endif clauses.
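The extraction idea above could look roughly like the following sketch. It is not the lost original script, and it is not the PyPy generator mentioned later in the thread. It assumes input in the semicolon-separated UnicodeData.txt format, where field 0 is the code point and field 8 is the numeric value (possibly a fraction such as -1/2). The names numeric_table, split_by_plane and SAMPLE are hypothetical, and the sample lines are written in that format for illustration only.

```python
from fractions import Fraction

# Sample lines in the UnicodeData.txt layout (illustrative, not a full file).
SAMPLE = """\
0035;DIGIT FIVE;Nd;0;EN;;5;5;5;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;
0F33;TIBETAN DIGIT HALF ZERO;No;0;L;;;;-1/2;N;;;;;
10107;AEGEAN NUMBER ONE;No;0;L;;;;1;N;;;;;
"""


def numeric_table(lines):
    """Map code point -> Fraction for every line with a numeric value."""
    table = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 9 or not fields[8]:
            continue  # blank line, short line, or no numeric value
        table[int(fields[0], 16)] = Fraction(fields[8])
    return table


def split_by_plane(table):
    """Separate BMP entries from non-BMP ones.

    A C generator could emit the second group inside
    #ifdef Py_UNICODE_WIDE ... #endif, as suggested in the message above.
    """
    bmp = {cp: v for cp, v in table.items() if cp <= 0xFFFF}
    non_bmp = {cp: v for cp, v in table.items() if cp > 0xFFFF}
    return bmp, non_bmp


if __name__ == "__main__":
    bmp, non_bmp = split_by_plane(numeric_table(SAMPLE.splitlines()))
    for cp, value in sorted(bmp.items()):
        print(f"U+{cp:04X} -> {value}")
    for cp, value in sorted(non_bmp.items()):
        print(f"U+{cp:04X} -> {value}  (non-BMP)")
```

Keeping the values as Fraction objects avoids rounding until the generator decides how to emit them.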
msg50357	Author: Martin v. Löwis (loewis) *	Date: 2006-05-26 21:43
I agree it should be possible to regenerate that easily (or perhaps entirely merge it into unicodedata/unicodectype). andersch, how did you create the patch?
msg50358	Author: Anders Chrigström (andersch)	Date: 2006-05-27 06:56
I got the differences by comparing with _numeric in http://codespeak.net/svn/pypy/dist/pypy/module/unicodedata/unicodedb.py (which we generated with ssh://codespeak.net/svn/pypy/dist/pypy/module/unicodedata/generate_unicodedb.py). If you are looking into generating this from the Unicode database, you might also want to fix _PyUnicode_IsLinebreak.
msg50359	Author: Martin v. Löwis (loewis) *	Date: 2006-05-27 08:39
Thanks for the patch, committed as 46432. I conditionalized the non-BMP characters on Py_UNICODE_WIDE, and updated PyUnicode_IsNumeric to recognize U+0F33 as a numeric character. If anybody wants to contribute a generator for these functions (or perhaps generate a table in the first place), please go ahead.
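As a rough way to check the two points in this message from the Python side, the sketch below inspects U+0F33 and one non-BMP numeric character. It assumes Python 3.3 or later, where str always covers the full code point range and the Py_UNICODE_WIDE distinction no longer applies. The helper numeric_info and the choice of U+10107 as the non-BMP example are illustrative and not taken from the commit.

```python
import unicodedata

BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def numeric_info(ch):
    """Summarize how the running interpreter classifies one character."""
    code = ord(ch)
    return {
        "code_point": f"U+{code:04X}",
        "non_bmp": code > BMP_MAX,
        "isnumeric": ch.isnumeric(),
        "value": unicodedata.numeric(ch, None),  # None if no numeric value
    }


if __name__ == "__main__":
    # U+0F33 TIBETAN DIGIT HALF ZERO and U+10107 AEGEAN NUMBER ONE
    for ch in ("\u0f33", "\U00010107"):
        print(numeric_info(ch))
```

On a narrow Python 2 build, the second character would have been stored as a surrogate pair, which is why the committed C code guards such code points with Py_UNICODE_WIDE.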

History
Date	User	Action	Args
2022-04-11 14:56:17	admin	set	github: 43405
2006-05-24 19:54:24	andersch	create