Issue 626485: Support Unicode normalization

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/37352

classification

Title:	Support Unicode normalization
Type:		Stage:
Components:	Interpreter Core	Versions:

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:	loewis	Nosy List:	lemburg, loewis
Priority:	normal	Keywords:	patch

Created on 2002-10-21 19:02 by loewis, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name	Uploaded	Description
normal.txt	loewis, 2002-10-22 06:09
normal.txt	loewis, 2002-11-23 15:19

Messages (7)
msg41413	Author: Martin v. Löwis (loewis) *	Date: 2002-10-21 19:02
This patch adds support for the normalization forms NFC, NFKC, NFD, NFKD. It passes the NormalizationTest-3.2.0.txt tests.
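For readers unfamiliar with the four forms, a minimal sketch follows. It assumes the unicodedata.normalize(form, unistr) function as it exists in current Python; the interface of the patch at this point in the thread may have differed. The sample string and the codepoints helper are illustrative only.

```python
import unicodedata

# U+00C5 (precomposed A with ring above) followed by U+FB01 (the "fi" ligature).
sample = "\u00c5\ufb01"


def codepoints(text):
    """Render a string as space-separated hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, codepoints(unicodedata.normalize(form, sample)))

# NFC  00C5 FB01            (canonical composition, ligature kept)
# NFD  0041 030A FB01       (canonical decomposition, ligature kept)
# NFKC 00C5 0066 0069       (compatibility mapping splits the ligature)
# NFKD 0041 030A 0066 0069  (fully decomposed)
```

The canonical forms (NFC, NFD) leave the ligature alone, while the compatibility forms (NFKC, NFKD) replace it with plain "f" and "i".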
msg41414	Author: Marc-Andre Lemburg (lemburg) *	Date: 2002-10-23 10:27
The patch looks OK except for a few nits:
* I'd rather have a single API, normalize(form), which takes the form as a string argument, instead of separate NFKD, etc. functions.
* __getrecord should be renamed to _getrecord_ex; perhaps both should use a different name altogether, e.g. getunicoderecord.
* I think you have to add some #ifdef Py_UNICODE_WIDE in the code to avoid compiler warnings on narrow builds about non-const if expressions always being true due to size limits.
* The filenames you are using should not include the '-Latest' suffix. If you download the files from unicode.org via FTP, they don't have this suffix.
* The skip test message should include a reference to where the test file can be obtained, i.e. ftp://ftp.unicode.org/Public/UNIDATA/NormalizationTest.txt
Thanks for working on this!
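The last point, a skip message that says where to get the data file, might look roughly like the sketch below. It uses the modern unittest skip mechanism, which did not exist in 2002, so it is not the code from the patch. The constant names, the local file name and the message wording are assumptions made for illustration.

```python
import os
import unittest

# Hypothetical names: the patch may have used different constants and paths.
TESTDATAFILE = "NormalizationTest.txt"
TESTDATAURL = "ftp://ftp.unicode.org/Public/UNIDATA/NormalizationTest.txt"


def skip_message(path, url):
    """Build a skip message that tells the user where to fetch the data file."""
    return f"{path} not found, download it from {url}"


class NormalizationTest(unittest.TestCase):
    def setUp(self):
        # Skip (rather than fail) when the optional data file is absent.
        if not os.path.exists(TESTDATAFILE):
            self.skipTest(skip_message(TESTDATAFILE, TESTDATAURL))

    def test_file_has_content(self):
        with open(TESTDATAFILE, encoding="utf-8") as f:
            self.assertTrue(f.readline())
```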
msg41415	Author: Marc-Andre Lemburg (lemburg) *	Date: 2002-10-23 10:36
One more minor nit: the indentation used in C files is 4 chars; please reindent your code accordingly.
msg41416	Author: Martin v. Löwis (loewis) *	Date: 2002-10-25 15:03
This patch addresses your issues in the following way:
- single API: done.
- add _getrecord_ex: done. Rename to getunicoderecord: since this is a static function in unicodedata.c, the renaming would not add much information, so not done.
- #ifdef Py_UNICODE_WIDE: I could not spot any place where this is necessary.
- Drop -Latest: done.
- adjust skip message: done.
- reformat to 4 spaces: not done; I think PEP 7 should be followed.
msg41417	Author: Martin v. Löwis (loewis) *	Date: 2002-11-23 15:19
This version changes the indentation to 4 spaces. Are any further changes needed?
msg41418	Author: Marc-Andre Lemburg (lemburg) *	Date: 2002-11-23 21:50
Looks good (I don't have time to review the patch in detail, though). Please check it in. Thanks.
msg41419	Author: Martin v. Löwis (loewis) *	Date: 2002-11-23 22:08
Thanks! Committed as
- libunicodedata.tex 1.4
- test_normalization.py 1.1
- NEWS 1.541
- unicodedata.c 2.24
- unicodedata_db.h 1.7
- makeunicodedata.py 1.15
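As a rough illustration of what a conformance check against NormalizationTest.txt involves, here is a minimal sketch. It is not the committed test_normalization.py. It assumes the column layout documented in the data file (source; NFC; NFD; NFKC; NFKD, as hex code points) and the invariants listed in its header. The function names are hypothetical and the sample line is written in that format for illustration.

```python
import unicodedata


def parse_line(line):
    """Return columns c1..c5 as strings, or None for comments, blanks and @Part headers."""
    data = line.split("#", 1)[0].strip()
    if not data or data.startswith("@"):
        return None
    fields = data.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in field.split()) for field in fields]


def conforms(columns):
    """Check the invariants that relate the five columns of one data line."""
    c1, c2, c3, c4, c5 = columns
    norm = unicodedata.normalize
    return (
        all(norm("NFC", c) == c2 for c in (c1, c2, c3))
        and all(norm("NFC", c) == c4 for c in (c4, c5))
        and all(norm("NFD", c) == c3 for c in (c1, c2, c3))
        and all(norm("NFD", c) == c5 for c in (c4, c5))
        and all(norm("NFKC", c) == c4 for c in columns)
        and all(norm("NFKD", c) == c5 for c in columns)
    )


# Illustrative line in the data file's format (U+1E0A, D with dot above).
line = "1E0A;1E0A;0044 0307;1E0A;0044 0307; # LATIN CAPITAL LETTER D WITH DOT ABOVE"
print(conforms(parse_line(line)))
```

A full run would apply conforms to every data line of the file and report the lines for which it returns False.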

History
Date	User	Action	Args
2022-04-10 16:05:46	admin	set	github: 37352
2002-10-21 19:02:28	loewis	create