This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Classification
Title: unicode alphanumeric regexp bug
Components: Regular Expressions
Versions: Python 2.3

Process
Status: closed
Resolution: fixed
Assigned To: gvanrossum
Nosy List: efge, glchapman, gvanrossum
Priority: normal

Created on 2002-09-17 01:18 by efge, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Messages (3)
msg12406 - Author: Florent Guillaume (efge) Date: 2002-09-17 01:18
I've got the following problem in Python 2.1, 2.2 and 2.3a0 (Debian):

>>> import re
>>> re.compile(r'\w+', re.U).sub('X', u'hello caf\xe9')
u'X X'
>>> re.compile(r'\w{1}', re.U).sub('X', u'hello caf\xe9')
u'XXXXX XXXX'
>>> re.compile(r'\w', re.U).sub('X', u'hello caf\xe9')
u'XXXXX XXX\xe9'

The first two results are correct, but the third is not: a bare \w fails to match u'\xe9' even though re.U is set.
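For comparison, here is a minimal sketch of the same three substitutions on a current Python 3 interpreter (an assumption; the report itself concerns Python 2.1 through 2.3a0). In Python 3, str patterns use Unicode matching by default, so re.UNICODE is redundant but harmless. Passing re.ASCII instead leaves "\xe9" unreplaced, which is the same string as the report's third result.

```python
import re

text = "hello caf\xe9"  # "hello café"

# Python 3: str patterns are Unicode-aware by default; re.UNICODE is redundant.
results = [
    re.sub(r"\w+", "X", text, flags=re.UNICODE),    # one X per word
    re.sub(r"\w{1}", "X", text, flags=re.UNICODE),  # one X per word character
    re.sub(r"\w", "X", text, flags=re.UNICODE),     # should equal the \w{1} form
]
for r in results:
    print(repr(r))

# Restricting \w to ASCII leaves "\xe9" untouched, like the buggy output above.
ascii_result = re.sub(r"\w", "X", text, flags=re.ASCII)
print(repr(ascii_result))
```

On an interpreter with the fix, the second and third entries of results are identical.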
msg12407 - Author: Greg Chapman (glchapman) Date: 2002-11-04 16:51
Logged In: YES 
user_id=86307

I just posted a small patch to sre_compile.py which should fix this:

http://sourceforge.net/tracker/?func=detail&aid=633359&group_id=5470&atid=305470
msg12408 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-02-24 01:29
Logged In: YES 
user_id=6380

Fixed in 2.3 CVS using Greg's patch. Will backport to 2.2 as
well.
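A regression check along the following lines could guard against the bug reappearing. This is a hypothetical sketch for modern Python 3, not the test that accompanied Greg's patch; the helper name inconsistent_word_chars and the 0x3000 cutoff are illustrative choices. It searches each code point with the bare \w, \w{1} and \w+ forms and reports any character on which they disagree.

```python
import re

def inconsistent_word_chars(limit=0x3000):
    """Return code points below `limit` where \\w, \\w{1} and \\w+ disagree.

    Hypothetical helper; `limit` is an arbitrary cutoff for illustration.
    """
    patterns = [re.compile(p, re.UNICODE) for p in (r"\w", r"\w{1}", r"\w+")]
    bad = []
    for cp in range(limit):
        ch = chr(cp)
        # Use search() so the scanning code path is exercised, not just match().
        outcomes = {p.search(ch) is not None for p in patterns}
        if len(outcomes) > 1:  # the three forms gave different answers
            bad.append(cp)
    return bad

print(inconsistent_word_chars())
```

On an interpreter with the fix, the printed list should be empty.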

History
Date                 User   Action  Args
2022-04-10 16:05:40  admin  set     github: 37183
2002-09-17 01:18:04  efge   create