
classification
Title: Misleading description of \w in regexs
Type: Stage:
Components: Documentation Versions: Python 2.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: fdrake Nosy List: fdrake, glchapman
Priority: normal Keywords:

Created on 2002-11-08 17:29 by glchapman, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Messages (2)
msg13147 - Author: Greg Chapman (glchapman) Date: 2002-11-08 17:29
In the Regular Expression Syntax doc page
(http://www.python.org/dev/doc/devel/lib/re-syntax.html), the
description of \w is misleading (the same goes for \W).
The description says that, with the LOCALE flag in effect,
\w matches "characters defined as letters" for the current
locale. I took "letters" to mean characters for which
isalpha() returns true, but in fact all characters that the
current locale defines as alphanumeric are included (so \w
behaves much the same with the LOCALE flag as with the UNICODE
flag). For example, using '\xb2' (SUPERSCRIPT TWO):

Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'English_United States.1252'
>>> import re
>>> re.match(r'\w', '\xb2', re.L).group()
'\xb2'
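The same distinction between "letter" and "alphanumeric" can be checked in current Python. The sketch below is an assumption-laden illustration, not the 2.2 session above: it uses Python 3 str patterns (Unicode matching, the default), because Python 3 only accepts re.LOCALE with bytes patterns and locale results vary by platform. The helper name classify is hypothetical. It compares \w and \W against str.isalpha() and str.isalnum() for a few characters.

```python
import re

# Python 3 sketch (assumption): str patterns use Unicode matching,
# not the Python 2.2 re.L locale mode shown in the session above.
WORD = re.compile(r'\w')
NONWORD = re.compile(r'\W')


def classify(ch):
    """Return (matches \\w, matches \\W, isalpha(), isalnum()) for one character."""
    return (
        WORD.fullmatch(ch) is not None,
        NONWORD.fullmatch(ch) is not None,
        ch.isalpha(),
        ch.isalnum(),
    )


# 'a' and '\xe9' are letters; '7' and '\xb2' are digits but not letters;
# '_' is neither but is still a word character; '-' is not.
for ch in ['a', '\xe9', '7', '\xb2', '_', '-']:
    word, nonword, alpha, alnum = classify(ch)
    print(repr(ch), 'w:', word, 'W:', nonword, 'isalpha:', alpha, 'isalnum:', alnum)
```

In this mode '\xb2' matches \w even though isalpha() is false, which is the behaviour the report describes: \w follows "alphanumeric" (plus underscore), not "letter".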
msg13148 - Author: Fred Drake (fdrake) (Python committer) Date: 2002-11-12 23:14

Fixed in Doc/lib/libre.tex revisions 1.91 and 1.73.6.11.
History
Date                 User       Action  Args
2022-04-10 16:05:51  admin      set     github: 37441
2002-11-08 17:29:23  glchapman  create