Issue 635595: Misleading description of \w in regexs

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/37441

classification

Title:	Misleading description of \w in regexs
Type:		Stage:
Components:	Documentation	Versions:	Python 2.2

process

Created on 2002-11-08 17:29 by glchapman, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Messages (2)
msg13147 - (view)	Author: Greg Chapman (glchapman)	Date: 2002-11-08 17:29
In the Regular Expression Syntax doc page (http://www.python.org/dev/doc/devel/lib/re-syntax.html), the description of \w is misleading (the same goes for \W). The description indicates that, with the locale flag in effect, \w includes "characters defined as letters" for the current locale. Reading that, I took "letters" to mean characters for which isalpha returns true, but in fact all characters defined as alphanumeric for the current locale are included (so \w works pretty much the same way with the locale flag as with the unicode flag). For example (using '\xb2', the superscript two):

Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'English_United States.1252'
>>> import re
>>> re.match(r'\w', '\xb2', re.L).group()
'\xb2'
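The session above depends on a Windows locale and Python 2.2, so it is hard to reproduce elsewhere. A rough sketch of the same distinction between "letters" and "alphanumerics", assuming Python 3 and its default Unicode matching for str patterns instead of the locale flag, might look like the following. The helper name describe is hypothetical and not part of the report.

```python
import re

SUPERSCRIPT_TWO = "\xb2"  # the character used in the report


def describe(ch):
    """Return (is_letter, is_alphanumeric, matches_w) for a one-character str.

    Hypothetical helper for illustration only; it uses Unicode matching
    (the Python 3 default for str patterns), not the locale flag.
    """
    return (ch.isalpha(), ch.isalnum(), re.match(r"\w", ch) is not None)


# Superscript two is not a letter, but it is alphanumeric, and \w matches it.
print(describe(SUPERSCRIPT_TWO))  # (False, True, True)

# With ASCII-only matching, \w is limited to [a-zA-Z0-9_] and does not match.
print(re.match(r"\w", SUPERSCRIPT_TWO, re.ASCII))  # None
```

This only mirrors the Unicode side of the comparison made in the report. Whether re.LOCALE gives the same answer for a given byte depends on the platform and the active locale.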
msg13148 - (view)	Author: Fred Drake (fdrake)	Date: 2002-11-12 23:14
Fixed in Doc/lib/libre.tex revisions 1.91 and 1.73.6.11.