In the Regular Expression Syntax doc page
(http://www.python.org/dev/doc/devel/lib/re-syntax.html), the
description for \w is misleading (the same goes for \W).
The description says that, with the LOCALE flag in effect,
\w includes "characters defined as letters" for the current
locale. Reading that, I took "letters" to mean characters
for which isalpha returns true. In fact, every character
defined as alphanumeric for the current locale is
included, so \w works much the same way with the LOCALE
flag as with the UNICODE flag. For example, using '\xb2'
(the superscript two):
Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'English_United States.1252'
>>> import re
>>> re.match(r'\w', '\xb2', re.L).group()
'\xb2'
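
The letter/alphanumeric distinction can also be checked without depending on any particular locale. The following is only a sketch, assuming Python 3, where str patterns use Unicode matching by default; classify is a hypothetical helper introduced for illustration, not part of the re module.

```python
import re

# Hypothetical helper, not part of the re module.
def classify(ch):
    # Report whether ch is a letter, whether it is alphanumeric,
    # and whether the \w class matches it.
    return (ch.isalpha(), ch.isalnum(), re.match(r'\w', ch) is not None)

# Compare a letter, a digit, the superscript two, the underscore,
# and a punctuation character.
for ch in ('a', '5', '\xb2', '_', '-'):
    print(repr(ch), classify(ch))
```

For '\xb2', isalpha is false but \w still matches, which is the same mismatch the session above shows for the locale flag. The locale case itself is not reproduced here, because in Python 3 re.L applies only to bytes patterns and its result depends on the active locale.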