Issue690974
Created on 2003-02-22 00:06 by peterno, last changed 2022-04-10 16:07 by admin. This issue is now closed.
Messages (3) | |||
---|---|---|---|
msg14771 - (view) | Author: peter nordlund (peterno) | Date: 2003-02-22 00:06 | |
I submit this problem although I am not sure it is a real bug. It could be that I don't know how this locale stuff works. Anyway, I have been browsing around quite some time on the net to find some good examples of code demonstrating how to use regexp in python to get hold of åäö when using \w, but I have not found any complete examples. If the code below behaves correctly, I suggest that the regexp documentation is improved by adding a complete example that shows how to use re.LOCALE. (The code behaves in the same way with python 2.2.2.)

#----------------------------------------
import locale
locale.setlocale(locale.LC_ALL,'swedish')
import re
reguml=re.compile(r"[a-zä]", re.LOCALE) # I expect reguml and regw to give the same result.
regw=re.compile(r"\w", re.LOCALE)
reguml2=re.compile(r"[a-zä]+", re.LOCALE) # I expect reguml2 and regw2 to give the same result.
regw2=re.compile(r"[\w]+", re.LOCALE)
str="abcä d\344e ä f ";

print reguml.findall(str) # Behaves as I expect.
print regw.findall(str) # Here I expect same result as above, but I don't get it.
print reguml2.findall(str) # Behaves as I expect.
print regw2.findall(str) # Behaves as I expect.
#----------------------------------------
>>> import locale
>>> locale.setlocale(locale.LC_ALL,'swedish')
'swedish'
>>> import re
>>> reguml=re.compile(r"[a-zä]", re.LOCALE) # I expect reguml and regw to give the same result.
>>> regw=re.compile(r"\w", re.LOCALE)
>>> reguml2=re.compile(r"[a-zä]+", re.LOCALE) # I expect reguml2 and regw2 to give the same result.
>>> regw2=re.compile(r"[\w]+", re.LOCALE)
>>> str="abcä d\344e ä f ";
>>>
>>> print reguml.findall(str) # Behaves as I expect.
['a', 'b', 'c', '\xe4', 'd', '\xe4', 'e', '\xe4', 'f']
>>> print regw.findall(str) # Here I expect same result as above, but I don't get it.
['a', 'b', 'c', 'd', 'e', 'f']
>>> print reguml2.findall(str) # Behaves as I expect.
['abc\xe4', 'd\xe4e', '\xe4', 'f']
>>> print regw2.findall(str) # Behaves as I expect.
['abc\xe4', 'd\xe4e', '\xe4', 'f']
---------------------------------------------------------
peternl:Python-2.3a2>> /work1/pkg/dev-tools/python/2.3a2/bin/python -V
Python 2.3a2
peternl:Python-2.3a2>>uname -a
Linux peternl.computervision.se 2.4.18-6mdk-petern #2 Thu May 23 06:40:30 CEST 2002 i686 unknown |
|||
msg14772 - (view) | Author: Greg Chapman (glchapman) | Date: 2003-02-22 17:15 | |
I believe this is fixed by this patch: http://www.python.org/sf/633359 At any rate, using a patched 2.2.2, regw behaves identically to reguml. |
|||
msg14773 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2003-04-19 08:14 | |
This has been fixed with Greg's patch. |
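For readers trying the same thing today, here is a minimal sketch of the behaviour the reporter expected. It assumes Python 3 rather than the 2.2/2.3 interpreters used in the report. In Python 3, str patterns are Unicode-aware by default, so \w already matches "ä" without re.LOCALE or a call to locale.setlocale(); since Python 3.6, re.LOCALE is accepted only for bytes patterns. The variable names are illustrative and not taken from the report.

```python
import re

# Sample string from the report; "\xe4" is "ä" (the report wrote it as "\344").
text = "abc\xe4 d\xe4e \xe4 f "

# Python 3 str patterns are Unicode-aware by default, so \w matches "ä"
# with no locale setup and no re.LOCALE flag.
single_chars = re.findall(r"\w", text)
words = re.findall(r"\w+", text)

# The explicit character class from the report, for comparison.
explicit_chars = re.findall(r"[a-z\xe4]", text)
explicit_words = re.findall(r"[a-z\xe4]+", text)

# Both spellings agree on this input, which is what the reporter expected.
print(single_chars == explicit_chars)  # prints True
print(words == explicit_words)         # prints True
```

Note that \w and [a-zä] only coincide for this particular input: \w also matches digits, the underscore, and letters outside a-z and ä.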
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-10 16:07:01 | admin | set | github: 38028 |
2003-02-22 00:06:40 | peterno | create |