Issue 1518406: re '\' char interpretation problem

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/43628

classification

Title:	re '\' char interpretation problem
Type:		Stage:
Components:	Regular Expressions	Versions:	Python 2.4

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:	niemeyer	Nosy List:	niemeyer, ooldham
Priority:	normal	Keywords:

Created on 2006-07-06 21:26 by ooldham, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
reProblem.py	ooldham, 2006-07-06 21:26	short code snippet showing both problems with re and '\' char

Messages (4)
msg29083 - (view)	Author: ollie oldham (ooldham)	Date: 2006-07-06 21:26
I've run across 2 problems with the '\' character in the re module. In problem 1, the re does not match when it should have. In problem 2, it matches when it should not have. There is a short snippet of code attached that shows the problems I'm having, and the output as it occurs on my machine.

I'm running on Windows 2000. Python versions 2.4b1 and 2.4.3c1 both act the same way.

Problem 1: why does * work and not + ?

    import re
    rex = re.compile(r'[a-z]:\.*', re.IGNORECASE)
    rey = re.compile(r'[a-z]:\.+', re.IGNORECASE)
    path1 = r'D:\Logs'
    print rex.match(path1)  # Matches - as it should have.
    print rey.match(path1)  # FAILS to match - should have.

Problem 2: match occurs on nonUncPath when it should not

    import re
    uncPath = r'\\someUNC\path'
    nonUncPath = r'\nonUnc\path'
    rew = re.compile('\\\\.+', re.IGNORECASE)
    print rew.match(uncPath)     # works as it should.
    print rew.match(nonUncPath)  # matches and it should NOT.
msg29084 - (view)	Author: Gustavo Niemeyer (niemeyer) *	Date: 2006-07-06 21:36
1) r'[a-z]:\.+' should not match r'D:\Logs'. r'\.+' matches one or more dots. There's no dot in this string.

2) '\\\\.+' is the equivalent of r'\\.+', and should match anything that starts with a '\' and has at least one char following it, which includes r'\nonUnc\path'.
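Both points in the reply above can be checked directly. The following is a minimal sketch in Python 3 syntax (the snippets in this thread use Python 2 print statements). The patterns `drive_path` and `unc_only` are hypothetical and do not appear in the report; they are one possible way to express what the reporter describes in words, assuming the goal is "drive letter, colon, backslash, one or more chars" and "two backslashes, one or more chars".

```python
import re

path = r'D:\Logs'
unc_path = r'\\someUNC\path'
non_unc_path = r'\nonUnc\path'

# Patterns taken from the report.
dots_star = re.compile(r'[a-z]:\.*', re.IGNORECASE)  # colon, then ZERO or more literal dots
dots_plus = re.compile(r'[a-z]:\.+', re.IGNORECASE)  # colon, then ONE or more literal dots
one_backslash = re.compile('\\\\.+', re.IGNORECASE)  # regex \\.+ : one backslash, then 1+ chars

# Hypothetical patterns (not from the report) for the intended behavior.
drive_path = re.compile(r'[a-z]:\\.+', re.IGNORECASE)  # colon, one backslash, then 1+ chars
unc_only = re.compile(r'\\\\.+')                       # two backslashes, then 1+ chars

print(dots_star.match(path).group())                   # D:   (zero dots consumed)
print(dots_plus.match(path))                           # None (no dot after the colon)
print(one_backslash.match(non_unc_path) is not None)   # True (starts with one backslash)

print(drive_path.match(path).group())                  # D:\Logs
print(unc_only.match(unc_path) is not None)            # True
print(unc_only.match(non_unc_path))                    # None
```

Note that `rex` in the report only appears to work: `\.*` is satisfied by zero dots, so the match stops right after the colon.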
msg29085 - (view)	Author: ollie oldham (ooldham)	Date: 2006-07-06 22:46
I beg to differ on problem 1). Since 'r' was used in the definition of both the re and the path, the '.' char is not being escaped (not supposed to be, anyway). And even if it is, then rex=re.compile('[a-z]:\\.+', re.IGNORECASE) should get me what I want (in textual form: char a-z, colon, backslash, with 1 or more trailing chars). But that does not work either.

I beg to differ on item 2) as well: Yes, '\\\\.+' is the equivalent of r'\\.+', BUT I then read this as: 2 backslashes with 1 or more chars, NOT backslash with escaped '.'
msg29086 - (view)	Author: Gustavo Niemeyer (niemeyer) *	Date: 2006-07-06 22:55
Please, use a single way to report issues. Do not message and add a comment to the bug.

I think you're missing the behavior of r'' in Python. It changes the way the Python interpreter parses the string, not the way the regular expression compiler/interpreter works. r'\.' is precisely the same as '\\.', and both of them really describe the string |\.|.

>>> r'\.' == '\\.'
True
>>> print r'\.'
\.

Escaping a dot means a real dot. Please have a look at the re module documentation and perhaps some general regular expression info for more details.
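The two layers of escaping described above (Python string literal first, regular expression second) can be seen in a short Python 3 sketch. The variable names are illustrative only. The use of `re.escape` at the end is a suggestion that is not made in the thread; it is one way to avoid counting backslashes by hand when a literal prefix such as a drive root is known.

```python
import re

# The r prefix only changes how Python reads the literal;
# the resulting string object is identical.
raw = r'\.'
cooked = '\\.'
print(raw == cooked)   # True
print(len(raw))        # 2 (one backslash, one dot)

# So the non-raw pattern proposed in the second message is the very same
# regex as the raw one from the original report: backslash-dot, a literal dot.
print('[a-z]:\\.+' == r'[a-z]:\.+')   # True

# Assumption: the literal prefix to match is known in advance.
# re.escape produces a pattern that matches the prefix text literally.
prefix = 'D:\\'                        # the three characters D, colon, backslash
prefix_then_any = re.compile(re.escape(prefix) + '.+')
print(prefix_then_any.match(r'D:\Logs') is not None)   # True
```

The regex compiler never sees the `r` prefix. It only receives the final string, in which a backslash followed by a dot means a literal dot.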

History
Date	User	Action	Args
2022-04-11 14:56:18	admin	set	github: 43628
2006-07-06 21:26:22	ooldham	create