classification
Title: re '\' char interpretation problem
Type:
Stage:
Components: Regular Expressions
Versions: Python 2.4
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To: niemeyer
Nosy List: niemeyer, ooldham
Priority: normal
Keywords:

Created on 2006-07-06 21:26 by ooldham, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
reProblem.py ooldham, 2006-07-06 21:26 short code snippet showing both problems with re and '\' char
Messages (4)
msg29083 - Author: ollie oldham (ooldham) Date: 2006-07-06 21:26
I've run across 2 problems with how the re module 
handles the '\' character.

In problem 1, the re fails to match when it should.
In problem 2, the re matches when it should not.

There is a short snippet of code attached that shows 
the problems I'm having, and the output as it occurs 
on my machine.

I'm running on Windows 2000.
Python versions 2.4b1 and 2.4.3c1 both behave the same 
way.

Problem 1: why does * work and not + ?
import re
rex = re.compile(r'[a-z]:\.*', re.IGNORECASE)
rey = re.compile(r'[a-z]:\.+', re.IGNORECASE)
path1 = r'D:\Logs'
print rex.match(path1) # Matches, as it should.
print rey.match(path1) # FAILS to match, but should.

Problem 2: a match occurs on nonUncPath when it should not
import re
uncPath = r'\\someUNC\path'
nonUncPath = r'\nonUnc\path'
rew = re.compile('\\\\.+', re.IGNORECASE)
print rew.match(uncPath) # Works as it should.
print rew.match(nonUncPath) # Matches, and it should NOT.
msg29084 - Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2006-07-06 21:36
1) r'[a-z]:\.+' should not match r'D:\Logs'. r'\.+' matches
one or more dots. There's no dot in this string.

2) '\\\\.+' is the equivalent of r'\\.+', and should match
anything that starts with a '\' and has at least one char
following it, which includes r'\nonUnc\path'.
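
Both points above can be checked directly. The following is a minimal sketch, not part of the original report: it uses print() calls so that it also runs on current Python versions, and the snake_case variable names are illustrative.

```python
import re

path1 = r'D:\Logs'
non_unc_path = r'\nonUnc\path'

# r'\.+' is an escaped dot with a '+' quantifier: one or more literal '.' characters.
rey = re.compile(r'[a-z]:\.+', re.IGNORECASE)
print(rey.match(path1))            # None: the character after 'D:' is a backslash, not a dot
print(rey.match('D:...').group())  # D:...

# r'\.*' allows zero dots, so it matches only the 'D:' prefix of the path.
rex = re.compile(r'[a-z]:\.*', re.IGNORECASE)
print(rex.match(path1).group())    # D:

# '\\\\.+' is the four-character pattern \\.+ : one literal backslash,
# then one or more of any character.
rew = re.compile('\\\\.+', re.IGNORECASE)
print(rew.match(non_unc_path).group())  # \nonUnc\path
```

Note that the `*` pattern only appears to work: it matches the two characters `D:` and none of the rest of the path.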
msg29085 - Author: ollie oldham (ooldham) Date: 2006-07-06 22:46
I beg to differ on problem 1)

Since 'r' was used in the definition of both the re and 
path, the '.' char is not being escaped (not supposed to be 
anyway).
And even if it is, then rex=re.compile('[a-z]:\\.+', 
re.IGNORECASE) should get me what I want (in textual form: 
char a-z, colon, backslash, with 1 or more trailing chars).
But that does not work either.

I beg to differ on item 2) as well:
Yes - '\\\\.+' is the equivalent of r'\\.+'
BUT I then read this as: 2 backslashes with 1 or more 
chars, NOT backslash with escaped '.'
msg29086 - Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2006-07-06 22:55
Please use a single channel to report issues. Do not send a
message *and* add a comment to the bug.

I think you're missing the behavior of r'' in Python. It
changes the way the Python interpreter parses the string,
not the way the regular expression compiler/interpreter
works. r'\.' is precisely the same as '\\.', and both of
them really describe the string |\.|.

  >>> r'\.' == '\\.'
  True

  >>> print r'\.'
  \.

Escaping a dot means a real dot. Please have a look at the
re module documentation and perhaps some general regular
expression info for more details.
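
For reference, here is a hedged sketch of patterns that would express what the reporter described in words ("char a-z colon backslash with 1 or more trailing chars" and "2 backslashes with 1 or more chars"). It is not taken from the thread; the names drive_path and unc_path are hypothetical, and print() is used so the snippet also runs on current Python versions. The key point is that a literal backslash in a regular expression must itself be escaped, even inside a raw string.

```python
import re

# A raw string only changes how Python reads the literal, not how re interprets it.
print(r'\.' == '\\.')  # True
print(len(r'\.'))      # 2 (a backslash and a dot)

# Drive letter, colon, one literal backslash (written \\ in the regex),
# then one or more of any character.
drive_path = re.compile(r'[a-z]:\\.+', re.IGNORECASE)

# Two literal backslashes (written \\\\ in the regex), then one or more characters.
unc_path = re.compile(r'\\\\.+')

print(drive_path.match(r'D:\Logs').group())       # D:\Logs
print(unc_path.match(r'\\someUNC\path').group())  # \\someUNC\path
print(unc_path.match(r'\nonUnc\path'))            # None
```

Without the r prefix, each of those backslashes has to be doubled again, so the UNC pattern would be written as '\\\\\\\\.+'.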
History
Date User Action Args
2022-04-11 14:56:18 admin set github: 43628
2006-07-06 21:26:22 ooldham create