classification
Title: re.escape incorrectly escape literal.
Type:
Stage:
Components: Regular Expressions
Versions: Python 2.4
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To: niemeyer
Nosy List: akuchling, blep, niemeyer
Priority: normal
Keywords:

Created on 2006-06-03 19:32 by blep, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
re_escape_bug.py | blep, 2006-06-03 19:32 | Demonstrate re.escape bug.
Messages (3)
msg28703 - Author: Baptiste Lepilleur (blep) Date: 2006-06-03 19:32
Using Python 2.4.2.

Here is a small program excerpt that reproduces the
issue (attached):
---
import re
literal = r'E:\prg\vc'
print 'Expected:', literal
print 'Actual:', re.sub('a', re.escape(literal), 'a' )
assert re.sub('a', re.escape(literal), 'a' ) == literal
---
And the output of the sample:
---
Expected: E:\prg\vc
Actual  : E\:\prg\vc
Traceback (most recent call last):
  File "re_escape_bug.py", line 5, in ?
    assert re.sub('a', re.escape(literal), 'a' ) == literal
AssertionError
---

Looking at the regular expression syntax in the Python
documentation, I don't see why ':' is escaped as '\:'.

Baptiste.
msg28704 - Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2006-06-03 20:27
The assertion is wrong, I think.   The signature is re.sub(pattern, replacement, 
string), so the assertion is replacing 'a' with re.escape(literal), which is 
obviously not going to equal literal.

re.escape() puts a backslash in front of all non-alphanumeric characters; ':' is 
non-alphanumeric, so it will be escaped.  The regex parser will ignore 
unknown escapes, so \: is the same as : -- the redundant escaping is 
harmless.
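
As a rough illustration of the point above (a sketch, not part of the original report): when the output of re.escape() is used as the pattern argument, it matches the original text exactly, whether or not ':' was escaped. Newer Python versions escape fewer characters than 2.4 did, so the example avoids asserting the exact escaped string.

```python
import re

literal = r'E:\prg\vc'

# re.escape() is meant for building patterns, not replacement strings.
pattern = re.escape(literal)

# Used as a pattern, the escaped text matches the original literal in full.
match = re.match(pattern, literal)
assert match is not None and match.group(0) == literal

# A redundant escape such as '\:' matches the same text as a plain ':'.
redundant = re.match(r'E\:', 'E:')
assert redundant is not None and redundant.group(0) == 'E:'
```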
msg28705 - Author: Baptiste Lepilleur (blep) Date: 2006-06-03 21:45
You are correct. However, the 'repl' string parameter is not
a literal string; it is interpreted. The correct way to
preserve the literal is
literal.replace('\\','\\\\'), not re.escape(). It
prevents any interpretation of the repl pattern. I believe
this should be stated clearly in the documentation, as
it is not obvious.

The following assertion passes:
---
import re
literal = r'e:\prg\vc\1'
assert re.sub( '(a+)', 
               literal.replace('\\','\\\\'), 
               'aabac' ) == (literal+'b'+literal+'c')
---

In the above example, neither \v nor \1 is interpreted.

Regards,
Baptiste.
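
For readers who want to reuse the workaround described in the last message, here is a minimal sketch of two ways to insert a literal replacement with re.sub(). The helper name literal_repl is hypothetical and introduced only for illustration. The second approach relies on the fact that re.sub() accepts a callable as repl and uses its return value as-is, without processing backslash escapes.

```python
import re

def literal_repl(text):
    # Hypothetical helper: double every backslash so that re.sub()
    # turns each pair back into a single literal backslash.
    return text.replace('\\', '\\\\')

literal = r'e:\prg\vc\1'
subject = 'aabac'
expected = literal + 'b' + literal + 'c'

# Approach 1: escape the backslashes in the replacement string.
via_doubling = re.sub('(a+)', literal_repl(literal), subject)

# Approach 2: pass a callable; its return value is inserted unchanged.
via_callable = re.sub('(a+)', lambda match: literal, subject)

assert via_doubling == expected
assert via_callable == expected
```

The callable form avoids any escaping step, which may be easier to reason about when the replacement text comes from user input or file paths.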
History
Date | User | Action | Args
2022-04-11 14:56:17 | admin | set | github: 43451
2006-06-03 19:32:55 | blep | create |