
classification
Title: Incorrect result for regular expression - "|(hello)|(world)"
Components: Regular Expressions
process
Status: closed
Resolution: not a bug
Assigned To: niemeyer
Nosy List: georg.brandl, karamana, niemeyer, rhettinger, tim.peters
Priority: normal

Created on 2005-06-01 05:13 by karamana, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg25458 - Author: Vijay Kumar (karamana) Date: 2005-06-01 05:13
The regular expression "|hello|world" incorrectly produces a 
match because of the leading '|'.  The sample program below 
demonstrates this.  The correct behavior would be to return 
None, which is what happens when the leading '|' is removed:

-----
import re
m = re.search("|hello|world", "This is a simple sentence")
print(m)

m2 = re.search("hello|world", "This is a simple sentence")
print(m2)

---- output ---
<_sre.SRE_Match object at 0x00B71F70>
None
----------
The first result is incorrect; it should have been None.
msg25459 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2005-06-01 05:19
I expect you'll find that, e.g., Perl does the same thing:  
a "missing" alternative is treated as an empty string, and an 
empty string always matches.  What basis do you have for 
claiming it should not match (beyond just repeating that it 
should not <wink>)?
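
The point about the empty alternative can be checked directly. The following is a minimal sketch, assuming a current Python 3 interpreter rather than the 2005 one used in the report; the variable names are illustrative only. It shows that the match produced by the leading '|' has zero length, and that alternatives are tried from left to right.

```python
import re

text = "This is a simple sentence"

# The leading "|" adds an empty first alternative, which matches at position 0.
m = re.search("|hello|world", text)
empty_span = m.span()
empty_text = m.group()

# Without the empty alternative, nothing in the text matches.
m2 = re.search("hello|world", text)

# Alternatives are tried left to right, so the position of the
# empty alternative decides which branch wins at a given position.
first = re.search("|hello|world", "hello world").group()
last = re.search("hello|world|", "hello world").group()

print(repr(empty_text), empty_span, m2, repr(first), repr(last))
```

With the empty alternative first, even a subject that starts with "hello" yields an empty match; moving it to the end lets "hello" win.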
msg25460 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2005-06-01 21:39
The current behavior best matches my expectations.
One other data point: AWK handles it the same way.

I recommend closing this as Invalid.
msg25461 - Author: Vijay Kumar (karamana) Date: 2005-06-01 22:59
I think what you are saying is correct in a formal sense, 
but it would be useful to distinguish between a useful 
match and an empty match.  Maybe there could be an 
additional method, isEmptyMatch(), on the match object to 
detect this.

Also, this one does not work; it gives a compile error:
m = re.search("[]", "This is a simple sentence")
print(m)

whereas this one returns None:
m = re.search("[|]", "This is a simple sentence")
print(m)

So the empty match is not consistent :)  (I don't know if I 
should wink.)
msg25462 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-06-02 06:49
Your example is wrong. "[]" is an error because it is an
empty character group.
"[|]" is a valid character group which matches the literal
"|"; it is equivalent to r"\|". Between [ and ] most
characters lose their special meaning.

I'm also in favour of the current behaviour, recommend closing.
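
The two character-class cases can be checked side by side. A small sketch, assuming Python 3, where a pattern that fails to compile raises `re.error`; the variable names are illustrative only.

```python
import re

# "[]" does not compile: the pattern is rejected before any matching happens.
try:
    re.compile("[]")
    compiled = True
except re.error:
    compiled = False

# "[|]" is a character class containing a literal "|", not an alternation.
pipe = re.search("[|]", "a|b")
no_pipe = re.search("[|]", "This is a simple sentence")

# The escaped form finds the same character at the same position.
escaped = re.search(r"\|", "a|b")

print(compiled, pipe.span(), no_pipe, escaped.span())
```

The `None` result in the second case just means the sentence contains no "|" character, so it says nothing about empty matches.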
msg25463 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2005-06-03 14:53
I'm closing this as not-a-bug.  The current behavior makes 
sense, matches how other regexp packages work, and can't 
be changed regardless without breaking existing code.  Note 
that

    (mid|)night

isn't the same as

    (mid)?night

in the case where "mid" doesn't match.  That's one reason 
the first form actually gets used (in the first form group 1 
matches an empty string, in the second form group 1 doesn't 
match at all).

As birkenfeld said, character classes are entirely different 
gimmicks.
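
The difference between the two forms shows up in what group 1 reports. A short sketch, assuming Python 3; the names `either` and `optional` are chosen here for illustration.

```python
import re

either = re.compile(r"(mid|)night")     # group 1 always participates, possibly empty
optional = re.compile(r"(mid)?night")   # group 1 may be skipped entirely

# When "mid" is absent, the first form captures "" and the second captures nothing.
g_either = either.search("night").group(1)
g_optional = optional.search("night").group(1)

# When "mid" is present, both forms capture it.
both_mid = (either.search("midnight").group(1),
            optional.search("midnight").group(1))

print(repr(g_either), repr(g_optional), both_mid)
```

Code that uses group 1 as a string without first checking for `None` relies on the first form, which is one way existing programs could break if the empty alternative stopped matching.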
History
Date                 User      Action  Args
2022-04-11 14:56:11  admin     set     github: 42039
2005-06-01 05:13:29  karamana  create